Developers · Public API · 39 endpoints · OpenAI-compatible
Thirty-nine endpoints behind one prefix. Chat, search, evaluate, generate, run governed tools. Same RBAC, audit trail, and guardrails as the UI.
OpenAI-compatible drop-in. SSE streaming. Per-key rate limits. Audit logged.
39 endpoints · Streaming (SSE) · OpenAI-compatible · Per-key rate limits
Most enterprises end up with two AI stacks: one for internal users (chat, agents, RAG) and one for customer-facing apps (developer SDKs, raw model APIs). Different policies. Different audit. Different bills. The customer-facing one usually has the weakest controls because it was built fastest.
Internal agents go through guardrails, audit, eval gates. Customer-facing apps hit a raw provider API. The compliance review only covers one of them.
Internal AI billing rolls up neatly. The customer-facing API bill comes in separately, with different unit economics and different blast radius. Engineering finds out at month-end.
When something goes wrong, you have to correlate logs across two systems. The customer who saw a bad response and the internal user who saw a similar one are in different stores. Investigation costs hours.
Thirty-nine endpoints across nine capability groups: chat, agents, sessions, knowledge, evaluations, prompt optimisation, data flywheel, tools and models, analytics. Each one runs through the same services that power the UI - so policy, permissions, and audit behave identically whether the call came from a browser or a build pipeline.
01 · Agents
Talk to them, list them, manage them, promote them.
02 · Knowledge + LLM
Permission-aware retrieval and multi-model completions.
03 · Evaluation
CI/CD quality gates without logging into a UI.
04 · Documents + Tools
Generative work, delivered as a file.
Every endpoint sits under /public/v1. Versioned. Documented in OpenAPI 3.1. Imports as a typed client into Python, Node, Go, and Java.
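If you want to inspect that surface programmatically, a minimal sketch follows - note the spec URL (/public/v1/openapi.json) is an assumed path for illustration, not one documented above:

import httpx

# Hypothetical spec location; point this at wherever your deployment
# actually serves the OpenAPI 3.1 document
spec = httpx.get(
    "https://api.your-org.katonic.ai/public/v1/openapi.json",
    headers={"Authorization": "Bearer kapi_live_..."},
).json()

print(spec["info"]["title"], "·", len(spec["paths"]), "paths")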
A real chat call to a published agent. Authentication with a key. Permission inheritance from the user_email header. Streaming Server-Sent Events as the agent thinks, retrieves, and responds. Final payload includes citations, retrieval log, and trace ID for the audit row.
1 · REQUEST
$ curl -N https://api.your-org.katonic.ai/public/v1/chat \
  -H "Authorization: Bearer kapi_live_..." \
  -H "X-User-Email: sarah@your-org" \
  -H "Content-Type: application/json" \
  -d '{
        "agent_id": "hr-assistant",
        "messages": [{ "role": "user", "content": "What is the remote work policy?" }],
        "stream": true
      }'
WHAT THIS DOES
· Authenticates with a published API key
· Carries Sarah's identity for permission inheritance
· Streams the response via SSE (stream: true)
2 · STREAMING · SSE
agent.start
trace_id: 7c3f...
tool.call
knowledge_search · 'remote work policy'
tool.result
3 results · permissions filtered for Sarah
guardrails.check
input ✓ · 8/8 rails passed
completion.delta
"Our current remote work policy is"
completion.delta
" documented in HR-2024-08..."▌
completion.delta
" Up to 3 days per week with..."
guardrails.check
output ✓ · 8/8 rails passed
agent.complete
tokens: 247 · cost: $0.0042
3 · FINAL · agent.complete
{ "trace_id": "7c3f9a2b...", "answer": "Our current...", "citations": [ { "doc_id": "HR-2024-08", "section": "2.1", "page": 3 } ], "usage": { "tokens": 247, "cost_usd": 0.0042, "latency_ms": 2247 }, "audit": { "rbac_check": "pass", "guardrails": "8/8 pass", "logged": true }}SAME GUARANTEES AS THE UI
· trace_id links to the audit log row
· citations only from docs Sarah can read
· guardrails fired before output left platform
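Consuming that stream from code is a few lines. A minimal sketch, assuming standard SSE framing (event:/data: line pairs) and that completion.delta events carry a text field - both assumptions, since the exact payload shapes aren't shown above; error handling omitted:

import json
import httpx

url = "https://api.your-org.katonic.ai/public/v1/chat"
headers = {
    "Authorization": "Bearer kapi_live_...",
    "X-User-Email": "sarah@your-org",
}
payload = {
    "agent_id": "hr-assistant",
    "messages": [{"role": "user", "content": "What is the remote work policy?"}],
    "stream": True,
}

# Stream the response and dispatch on the typed SSE event names
with httpx.stream("POST", url, headers=headers, json=payload, timeout=None) as resp:
    event = None
    for line in resp.iter_lines():
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data = json.loads(line.split(":", 1)[1].strip())
            if event == "completion.delta":
                print(data.get("text", ""), end="", flush=True)  # assumed field name
            elif event == "agent.complete":
                print("\ntrace_id:", data.get("trace_id"))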
Three things every call gets, automatically
Every grounded claim points back to a document, page, and section. No hallucinations to defend in a customer escalation.
Every call gets a trace ID linking the request to the audit log, the policy decisions, the retrieval log, and the cost meter.
SSE by default for chat. Tool calls, retrieval steps, and completion deltas come through as separate events your UI can render progressively.
Existing app on the OpenAI SDK? Point base_url at /public/v1, swap the API key, and the same code now routes through Katonic. Multi-provider routing, BYOK, guardrails, audit, and tier-based model selection come along - no rewrite required.
- BEFORE · DIRECT TO OPENAI
from openai import OpenAI

client = OpenAI(api_key="sk-...")

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
WHAT YOU LOSE
· No multi-provider routing
· No guardrails on input/output
· No audit log · no permission check
· No team budget enforcement
+ AFTER · KATONIC PUBLIC API
from openai import OpenAI

client = OpenAI(
    api_key="kapi_live_...",
    base_url="https://api.your-org.katonic.ai/public/v1",
)

resp = client.chat.completions.create(
    model="balanced",  # tier, not model
    messages=[{"role": "user", "content": "Hello"}],
)
WHAT YOU GET
· Multi-provider routing (140+ providers)
· Guardrails on input + output (8 rails)
· Audit log · trace ID · permission check
· Team budgets · BYOK enforcement
You can pass a tier name like "balanced" or a specific model like "claude-sonnet-4". Tiers route to whichever model is currently the best fit per your config; specific model names give you direct control. Both work.
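Reusing the client from the snippet above, pinning a specific model is just a different model string:

resp = client.chat.completions.create(
    model="claude-sonnet-4",  # explicit model: bypasses tier routing
    messages=[{"role": "user", "content": "Hello"}],
)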
Agent definitions live in your git repo. Pull request changes trigger an eval run. Merge to main promotes to test. Tag a release to promote to prod. The same CI/CD discipline you use for your product, applied to AI - including the eval gates that block bad versions from shipping.
name: Agent CI
on:
  pull_request:
    paths: ['agents/**']
  push:
    branches: [main]
    tags: ['v*']
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      # Validate the agent config
      - run: katonic agents validate ./agents/hr.yml
      # Run all 7 evaluators against the test suite
      - run: |
          curl -X POST $KAT_URL/public/v1/evaluations/run \
            -H "Authorization: Bearer $KAT_KEY" \
            -d '{"agent": "hr-assistant", "suite": "hr-eval-v3"}'
      # Compare against the last known-good baseline
      - run: katonic eval compare $RUN_ID --baseline=prod
  promote:
    needs: evaluate
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - run: katonic agents promote hr-assistant --env=prod
        # Eval gate enforced server-side · blocks if scores fail
validate
Agent config valid
evaluations/run
127 cases · 7 evaluators · 98% pass
evaluations/compare
vs prod baseline · +0.04 ↑ improvement
promote
Eval gate ✓ · promoted to prod
Eval gates run on the platform, not in your CI. Even if a developer skips the eval step in their workflow, the promote endpoint still calls the gate. Bad versions cannot reach production by being clever with YAML.
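To see that guarantee from the API side, here is a hedged sketch of hitting the promote step directly. The endpoint path and response handling below are assumptions for illustration; the documented route is the katonic agents promote CLI shown above:

import httpx

# Hypothetical endpoint path, shown only to illustrate the server-side gate
resp = httpx.post(
    "https://api.your-org.katonic.ai/public/v1/agents/hr-assistant/promote",
    headers={"Authorization": "Bearer kapi_live_..."},
    json={"env": "prod"},
)
if resp.status_code == 200:
    print("eval gate passed · promoted")
else:
    # The gate lives server-side, so a failing eval blocks this call too
    print("blocked by eval gate:", resp.status_code)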
Manage keys at /studio/public-api. Browse the 39 endpoints with live curl examples. Copy a working snippet for Python, TypeScript, or curl. The console below is what your developers see on day one.
Developer Hub
39 endpoints · 9 groups · OpenAPI 3.1 · OpenAI-compatible at /public/v1
Active keys
5
1 rotating
Calls today
284,128
across all keys
Errors / 1k
0.4
0.04% error rate
P95 latency
412ms
across all endpoints
| Name | Key prefix | Scopes | Team | Rate (used) | Last used | Status |
|---|---|---|---|---|---|---|
| Customer chat widget · prod | kapi_live_8a3f...e92 | chat · search · completions | Customer Success | 120/min · 84% | 2m ago | ● ACTIVE |
| Mobile app backend | kapi_live_2f1c...c47 | chat · completions · models | Mobile team | 60/min · 23% | 12m ago | ● ACTIVE |
| GitHub Actions · CI | kapi_live_9d4b...b83 | evaluations · agents/* | Engineering | 10/min · 0% | 3h ago | ● ACTIVE |
| Internal Slack bot | kapi_live_6c2a...a18 | chat · search · tools | Engineering | 30/min · 45% | 1m ago | ● ACTIVE |
| Mobile app backend (rotating) | kapi_live_b71d...d05 | chat · completions · models | Mobile team | 60/min · 0% | never | ○ ROTATING |
What every key gives you, by construction
Each key only calls endpoints you grant. The CI key can promote agents but cannot chat. The chat widget can chat but cannot promote.
Calls per minute, per hour, per day. Auto-throttle when exceeded. The chat widget can't accidentally DoS the platform (see the client-side sketch after this list).
Issue a new key, deploy it, revoke the old one. The platform accepts both during the window. Audit logs stay continuous.
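That auto-throttle is worth handling gracefully in clients. A minimal backoff sketch, assuming the platform signals a tripped limit with HTTP 429 and a Retry-After header - conventional behaviour, but not confirmed above:

import time
import httpx

client = httpx.Client(
    base_url="https://api.your-org.katonic.ai/public/v1",
    headers={"Authorization": "Bearer kapi_live_..."},
)

def post_with_backoff(path: str, payload: dict) -> httpx.Response:
    # Retry after the window the server reports; 429 + Retry-After is assumed
    while True:
        resp = client.post(path, json=payload)
        if resp.status_code != 429:
            return resp
        time.sleep(float(resp.headers.get("Retry-After", "1")))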
How fast can I integrate?
Five minutes for a hello world. Bring your existing OpenAI client, point base_url at /public/v1, get back streaming responses with citations and audit trace IDs. No new SDK to learn. Typed clients available for Python, Node, Go, Java if you want them.
Can I render the streaming UX?
Yes. Every chat call streams Server-Sent Events with typed event names: agent.start, tool.call, completion.delta, agent.complete. Render thinking states, tool calls, and citations as they arrive. The reference UI in the platform Workroom uses this exact same stream.
What about rate limits and key management?
Per-key limits at minute, hour, day granularity. Scope each key to specific endpoints. Bind keys to teams for budget attribution. Rotate without downtime - the platform accepts both old and new keys during your migration window.
Same controls as the UI?
Yes. Every endpoint runs through the same RBAC, guardrails, audit, and policy layers as the UI. The only difference is what's calling - browser session or API key. Audit log includes the key ID so you know exactly which integration triggered each event.
Fast to start. Painful to govern.
Apps call OpenAI / Anthropic / Google directly. No platform features. Each integration reinvents permissions, audit, retries, rate limits. Compliance review finds you the moment you scale.
Months of platform engineering.
Build the OpenAI-compatible router. Build BYOK key management. Build streaming. Build audit. Build rate limits. Maintain it forever. Half a year before a single feature ships.
39 endpoints. Same governance. Day one.
Every UI capability has a URL. Drop-in OpenAI compatibility. SSE streaming with typed events. Per-key scopes, rate limits, and team budgets. Audit trace IDs link to the same logs as the UI.
Most enterprises end up with two AI stacks. Internal users get the curated version with audit and guardrails. Customer-facing apps get raw API calls because that ships in two weeks. The compliance review only sees one of them. The customer escalation comes from the other. The platform's job is to make sure that does not happen - one set of services, two surfaces, same controls.
Sandbox access in 24 hours. Comes pre-loaded with a sample agent, an API key with chat scope, and a cURL command in the docs that you paste into your terminal to make your first call.
Then point your existing OpenAI client at /public/v1 and you've migrated.
