Switching costs are the lock-in
Every agent that hard-codes a model SDK becomes one more thing to rewrite when you want to switch. The cost compounds with every agent you ship.
Ready to get started?
Deploy sovereign AI on your infrastructure - in weeks, not months.
One endpoint to call any of 2,600+ models across 140+ providers. Bring your own keys to OpenAI, Anthropic, Google, or run open-source models on your own hardware. Agents ask for a quality tier; the gateway picks the model. Switch providers without touching an agent. Cap budgets. Route by region. Fail over automatically.
No lock-in. No hidden margin. No re-coding when the best model changes.
2,600+ models · 140+ providers · Bring your own keys · 9 regions · in-region routing
Pick one model provider, hard-code their SDK into your agents, and you've shipped lock-in. Six months later a competitor leapfrogs them on quality, halves the price, or releases the open-weights version. Now you're rewriting agents to switch. Or staying on the loser because switching costs too much.
API down? Rate-limited? Quota exhausted? Every agent in your business stops. No failover. Production AI on a single point of failure.
A handful of expensive agents on the highest-tier model can blow the monthly budget. By the time finance sees the invoice, the spend has already happened.
Your agents don't reference specific model names. They ask for what they need - "balanced reasoning", "fast classification", "code generation". The gateway resolves that to a real provider and model based on six runtime checks. Swap the underlying model centrally; every agent gets the new behaviour.
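The tier-to-model resolution described above can be sketched as follows. This is a minimal illustration, not the gateway's actual API: the tier names, providers, and models are all hypothetical, and the real gateway applies its runtime checks server-side.

```python
# Illustrative sketch of tier-based resolution: agents name a quality
# tier, the gateway maps it to a concrete provider/model at call time.
# All tier names, providers, and models below are hypothetical.

TIER_MAP = {
    "balanced-reasoning": ("anthropic", "claude-opus"),
    "fast-classification": ("openai", "gpt-4o-mini"),
    "code-generation": ("self-hosted", "qwen-coder"),
}

def resolve(tier: str) -> tuple[str, str]:
    """Resolve a quality tier to a (provider, model) pair."""
    if tier not in TIER_MAP:
        raise KeyError(f"unknown tier: {tier}")
    return TIER_MAP[tier]

provider, model = resolve("balanced-reasoning")
```

The point of the indirection: the agent only ever utters the tier name on the left; everything on the right is central configuration.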
From request to model · annotated
Every agent ships referencing one of eight tiers, not a specific model. When the best balanced model becomes Claude Opus instead of GPT-4o, you change one mapping. Every agent in your business gets the upgrade without redeployment.
When Claude Sonnet 5 ships and you want to make it the new "balanced," you update one mapping in the gateway. Every agent in your business gets the new model on its next call. No redeployment. No agent code change.
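The one-mapping swap might look like this in miniature. A hypothetical sketch, not the gateway's real interface: the class, method names, and model identifiers are illustrative.

```python
# Illustrative sketch: swapping the model behind a tier centrally.
# Agents keep calling by tier; only the gateway's mapping changes.
# Class, methods, and model names are hypothetical.

class Gateway:
    def __init__(self, tiers: dict[str, str]):
        self.tiers = dict(tiers)

    def set_tier(self, tier: str, model: str) -> None:
        # One central change; every agent picks it up on its next call.
        self.tiers[tier] = model

    def complete(self, tier: str, prompt: str) -> str:
        model = self.tiers[tier]
        # A real gateway would dispatch to the provider here.
        return f"[{model}] {prompt}"

gw = Gateway({"balanced": "openai/gpt-4o"})
gw.set_tier("balanced", "anthropic/claude-sonnet-5")  # the one-line swap
```

Agents calling `gw.complete("balanced", ...)` are untouched by the swap; no redeploy, no code change.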
Self-hosted open-weights on your own GPUs gives the lowest unit cost at scale and keeps inference data on your infrastructure. API providers give the absolute frontier on demand without capex. The gateway lets you split workloads by tier, by team, by data sensitivity, and by cost target.
For predictable, high-volume, sensitive workloads.
Llama, Qwen, Mistral, DeepSeek and others, served with NVIDIA NIM containers. Pre-optimised for the GPU you have. Inference data never leaves your infrastructure. Unit cost amortises over the lifetime of the GPU.
For peak quality, spiky volume, low capex.
Bring your own contract with OpenAI, Anthropic, Google, Cohere, or any of 140+ providers. The gateway uses your keys; you keep your billing relationship; we never mark up. Mix and match per tier - your reasoning calls go to Claude Opus while balanced calls go to GPT-4o.
This is what your platform team sees at /gateway. Configure section: Providers, Models, Tiers. Observe section: Usage, Costs. Same chrome, same data, every org in your platform.
AI Gateway
Route every model call through one governed plane. 23 providers · 140+ models · 8 tiers
Configured: 8 of 23 available
With keys: 7 (1 awaiting)
Avg health: 99.5% (last 1h)
Avg latency: 346ms (p50)
14 models · 99.4% success · 412ms
8 models · 99.7% success · 624ms
12 models · 98.9% success · 388ms
22 models · 99.1% success · 502ms
6 models · 99.8% success · 89ms
4 models · 100.0% success · 156ms
18 models · 99.6% success · 198ms
5 models
/gateway renders this in your sandbox today, including provider cards with brand-aware logos, the 8 real tier types, the 2,600+ model registry, and the live cost dashboard.
How do I avoid betting on one provider?
Eight quality tiers, 140-plus providers. Agents reference tiers, not models. When a competitor leapfrogs your current pick, you change one mapping. Every agent in your business gets the upgrade without redeployment.
Can we control AI spend before the bill arrives?
Every model call is metered and attributed to the team that made it. Set monthly budgets per team: warning at 80%, hard cap at 100%, anomaly alert at 2x the rolling average. Live dashboard with charts by team, tier, and day.
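The budget thresholds above can be expressed in a few lines. A hedged sketch only: the function name, signature, and event labels are invented for illustration.

```python
# Illustrative per-team budget check with the thresholds described:
# warn at 80% of budget, hard-cap at 100%, flag anomalies when today's
# spend exceeds 2x the rolling daily average. Names are hypothetical.

def check_budget(spent: float, budget: float,
                 rolling_avg_daily: float, today_spend: float) -> list[str]:
    events = []
    if today_spend > 2 * rolling_avg_daily:
        events.append("anomaly")      # 2x rolling-average alert
    if spent >= budget:
        events.append("hard_cap")     # block further calls this month
    elif spent >= 0.8 * budget:
        events.append("warn_80")      # notify the team, keep serving
    return events
```

For example, a team at 85% of budget whose spend today is more than double its usual daily rate would trigger both the warning and the anomaly alert before the invoice ever lands.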
What about data residency and provider keys?
Bring your own keys to OpenAI, Anthropic, Google. Region-aware routing keeps EU users on EU endpoints, AE on AE, and so on. Self-host open-weights on your hardware where data can't leave at all.
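In-region routing reduces to a strict lookup: a request either gets an endpoint in its own region or is refused. A minimal sketch; the region codes and endpoint URLs are placeholders, not real gateway addresses.

```python
# Illustrative region-aware routing: a request stays on an endpoint in
# its own region, never silently crossing a border. URLs are placeholders.

ENDPOINTS = {
    "eu": "https://eu.gateway.example/v1",
    "ae": "https://ae.gateway.example/v1",
    "us": "https://us.gateway.example/v1",
}

def route(region: str) -> str:
    """Return the in-region endpoint; refuse rather than reroute abroad."""
    if region not in ENDPOINTS:
        raise ValueError(f"no in-region endpoint for {region}")
    return ENDPOINTS[region]
```

The deliberate choice is the `raise`: for residency, failing loudly beats quietly routing to another region.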
What about failover and reliability?
Configure a fallback chain per tier. If your primary provider fails or rate-limits, the gateway moves the call to the next provider in the chain automatically. Auto-recovery when the primary returns.
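The failover behaviour is essentially a loop over an ordered chain. A simplified sketch under stated assumptions: providers are modelled as plain callables that raise on failure or rate-limit, which is not how the real gateway represents them.

```python
# Sketch of a per-tier fallback chain: try providers in order, moving
# on when one fails or rate-limits. Providers are modelled as callables
# that raise on error; all names here are illustrative.

def call_with_failover(chain, prompt):
    last_err = None
    for provider in chain:
        try:
            return provider(prompt)
        except Exception as err:   # failed or rate-limited: try the next
            last_err = err
    raise RuntimeError("all providers in the chain failed") from last_err

def primary(prompt):
    raise TimeoutError("rate-limited")    # simulate a 429 from the primary

def fallback(prompt):
    return f"fallback answered: {prompt}"

result = call_with_failover([primary, fallback], "classify this")
```

When the primary recovers, it is simply first in the chain again on the next call, which is the auto-recovery behaviour described above.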
One bet. One winner. Maybe yours.
Your agents call OpenAI (or Anthropic, or Google) directly. Their SDK in your code. When the leader changes - and it does, every quarter - you rewrite. When they're rate-limited, you wait.
You're now an inference platform team.
Build the routing layer yourself. Manage provider keys, build budget enforcement, build the cost meter, build the failover, build the cache. Months of platform engineering for a non-differentiating capability.
Eight tiers, any provider, day one.
Eight quality tiers. 140-plus providers. BYOK to anyone. Budgets, region routing, failover, cache, anomaly detection, live dashboard. Configured centrally. Applied to every agent without code changes.
The model that's best today won't be the model that's best a year from today. The platform's job is to insulate the business from that change. Eight tiers, 140 providers, your keys, your budgets, your data residency. The business decides what it needs. The platform routes to whatever serves it best.
Sandbox access in 24 hours. Comes pre-configured with the eight tiers, dummy keys, the cost dashboard, and a sample agent that exercises every tier so you can see the numbers move.
Bring your own keys when you're ready. Your billing relationship stays with the provider.
