Supervised fine-tuning via NVIDIA NeMo and reinforcement learning via OpenPipe ART - in one workspace, on your GPUs.
Both deploy straight to the AI Gateway. Zero data egress.
Fine-tuning sounds simple on paper. In practice it means provisioning GPUs, picking a framework, wiring up a training loop, hosting the artifact, and deploying behind a router that handles tokens and rate limits. Then doing the whole thing again when you want to try a different approach. By the time the first model is live, the team has burned weeks before even evaluating whether the fine-tune actually helps.
GPU procurement is the gate
Get the GPUs first. Then schedule them. Then make sure prod doesn't get starved when training kicks off. Most teams stop right here.
SFT teaches knowledge; RL shapes behaviour. They use different tools, different formats, different deployment paths. Most platforms pick one and tax you for the other.
After the artifact lands, you still need a serving runtime, a routing layer, a billing dimension, and a way for agents to actually call the new model. Months of plumbing.
This is what your team sees at /studio/fine-tuning. Two tabs - Supervised (NeMo) and RL (OpenPipe) - with the same shape. Hyperparameter form on top, job list with live status below. No notebook, no DevOps ticket, no GPU procurement form.
Fine-tuning
Train custom models on your data. Two paths: Supervised (NeMo) for knowledge, RL (OpenPipe) for behaviour.
Dataset Path / URI
Base Model
Learning Rate
Epochs
Batch Size
| Job ID | Model | Status | Created | Actions |
|---|---|---|---|---|
| ft_nemo_a3f8c2 | llama-3.1-8b | ● deployed | Mar 14, 2024 | |
| ft_nemo_b9d4e1 | llama-3.1-8b | ● completed | Mar 18, 2024 | |
| ft_nemo_c2e7a9 | qwen-2.5-7b | ● training 67% | Mar 22, 2024 | |
| ft_nemo_d8f3b6 | llama-3.1-8b | ● queued | Mar 22, 2024 | |
| ft_nemo_e1a5c7 | mistral-7b-v3 | ● failed | Mar 20, 2024 | |
/studio/fine-tuning renders this in your sandbox today, including the two-tab structure, the live job table with status colors, and the deploy-to-Gateway button.

Most teams default to whichever method they read about first. The pick should follow the problem. SFT teaches a model what the answer looks like. RL teaches a model how to behave when there's no single right answer - just better and worse outcomes measured by reward.
SFT - best when you have labeled data.
You have input/output pairs. Customer support ticket → ideal response. Document → summary in your house style. Medical chart → SOAP note format.

RL - best when you have outcomes, not answers.
You have multi-turn agent trajectories with reward signals. Sales conversations that closed vs didn't. Tool-call sequences that resolved vs failed. Long horizons where the right next step depends on what came before.
The fine-tuning loop runs end to end inside the platform. Data preparation, training, deployment, and evaluation all land in the same workspace. No exporting weights to a notebook, no separate hosting layer, no cron job watching for completion.
JSONL for SFT (input/output pairs) or trajectory JSON for RL (turns + reward). Datasets versioned in the platform's data lake. Quality gates check for size, balance, leakage.
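As a concrete sketch, here's roughly what one record of each format could look like. The field names ("input", "output", "turns", "reward") are illustrative assumptions, not the platform's published schema.

```python
import json

# Hypothetical SFT record: one JSONL line per input/output pair.
sft_record = {
    "input": "Customer ticket: My invoice shows a duplicate charge for March.",
    "output": "Thanks for flagging this. I've confirmed the duplicate charge and issued a refund.",
}

# Hypothetical RL trajectory record: a list of turns plus a scalar reward.
rl_record = {
    "turns": [
        {"role": "user", "content": "I need to change my flight."},
        {"role": "assistant", "content": "I can help - what's your booking ID?"},
    ],
    "reward": 1.0,  # e.g. 1.0 if the conversation resolved, 0.0 if it didn't
}

# JSONL is just one JSON object per line.
with open("sft_dataset.jsonl", "w") as f:
    f.write(json.dumps(sft_record) + "\n")
```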
Choose from the registered open-weight models in your AI Gateway. Default hyperparams cover 80% of cases - learning rate, epochs, batch size. Override for advanced tuning.
Job lands on the KAI scheduler with your team's GPU quota. Spot priority for non-critical, guaranteed for production fine-tunes. No external GPU procurement needed.
Live status updates: queued → running → training → completed. Loss curves and validation metrics surface in real time. Failed jobs include log access and root-cause hints.
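For teams that would rather drive this from code than the UI, here is a minimal sketch of submitting and watching a job, assuming a REST API exists. The endpoint paths, payload fields, and response shape are all assumptions for illustration, not a documented API.

```python
import time
import requests

BASE = "https://platform.internal"  # hypothetical in-VPC endpoint

# Submit a supervised job; hyperparameter names mirror the form fields above.
job = requests.post(f"{BASE}/api/fine-tuning/jobs", json={
    "method": "sft",                    # or "rl"
    "base_model": "llama-3.1-8b",
    "dataset": "sft_dataset.jsonl@v3",  # pinned dataset version
    "learning_rate": 2e-5,
    "epochs": 3,
    "batch_size": 8,
}).json()

# Poll the lifecycle: queued -> running -> training -> completed.
while True:
    status = requests.get(f"{BASE}/api/fine-tuning/jobs/{job['id']}").json()
    print(status["status"], status.get("progress"))
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(30)
```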
One click registers the fine-tuned model with the AI Gateway. It becomes selectable from the model dropdown in any agent config. Same auth, same routing, same audit.
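If the Gateway exposes an OpenAI-compatible chat interface (an assumption for illustration, along with the URL and token handling), calling the fine-tune from an agent is a one-line model swap:

```python
import os
import requests

# Hypothetical: the fine-tuned model is addressable by the job ID from the
# table above, behind the same auth and routing as every other model.
resp = requests.post(
    "https://gateway.internal/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TEAM_TOKEN']}"},
    json={
        "model": "ft_nemo_a3f8c2",  # the deployed fine-tune
        "messages": [{"role": "user", "content": "Summarize ticket #4821."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```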
Run the model against your eval datasets before promoting. Eval gates can block promotion if scores drop. Rollback re-points the active version - one click.
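The gate logic itself is simple. A sketch of the idea, with hypothetical metric names - the platform's actual gate criteria may differ:

```python
def gate_promotion(candidate: dict, baseline: dict, tolerance: float = 0.0) -> bool:
    """Block promotion if any metric regresses past the tolerance.

    candidate/baseline map metric name -> score, e.g. {"accuracy": 0.91}.
    """
    return all(
        candidate.get(metric, 0.0) >= score - tolerance
        for metric, score in baseline.items()
    )

# Promote only if the fine-tune holds or beats the current model everywhere.
if gate_promotion({"accuracy": 0.91, "format_adherence": 0.97},
                  {"accuracy": 0.89, "format_adherence": 0.98}):
    print("promote")
else:
    print("blocked - the active version stays where it is")
```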
Can I bring my own framework?
Yes. The fine-tuning page exposes NeMo and OpenPipe ART as default paths because most teams want one of those two. If you have a custom training script, you can run it as a job in a Code Builder workspace with your team's GPU quota. The deploy-to-Gateway step works the same way.
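The contract for a custom script is the same one the built-in paths follow: read a versioned dataset, train, write weights to an output directory. A hypothetical skeleton - the env var names and paths are assumptions, not documented conventions:

```python
import json
import os

# Hypothetical conventions: the job runner mounts the pinned dataset and an
# output directory for the artifact.
dataset_path = os.environ.get("DATASET_PATH", "/data/sft_dataset.jsonl")
output_dir = os.environ.get("OUTPUT_DIR", "/output")

with open(dataset_path) as f:
    examples = [json.loads(line) for line in f]

# ... your framework of choice trains here ...
print(f"training on {len(examples)} examples")

# Write final weights/config under output_dir so the deploy-to-Gateway step
# can register the artifact like any NeMo or OpenPipe output.
os.makedirs(output_dir, exist_ok=True)
```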
How do I version datasets?
Datasets are first-class objects with versions. Upload a JSONL or trajectory file, the platform versions it, and every fine-tuning job records the dataset hash it trained on. Reproducible runs by default. The dataset object also tracks who can read/write it via the same RBAC as agents and knowledge.
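The fingerprint a job records is just a content hash of the file. A minimal sketch - sha256 here is an assumption about the hash the platform uses:

```python
import hashlib

def dataset_hash(path: str) -> str:
    """Content hash of a dataset file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(dataset_hash("sft_dataset.jsonl"))
```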
What about GPU isolation?
Jobs run under the KAI scheduler with team-level quotas and per-job priority. Production agents have guaranteed GPU; fine-tuning lands on spot priority by default but can be promoted. Failed or cancelled jobs free their GPUs immediately. No leaked tenants.
Where does the data go?
Nowhere. The dataset, the training job, the resulting weights, and the deployment all stay inside your infrastructure. NeMo and OpenPipe run in your VPC. The Gateway routes to local model servers. Zero data egress is architectural, not a setting.
Free-form. Slow. Hard to share.
Spin up a Colab or SageMaker notebook, write the training loop, host it manually, wire up a router. Each fine-tune is a one-off project. Re-running it next quarter means remembering the exact recipe.
Easy to start. Locked in.
Send your training data to the model vendor's hosted fine-tuning API. They train, they host, they price. Your data leaves your infrastructure. The fine-tune is opaque, non-portable, and tied to one model family.
Two paths. One workspace. Your GPUs.
Supervised via NeMo or RL via OpenPipe ART. Both run on your infrastructure. Both deploy to the AI Gateway with one click. Datasets versioned, jobs tracked, weights yours to export.
Most platforms force a choice: ship your data to a vendor and lose control, or build the whole fine-tuning stack yourself and lose six months. We chose a third option: integrate the two best open-source fine-tuning frameworks - NeMo for supervised, OpenPipe ART for RL - and run them on the GPUs you already have. The data never moves. The weights are yours. The deployment is one click. Fine-tuning should be a feature of the platform, not a separate project.
Ready to get started?
Deploy sovereign AI on your infrastructure - in weeks, not months.
Sandbox access in 24 hours. Comes with a sample dataset (1,000 customer support tickets), GPU quota for one fine-tune, and a pre-configured base model. From dataset upload to deployed model in an afternoon.
Then bring your own data and run for real.
