OpenAI-compatible specialized models

Your agent does not need a frontier model for every step.

Q: Do I need to label my own training data?

No. You describe the task in plain English and Forjal generates the training examples automatically. Your job is to review a small batch: approve what looks right, correct what does not, and add edge cases when needed. Those corrections shape the dataset before training starts.

Q: Do I need to change my existing code to integrate?

Just one line. Forjal exposes an OpenAI-compatible endpoint, so if you're already using the OpenAI SDK you only need to change the base URL. Everything else — the request format, response structure, streaming — stays exactly the same.

Q: Will my data be used to train other customers' models?

No. Your training examples, your inputs, and your model are isolated to your account. Nothing you send to Forjal is used to improve models for other customers.

Q: Can I test the model before sending it to production?

Yes. Forjal compares the trained model against the base model, your approved examples, held-out cases, and your current frontier-model prompt when possible. You can test real inputs before pointing production traffic at it.

Forjal takes a bounded task your agent already runs, generates examples, trains and evaluates a smaller model, then ships it behind an OpenAI-compatible API.

The production gap

AIagentsarereachingproductionwithprototypearchitecture.

In prototypes, it is fine to call a frontier model for every step. In production, that architecture gets expensive, slow, and harder to control.

NVIDIA & Georgia Tech

Research on agentic AI, 2025

“40-70% of LLM calls in agent pipelines can already be replaced by SLMs today”

Read the paper

Onceyoufindtherepeatedsteps,theeconomicschange.

The research points to where teams should look. Forjal turns those candidate calls into trained, evaluated models your product can use through the same API pattern.

30–60x

lower cost on repeated, bounded steps

0 ML ops

to manage training, evaluation, or serving

1 API URL

to swap into your current OpenAI integration

How it works

From task description to specialized model API.

Forjal is a knowledge conversion platform. You describe what a model should do, review generated behavior, and ship a specialized model — without a training stack, without ML ops, without changing your code.

Describe the task

Tell Forjal what your model should do — in plain language.

Describe the task your agent already runs: the input it receives, the output it produces, the rules it follows, and the edge cases it should refuse to guess on. Your team's operational knowledge becomes the spec.

You describe intent, not implementation.

Production call

Analyze technical support tickets

Generate examples

Forjal creates synthetic training data from your spec.

Instead of building a dataset from scratch, Forjal generates training examples automatically. Your team reviews model behavior instead of labeling data — starting from real patterns, not blank spreadsheets.

Review behavior, not build data.

Auth error

Enterprise outage

Billing dispute

Review and correct

Your corrections shape the dataset before training.

Approve, correct, or reject generated examples. Each correction becomes the strongest signal in the training set. A few precise edits from someone who knows the task outperform thousands of generic examples.

A few corrections beat thousands of generic examples.

Input

“Enterprise customer cannot authenticate after SSO migration.”

Evaluate against baselines

See quality, cost, and latency before any code change.

Compare the trained model against the base model, held-out test cases, and your current frontier-model prompt. See every metric side by side before you change a single line of production code.

Evidence before replacement, not faith.

Quality check

baselinefrontier prompt

holdoutpassed

costlower

Deploy via API

Change one URL. Everything else stays the same.

Your specialized model runs behind an OpenAI-compatible endpoint. Change the base URL in your existing SDK call. The request format, response structure, streaming — all stay exactly the same.

One URL change. Zero code migration.

Live model

modelticket-triage-v1

latencylow

Forjal model training progress on a laptop

Review behavior. Ship the model.

Forjal turns approved examples into a trained, evaluated endpoint your agent can call in production, without adding a training stack to your team.

Where Forjal fits

Choosethestepsthatarerepeated,bounded,andmeasurable.

Forjal is strongest when the task has clear inputs, business rules, quality criteria, and a defined output your application can consume.

Evaluation before deploy

Do not replace a frontier call on faith.

Forjal gives you a comparison report before deployment, so you can see whether the specialized model is accurate, cheaper, and predictable enough for that step.

Evaluation report

ticket-triage-v1

Base model

What the selected open model can do before specialization.

Trained model

Performance after your reviewed examples shape the dataset.

Current prompt

A comparison against the prompt you run on a frontier model today.

Production fit

Latency, cost, edge cases, and failure modes before rollout.

FAQ

Questions before switching a production call.

The practical details behind replacing a large-model prompt with a model trained for one job.

How is this different from just writing a better system prompt?

A system prompt tells a general model how to behave. It works, but the model was never trained for your task, so you need to be precise, verbose, and defensive. A Forjal model is trained on examples of the exact step you want to run, which means shorter calls, lower cost, and more consistent behavior at the edges.

Do I need to label my own training data?

Do I need to change my existing code to integrate?

Will my data be used to train other customers' models?

Can I test the model before sending it to production?

Early access

Create a specialized model for one repeated agent step.

Describe the task, review generated examples, compare the result, and deploy through an API your product already understands.