OpenAI-compatible specialized models

Your agent does not need a frontier model for every step.

Forjal takes a bounded task your agent already runs, generates examples, trains and evaluates a smaller model, then ships it behind an OpenAI-compatible API.

The production gap

AIagentsarereachingproductionwithprototypearchitecture.

In prototypes, it is fine to call a frontier model for every step. In production, that architecture gets expensive, slow, and harder to control.

Agents create repeated model calls.

A single workflow can call models for routing, validation, formatting, scoring, summarizing, tool selection, and next-step decisions.

Most steps are bounded.

They are not open-ended reasoning problems. They have known inputs, rules, expected outputs, and measurable quality.

Frontier models should plan, not execute everything.

Use large models for ambiguity and strategy. Use smaller specialized models for repeated steps where cost, latency, and control matter.

NVIDIA

NVIDIA & Georgia Tech

Research on agentic AI, 2025

“40-70% of LLM calls in agent pipelines can already be replaced by SLMs today”

Read the paper

Onceyoufindtherepeatedsteps,theeconomicschange.

The research points to where teams should look. Forjal turns those candidate calls into trained, evaluated models your product can use through the same API pattern.

30–60x

lower cost on repeated, bounded steps

0 ML ops

to manage training, evaluation, or serving

1 API URL

to swap into your current OpenAI integration

How it works

From task description to specialized model API.

Forjal is a knowledge conversion platform. You describe what a model should do, review generated behavior, and ship a specialized model — without a training stack, without ML ops, without changing your code.

Describe the task

Tell Forjal what your model should do — in plain language.

Describe the task your agent already runs: the input it receives, the output it produces, the rules it follows, and the edge cases it should refuse to guess on. Your team's operational knowledge becomes the spec.

You describe intent, not implementation.
Production call
Analyze technical support tickets

Generate examples

Forjal creates synthetic training data from your spec.

Instead of building a dataset from scratch, Forjal generates training examples automatically. Your team reviews model behavior instead of labeling data — starting from real patterns, not blank spreadsheets.

Review behavior, not build data.
Auth error
Enterprise outage
Billing dispute

Review and correct

Your corrections shape the dataset before training.

Approve, correct, or reject generated examples. Each correction becomes the strongest signal in the training set. A few precise edits from someone who knows the task outperform thousands of generic examples.

A few corrections beat thousands of generic examples.
Input

“Enterprise customer cannot authenticate after SSO migration.”

Evaluate against baselines

See quality, cost, and latency before any code change.

Compare the trained model against the base model, held-out test cases, and your current frontier-model prompt. See every metric side by side before you change a single line of production code.

Evidence before replacement, not faith.
Quality check
baselinefrontier prompt
holdoutpassed
costlower

Deploy via API

Change one URL. Everything else stays the same.

Your specialized model runs behind an OpenAI-compatible endpoint. Change the base URL in your existing SDK call. The request format, response structure, streaming — all stay exactly the same.

One URL change. Zero code migration.
Live model
modelticket-triage-v1
latencylow
Forjal model training progress on a laptop

Review behavior. Ship the model.

Forjal turns approved examples into a trained, evaluated endpoint your agent can call in production, without adding a training stack to your team.

Where Forjal fits

Choosethestepsthatarerepeated,bounded,andmeasurable.

Forjal is strongest when the task has clear inputs, business rules, quality criteria, and a defined output your application can consume.

Technical ticket analysis

Classify urgency, infer likely cause, and suggest the next support action from product-specific context.

Log interpretation

Turn recurring error traces into probable causes, affected systems, and structured diagnostic output.

Agent tool routing

Choose the right internal tool or API call when the decision follows known rules and formats.

Internal rule validation

Check whether responses, decisions, or actions follow company policy before they reach users.

Product documentation answers

Answer narrow product questions with behavior tuned around your SDK, API, or internal docs.

Support conversation analysis

Extract intent, risk, sentiment, and next-step recommendations from customer conversations.

Evaluation before deploy

Do not replace a frontier call on faith.

Forjal gives you a comparison report before deployment, so you can see whether the specialized model is accurate, cheaper, and predictable enough for that step.

Evaluation report

ticket-triage-v1

01

Base model

What the selected open model can do before specialization.

02

Trained model

Performance after your reviewed examples shape the dataset.

03

Current prompt

A comparison against the prompt you run on a frontier model today.

04

Production fit

Latency, cost, edge cases, and failure modes before rollout.

FAQ

Questions before switching a production call.

The practical details behind replacing a large-model prompt with a model trained for one job.

A system prompt tells a general model how to behave. It works, but the model was never trained for your task, so you need to be precise, verbose, and defensive. A Forjal model is trained on examples of the exact step you want to run, which means shorter calls, lower cost, and more consistent behavior at the edges.

Early access

Create a specialized model for one repeated agent step.

Describe the task, review generated examples, compare the result, and deploy through an API your product already understands.

How many LLM calls do you make per month?