Provider Integration | Add Your AI Models to OpenRouter | OpenRouter

For Providers

If you’d like to be a model provider and sell inference on OpenRouter, fill out our form to get started.

To be eligible to provide inference on OpenRouter you must have the following:

1. List Models Endpoint

You must implement an endpoint that returns all models that should be served by OpenRouter. At this endpoint, please return a list of all available models on your platform. Below is an example of the response format:

1 {
2   "data": [
3     {
4       // Required
5       "id": "anthropic/claude-sonnet-4",
6       "hugging_face_id": "", // required if the model is on Hugging Face
7       "name": "Anthropic: Claude Sonnet 4",
8       "created": 1690502400,
9       "input_modalities": ["text", "image", "file"],
10       "output_modalities": ["text"],
11       "quantization": "fp8",
12       "context_length": 1000000,
13       "max_output_length": 128000,
14       "pricing": {
15         "prompt": "0.000008", // pricing per 1 token
16         "completion": "0.000024", // pricing per 1 token
17         "image": "0", // pricing per 1 image
18         "request": "0", // pricing per 1 request
19         "input_cache_read": "0" // pricing per 1 token
20       },
21       "supported_sampling_parameters": ["temperature", "stop"],
22       "supported_features": [
23         "tools",
24         "json_mode",
25         "structured_outputs",
26         "web_search",
27         "reasoning"
28       ],
29       // Optional
30       "description": "Anthropic's flagship model...",
31       "deprecation_date": "2025-06-01T15:00:00Z", // ISO 8601 date or UTC hour
32       "is_ready": true, // false to keep the model staged-but-hidden on OpenRouter
33       "is_free": false, // true to mark as a free endpoint
34       "discount_to_user": 0, // fractional discount on user-facing pricing (0 = none)
35       "capacity_tpm": 1000000, // input tokens per minute capacity for this model (optional)
36       "openrouter": {
37         "slug": "anthropic/claude-sonnet-4"
38       },
39       "datacenters": [
40         {
41           "country_code": "US" // `Iso3166Alpha2Code`
42         }
43       ]
44     }
45   ]
46 }

The id field should be the exact model identifier that OpenRouter will use when calling your API.

The pricing fields are in string format to avoid floating point precision issues, and must be in USD.

Valid input modalities are: text, image, file, audio, video.

Valid output modalities are: text, image, embeddings, audio, video, rerank, speech, transcription.

Valid quantization values are: int4, int8, fp4, fp6, fp8, fp16, bf16, fp32.

Valid sampling parameters are: temperature, top_p, top_k, min_p, top_a, frequency_penalty, presence_penalty, repetition_penalty, stop, seed, max_tokens, logit_bias.

Valid features are: tools, json_mode, structured_outputs, logprobs, web_search, reasoning.

Tiered Pricing

For models with different pricing based on context length (e.g., long context pricing), you can provide pricing as an array of tiers instead of a single object:

1 {
2   "pricing": [
3     {
4       "prompt": "0.000002", // base tier pricing per 1 token
5       "completion": "0.000012", // base tier pricing per 1 token
6       "image": "0.01", // pricing per 1 image (base tier only)
7       "request": "0", // pricing per 1 request (base tier only)
8       "input_cache_read": "0.000001" // base tier pricing per 1 token
9     },
10     {
11       "prompt": "0.000004", // long context tier pricing per 1 token
12       "completion": "0.000018", // long context tier pricing per 1 token
13       "input_cache_read": "0.000002", // long context tier pricing per 1 token
14       "min_context": 200000 // minimum input tokens for this tier to apply
15     }
16   ]
17 }

When using tiered pricing, the first tier (index 0) is the base pricing that applies when input tokens are below the min_context threshold. The second tier applies when input tokens meet or exceed the min_context value.

Limitations:

Currently, OpenRouter supports up to 2 pricing tiers.
The image and request fields are only supported in the base tier (index 0) and will be ignored if included in other tiers.

Discounts with `discount_to_user`

To offer a discount on the prices users see and pay, include the optional discount_to_user field. It’s a decimal fraction that OpenRouter applies to your displayed pricing:

user price = base price × (1 - discount_to_user)

1 {
2   "id": "your-org/your-model",
3   "pricing": {
4     "prompt": "0.000008",
5     "completion": "0.000024"
6   },
7   "discount_to_user": 0.2
8 }

Behavior:

0.2 means users see and pay 20% less than your listed pricing. The example completion price of 0.000024 displays as 0.0000192.
The discount applies to every priced SKU (prompt, completion, image, cache reads, and so on) and to both flat and tiered pricing.
0, an omitted field, or an absent field all mean no discount.
A negative value applies a markup instead of a discount, so -0.1 shows prices 10% higher.
A value of 1 or higher would make the model free (or negative-priced), which isn’t a valid discount. We treat it as a misconfiguration and fall back to no discount (0), so use a value below 1.

Send discount_to_user as a number, not a string. Unlike the pricing fields, it isn’t quoted.

Deprecation Date

If a model is scheduled for deprecation, include the deprecation_date field in ISO 8601 format. OpenRouter accepts either a date-only value or a specific UTC hour:

1 {
2   "id": "anthropic/claude-2.1",
3   "deprecation_date": "2025-06-01"
4 }

Use YYYY-MM-DD for date-only deprecations. Date-only values default to 13:00 UTC on that date.
Use YYYY-MM-DDTHH:00:00Z to request a specific UTC hour, for example 2025-06-01T15:00:00Z.

When OpenRouter’s provider monitor detects a deprecation date or time, it will automatically update the endpoint to display deprecation warnings to users. Models past their deprecation time may be automatically hidden from the marketplace.

Controlling Launch with `is_ready`

By default, when OpenRouter’s provider monitor sees a new model in your /v1/models response, it auto-stages the endpoint, runs baseline tests, and unhides it (makes it live) once the tests pass and pricing is configured. If you need to upload a model ahead of an announcement — or temporarily take a model offline — set the optional boolean is_ready field:

1 {
2   "id": "your-org/upcoming-model",
3   "is_ready": false
4 }

Behavior:

is_ready: false skips baseline tests for newly-staged endpoints, keeping them hidden, and auto-hides any matching endpoint that is currently live. Use this to upload a model in advance of launch, or to take a live model offline coordinated with us.
is_ready: true and an omitted/absent field both preserve the default auto-stage and auto-unhide behavior.

Free Model Variants with `is_free`

If you want to offer a free version of a model, set is_free: true:

1 {
2   "id": "your-org/your-model",
3   "is_free": true
4 }

Behavior:

is_free: true marks the endpoint as a free endpoint (:free suffix).
Any upstream pricing sent alongside is_free: true is ignored — free endpoints always have zero cost.
is_free: false or an omitted field preserves the default behavior (standard paid variant).

Capacity with `capacity_tpm`

Report your per-model throughput capacity so OpenRouter can make better routing and capacity-planning decisions. The value is in input tokens per minute:

1 {
2   "id": "your-org/your-model",
3   "capacity_tpm": 5000000
4 }

The value is an integer representing the input tokens per minute your infrastructure can process for this model (i.e. prompt/input throughput, not output generation).
Omitting the field or setting it to null leaves the capacity unknown (the default).
OpenRouter’s provider monitor auto-applies capacity changes when they appear in your /v1/models response.

2. Auto Top Up or Invoicing

For OpenRouter to use the provider we must be able to pay for inference automatically. This can be done via auto top up or invoicing.

3. Uptime Monitoring & Traffic Routing

OpenRouter automatically monitors provider reliability and adjusts traffic routing based on uptime metrics. Your endpoint’s uptime is calculated as: successful requests ÷ total requests (excluding user errors).

Errors that affect your uptime:

Authentication issues (401)
Payment failures (402)
Model not found (404)
All server errors (500+)
Mid-stream errors
Successful requests with error finish reasons

Errors that DON’T affect uptime:

Bad requests (400) - user input errors
Oversized payloads (413) - user input errors
Rate limiting (429) - tracked separately
Geographic restrictions (403) - tracked separately

Traffic routing thresholds:

Minimum data: 100+ requests required before uptime calculation begins
Normal routing: 95%+ uptime
Degraded status: 80-94% uptime → receives lower priority
Down status: <80% uptime → only used as fallback

This system ensures traffic automatically flows to the most reliable providers while giving temporary issues time to resolve.

4. Performance Metrics

OpenRouter publicly tracks TTFT (time to first token) and throughput (tokens/second) for all providers on each model page.

Throughput is calculated as: output tokens ÷ generation time, where generation time includes fetch latency (time from request to first server response), TTFT, and streaming time. This means any queueing on your end will show up in your throughput metrics.

To keep your metrics competitive:

Return early 429s if under load, rather than queueing requests
Stream tokens as soon as they’re available
If processing takes time (e.g. reasoning models), send SSE comments as keep-alives so we know you’re still working on the request. Otherwise we may cancel with a fetch timeout and fallback to another provider

5. Auto Exacto: Tool-Calling Traffic Routing

Auto Exacto is a routing step that automatically reorders providers for all requests that include tools. It runs by default on every tool-calling request and may change how much tool-calling traffic your endpoints receive.

How traffic is affected

Auto Exacto shifts tool-calling traffic toward providers that perform well on tool-use quality signals. Providers with strong metrics are moved to the front of the routing order and will receive more tool-calling requests, while providers with weaker signals are deprioritized and will see less.

Non-tool-calling traffic is not affected by Auto Exacto — it continues to follow the standard price-weighted routing.

How ranking factors are determined

Auto Exacto uses three classes of signals, all derived from real traffic and evaluations on your endpoints:

Throughput — real-time tokens-per-second measured from actual requests routed through your endpoint (visible on the Performance tab of any model page).
Tool-calling success rate — how reliably your endpoint completes tool calls without errors (also visible on the Performance tab).
Benchmark data — results from internal evaluations we run against provider endpoints. We are actively collecting this data and will make it available in your provider dashboard soon so you can review and run the same benchmarks on your end.

These are the same metrics available in your provider dashboard. Once onboarded, our team can give you access to it.

How deprioritization thresholds work

For each model, we compare every provider’s signal values against the group of providers serving that model. We use a median + MAD (median absolute deviation) approach rather than simple averages, which keeps thresholds stable even when one provider is a significant outlier.

Each signal has a different sensitivity:

Benchmark accuracy — providers falling more than 1 standard deviation below the median are deprioritized. This is the tightest threshold because benchmark scores cluster closely and small differences are meaningful.
Throughput — providers falling more than 1.5 standard deviations below the median are deprioritized. The wider margin accounts for natural throughput variance caused by time-of-day load patterns.
Tool-calling success rate — providers falling more than 2 standard deviations below the median are deprioritized. Success rates cluster near 100%, so this wider margin avoids penalizing normal noise while catching genuinely broken endpoints.

A minimum of 4 providers serving the same model is required before statistical thresholds are computed. Below that count, no deprioritization is applied for that signal.

Endpoints are placed into one of three tiers:

Good — sufficient data and no signals below threshold. These receive top routing priority.
Insufficient data — not enough recent traffic to evaluate. These sort behind known-good providers but ahead of deprioritized ones. An endpoint needs at least 100 general requests (30-minute window) and 200 tool-call requests (2-hour window) before it can be evaluated.
Deprioritized — one or more signals fell below threshold. These are routed to last.

Consistent rate limiting (429s) can reduce the volume of successful requests available for evaluation, making it harder for us to collect enough benchmark data to place your endpoint in the top tier. Returning early 429s is still preferred over queueing, but minimizing rate limits where possible helps ensure your endpoint has sufficient data for a fair evaluation.

How to improve your ranking

To maximize the tool-calling traffic routed to your endpoints:

Maintain high tool-call reliability — ensure your endpoint returns well-formed tool call responses consistently.
Optimize throughput — minimize queueing and stream tokens as soon as they are available (see Performance Metrics above).
Return early 429s under load — rather than queueing and degrading throughput, return rate limit errors so we can retry with another provider and your metrics stay healthy.

For the full user-facing documentation on Auto Exacto, see Auto Exacto.

For Providers

1. List Models Endpoint

Tiered Pricing

Discounts with discount_to_user

Deprecation Date

Controlling Launch with is_ready

Free Model Variants with is_free

Capacity with capacity_tpm

2. Auto Top Up or Invoicing

3. Uptime Monitoring & Traffic Routing

4. Performance Metrics

5. Auto Exacto: Tool-Calling Traffic Routing

How traffic is affected

How ranking factors are determined

How deprioritization thresholds work

How to improve your ranking

Discounts with `discount_to_user`

Controlling Launch with `is_ready`

Free Model Variants with `is_free`

Capacity with `capacity_tpm`