OneLLM.dev API Documentation

Welcome to the official API documentation for OneLLM.dev. This document provides a comprehensive guide to interacting with our API. For a formal definition of the API, please refer to the OpenAPI Specification.

Authentication

The OneLLM.dev API uses API keys for authentication. You can obtain your API key from the OneLLM dashboard.

All API requests must include an Authorization header with a Bearer token containing your API key.

Authorization: Bearer YOUR_API_KEY
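As a minimal sketch, the header can be assembled in Python like this (reading the key from an `ONELLM_API_KEY` environment variable is our own convention here, not a OneLLM requirement):

```python
import os

# Read the API key from an environment variable rather than hard-coding it.
# Replace the fallback placeholder with your key from the OneLLM dashboard.
api_key = os.environ.get("ONELLM_API_KEY", "YOUR_API_KEY")

# Every request to the API must carry this Authorization header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```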

API Endpoint

Note that streaming is not supported.

POST https://onellm.dev/api

This is the primary endpoint for interacting with the language models. It allows you to send a chat conversation and receive a response from the specified model.

Request Body

The request body must be a JSON object containing the details of your request.

  • model (string, required): The ID of the model to use for the completion. See the Supported Models section for a list of available models.
  • messages (array, required): An array of message objects representing the conversation history.
  • temperature (number, optional): Controls randomness. A lower value makes the model more deterministic. Range: 0.0 to 2.0.
  • max_tokens (integer, optional): The maximum number of tokens to generate in the response. If the value is too large, it will be automatically adjusted based on your account balance.
  • stream (boolean, optional): If set to true, the response will be streamed as server-sent events. Defaults to false.
  • top_p (number, optional): The nucleus sampling probability. The model considers the tokens comprising the top_p probability mass.
  • stop_sequences (array, optional): A list of strings that will cause the model to stop generating tokens.

For a complete list of all possible request parameters, please refer to the OpenAPI Specification.

Example Request:

{
  "model": "GPT-4.1",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke about computers."
    }
  ],
  "max_tokens": 50
}
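The request above can be sent with Python's standard library alone. This is a sketch, not an official client; the endpoint and header requirements are as documented above, and `YOUR_API_KEY` is a placeholder for your own key:

```python
import json
import urllib.request

API_URL = "https://onellm.dev/api"
API_KEY = "YOUR_API_KEY"  # placeholder: your key from the OneLLM dashboard

# Build the request body described in the parameter list above.
payload = {
    "model": "GPT-4.1",
    "messages": [
        {"role": "user", "content": "Tell me a joke about computers."}
    ],
    "max_tokens": 50,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["content"])
```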

Response Body

The response will be a JSON object containing the model's output.

  • provider (string): The name of the underlying model provider (e.g., `openai`).
  • model (string): The model that was used for the completion.
  • role (string): The role of the message author, typically `assistant`.
  • content (string): The content of the message generated by the model.
  • usage (object): An object containing token usage information for the request.
  • finish_reason (string): The reason the model stopped generating tokens (e.g., `stop`).

Example Response:

{
  "provider": "openai",
  "model": "GPT-4.1",
  "role": "assistant",
  "content": "Why did the computer show up at work late? It had a hard drive!",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 12,
    "total_tokens": 27
  },
  "finish_reason": "stop"
}
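Since the generated text and token counts sit at the top level of the response, extracting them is straightforward. A small sketch, using the example response above as a literal string:

```python
import json

# A response shaped like the example above, as a raw JSON string.
raw = """{
  "provider": "openai",
  "model": "GPT-4.1",
  "role": "assistant",
  "content": "Why did the computer show up at work late? It had a hard drive!",
  "usage": {"input_tokens": 15, "output_tokens": 12, "total_tokens": 27},
  "finish_reason": "stop"
}"""

response = json.loads(raw)

# The generated text lives directly in the top-level "content" field.
print(response["content"])

# Token accounting, useful for tracking spend against your balance.
usage = response["usage"]
print(f"{usage['input_tokens']} in + {usage['output_tokens']} out "
      f"= {usage['total_tokens']} total tokens")
```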

Supported Models

The following models are supported and can be used in the model parameter of your API requests:

  • GPT-5
  • GPT-5-Mini
  • GPT-5-Nano
  • GPT-5-Chat-Latest
  • GPT-4.1
  • GPT-4.1-Mini
  • GPT-4.1-Nano
  • GPT-o3
  • GPT-o3-pro
  • GPT-o3-DeepResearch
  • GPT-o3-Mini
  • GPT-o4-mini
  • GPT-4o
  • GPT-4o-mini
  • GPT-o1
  • GPT-o1-Mini
  • Opus-4
  • Sonnet-4
  • Haiku-3.5
  • Opus-3
  • Sonnet-3.7
  • Haiku-3
  • DeepSeek-Reasoner
  • DeepSeek-Chat
  • 2.5-Flash-preview
  • 2.5-Pro-preview
  • 2.0-Flash
  • 2.0-Flash-lite
  • 1.5-Flash
  • 1.5-Flash-8B
  • 1.5-Pro
  • Mistral-Medium-3
  • Magistral-Medium
  • Codestral
  • Devstral-Medium
  • Mistral-Saba
  • Mistral-Large
  • Pixtral-Large
  • Ministral-8B-24.10
  • Ministral-3B-24.10
  • Mistral-Small-3.2
  • Magistral-Small
  • Devstral-Small
  • Pixtral-12B
  • Mistral-NeMo
  • Mistral-7B
  • Mixtral-8x7B
  • Mixtral-8x22B

Important Notes

  • If the max_tokens value is too large, the server will automatically reduce it to the highest amount that your balance allows.
  • A minimum balance of USD $0.10 is required to make API requests.