OneLLM.dev API Documentation

Welcome to the official API documentation for OneLLM.dev. This document provides a comprehensive guide to interacting with our API. For a formal definition of the API, please refer to the OpenAPI Specification.

Authentication

The OneLLM.dev API uses API keys for authentication. You can obtain your API key from the OneLLM dashboard.

All API requests must include an Authorization header with a Bearer token containing your API key.

Authorization: Bearer YOUR_API_KEY
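As a minimal sketch, the header can be assembled in Python like this (reading the key from an `ONELLM_API_KEY` environment variable is our own convention here, not a OneLLM requirement):

```python
import os

# Read the API key from an environment variable rather than hard-coding it.
# Replace the fallback placeholder with your key from the OneLLM dashboard.
api_key = os.environ.get("ONELLM_API_KEY", "YOUR_API_KEY")

# Every request to the API must carry this Authorization header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```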

API Endpoint

Note that streaming is not supported.

POST https://onellm.dev/api

This is the primary endpoint for interacting with the language models. It allows you to send a chat conversation and receive a response from the specified model.

Request Body

The request body must be a JSON object containing the details of your request.

  • model (string, required): The ID of the model to use for the completion. See the Supported Models section for a list of available models.
  • messages (array, required): An array of message objects representing the conversation history.
  • temperature (number, optional): Controls randomness. A lower value makes the model more deterministic. Range: 0.0 to 2.0.
  • max_tokens (integer, optional): The maximum number of tokens to generate in the response. If the value is too large, it will be automatically adjusted based on your account balance.
  • stream (boolean, optional): If set to true, the response will be streamed as server-sent events. Defaults to false.
  • top_p (number, optional): The nucleus sampling probability. The model considers the tokens comprising the top_p probability mass.
  • stop_sequences (array, optional): A list of strings that will cause the model to stop generating tokens.

For a complete list of all possible request parameters, please refer to the OpenAPI Specification.

Example Request:

{
  "model": "GPT-4.1",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke about computers."
    }
  ],
  "max_tokens": 50
}
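The request above can be sent with Python's standard library alone. This is a sketch, not an official client; the endpoint and header requirements are as documented above, and `YOUR_API_KEY` is a placeholder for your own key:

```python
import json
import urllib.request

API_URL = "https://onellm.dev/api"
API_KEY = "YOUR_API_KEY"  # placeholder: your key from the OneLLM dashboard

# Build the request body described in the parameter list above.
payload = {
    "model": "GPT-4.1",
    "messages": [
        {"role": "user", "content": "Tell me a joke about computers."}
    ],
    "max_tokens": 50,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["content"])
```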

Response Body

The response will be a JSON object containing the model's output.

  • provider (string): The name of the underlying model provider (e.g., `openai`).
  • model (string): The model that was used for the completion.
  • role (string): The role of the message author, typically `assistant`.
  • content (string): The content of the message generated by the model.
  • usage (object): An object containing token usage information for the request.
  • finish_reason (string): The reason the model stopped generating tokens (e.g., `stop`).

Example Response:

{
  "provider": "openai",
  "model": "GPT-4.1",
  "role": "assistant",
  "content": "Why did the computer show up at work late? It had a hard drive!",
  "usage": {
    "input_tokens": 15,
    "output_tokens": 12,
    "total_tokens": 27
  },
  "finish_reason": "stop"
}
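Since the generated text and token counts sit at the top level of the response, extracting them is straightforward. A small sketch, using the example response above as a literal string:

```python
import json

# A response shaped like the example above, as a raw JSON string.
raw = """{
  "provider": "openai",
  "model": "GPT-4.1",
  "role": "assistant",
  "content": "Why did the computer show up at work late? It had a hard drive!",
  "usage": {"input_tokens": 15, "output_tokens": 12, "total_tokens": 27},
  "finish_reason": "stop"
}"""

response = json.loads(raw)

# The generated text lives directly in the top-level "content" field.
print(response["content"])

# Token accounting, useful for tracking spend against your balance.
usage = response["usage"]
print(f"{usage['input_tokens']} in + {usage['output_tokens']} out "
      f"= {usage['total_tokens']} total tokens")
```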

Supported Models

The following models are supported and can be used in the model parameter of your API requests:

  • GPT-5
  • GPT-5-Mini
  • GPT-5-Nano
  • GPT-5-Chat-Latest
  • GPT-4.1
  • GPT-4.1-Mini
  • GPT-4.1-Nano
  • GPT-o3
  • GPT-o3-pro
  • GPT-o3-DeepResearch
  • GPT-o3-Mini
  • GPT-o4-mini
  • GPT-4o
  • GPT-4o-mini
  • GPT-o1
  • GPT-o1-Mini
  • Opus-4
  • Sonnet-4
  • Haiku-3.5
  • Opus-3
  • Sonnet-3.7
  • Haiku-3
  • DeepSeek-Reasoner
  • DeepSeek-Chat
  • 2.5-Flash-preview
  • 2.5-Pro-preview
  • 2.0-Flash
  • 2.0-Flash-lite
  • 1.5-Flash
  • 1.5-Flash-8B
  • 1.5-Pro
  • Mistral-Medium-3
  • Magistral-Medium
  • Codestral
  • Devstral-Medium
  • Mistral-Saba
  • Mistral-Large
  • Pixtral-Large
  • Ministral-8B-24.10
  • Ministral-3B-24.10
  • Mistral-Small-3.2
  • Magistral-Small
  • Devstral-Small
  • Pixtral-12B
  • Mistral-NeMo
  • Mistral-7B
  • Mixtral-8x7B
  • Mixtral-8x22B

Important Notes

  • If the max_tokens value is too large, the server will automatically reduce it to the highest amount that your balance allows.
  • A minimum balance of USD $0.10 is required to make API requests.