Generative APIs

Serve the latest AI models via API, pay by million token

OpenAI-compatible APIs

Easily integrate with existing tools like OpenAI libraries and LangChain SDKs. Our APIs are designed to work out-of-the-box with your existing workflows, including adapters for Retrieval-Augmented Generation (RAG).

Cost-effective usage

Optimize your budget with a pay-per-use model, billed per million tokens. Benefit from additional discount for non-realtime use cases through Batches APIs.

Quick model testing

Start serving and testing AI models in just a few minutes. Our streamlined onboarding process and serverless architecture let you deploy endpoints instantly, enabling rapid iteration and minimal setup time.

Towards a sovereign AI where your data remains yours, and only in Europe.

Security and privacy for your data and applications

We do not collect, read, reuse, or analyse the content of your inputs, prompts or outputs generated by the APIs. Your business is yours and has nothing to do with Scaleway’s

Read the Privacy Policy

Read our Privacy Policy

Everything you need to create apps with Generative AI

: With Retrieval-Augmented Generation (RAG) – a technique that involves retrieving data from enterprise data sources– you can enrich your AI model with private and up-to-date information for more relevant and accurate answers.

RAG is easy with Scaleway: embeddings, vector database, Langchain: here's your step by step guide.

: An agent actively performs tasks to achieve a specified outcome. When connected via APIs, it can interact with systems to execute actions. Generative APIs enable models to handle multi-step tasks using organizational data, like answering customer inquiries or processing bookings (thanks to Serverless Functions). An autonomous agent interprets user requests and autonomously triggers APIs and databases to complete tasks.

: Create LLM-based, multimodal assistants (copilot, chatbot etc), that understand user requests, automatically break down tasks, engage in dialogue to gather information, and boost productivity over so many tasks. Your virtual assistant can now translate languages, summarize content, analyze sentiment, answer questions, ect…

: Traditional OCR models struggle with tasks that require understanding both text and visuals, but the multimodal vision-language models (VLMs) available through Scaleway Generative APIs bridge this gap. VLMs are ideal for real-world applications like scanned documents and technical diagrams, making them a powerful toolkit for mixed-content processing.

: Analyze call/video recordings securely in order to identify needs. Combined with powerful LLMs the upcoming speech-to-text capability will enable telecom giants to improve quality of services while providing agents with highly valuable insights.

Models' prices

Enjoy a free tier of 1,000,000 tokens. Every new customer gets 1,000,000 free tokens—start paying only from the 1,000,001st token.

qwen3.5-397b-a17b	Chat, Code and Vision	€0.60 /^{million tokens}	€3.60 /^{million tokens}
qwen3-235b-a22b-instruct-2507	Chat	€0.75 /^{million tokens}	€2.25 /^{million tokens}
gpt-oss-120b	Chat	€0.15 /^{million tokens}	€0.60 /^{million tokens}
gemma-3-27b-it	Chat and Vision	€0.25 /^{million tokens}	€0.50 /^{million tokens}
whisper-large-v3	Audio transcription	€0.003 /^{Audio minute}	Free
holo2-30b-a3b	Chat and Vision	€0.30 /^{million tokens}	€0.70 /^{million tokens}
voxtral-small-24b-2507	Audio transcription and Chat	€0.15 /^{million tokens}	€0.35 /^{million tokens}
mistral-small-3.2-24b-instruct-2506	Chat and Vision	€0.15 /^{million tokens}	€0.35 /^{million tokens}
llama-3.3-70b-instruct	Chat	€0.90 /^{million tokens}	€0.90 /^{million tokens}
deepseek-r1-distill-llama-70b	Chat	€0.90 /^{million tokens}	€0.90 /^{million tokens}
qwen3-embedding-8b	Embeddings	€0.10 /^{million tokens}	Free
qwen3-coder-30b-a3b-instruct	Chat	€0.20 /^{million tokens}	€0.80 /^{million tokens}
pixtral-12b-2409	Chat and Vision	€0.20 /^{million tokens}	€0.20 /^{million tokens}
mistral-nemo-instruct-2407	Chat	€0.20 /^{million tokens}	€0.20 /^{million tokens}
bge-multilingual-gemma2	Embeddings	€0.10 /^{million tokens}	Free
llama-3.1-8b-instruct	Chat	€0.20 /^{million tokens}	€0.20 /^{million tokens}

Evaluate the cost of your Generative API

Compare the Generative API with Managed Inference based on your request volume and token usage.

RAG

Autonomous Agents

LLM-based Assistant

Boosted OCR

Audio transcription