Kilo Gateway Leaderboard

#	Model	%
1	mimo-v2-pro	42.2%
2	minimax-m2.5	31.1%
3	qwen3.6-plus	4.6%
4	claude-opus-4.6	4.2%
5	nemotron-3-super-120b-a12b	3.3%
6	kimi-k2.5	2.5%
7	minimax-m2.7	2.5%
8	mimo-v2-omni	2.2%
9	claude-sonnet-4.6	1.2%
10	step-3.5-flash	1.2%

Code

#	Model	%
1	mimo-v2-pro	35.0%
2	minimax-m2.5	33.9%
3	qwen3.6-plus	5.7%
4	grok-code-fast-1	5.0%
5	minimax-m2.7	3.9%
6	qwen3.6-plus-preview	3.4%
7	claude-sonnet-4.6	2.8%
8	nemotron-3-super-120b-a12b	2.6%
9	claude-opus-4.6	1.8%
10	mimo-v2-omni	1.2%

Plan

#	Model	%
1	mimo-v2-pro	40.1%
2	minimax-m2.5	20.5%
3	qwen3.6-plus	9.5%
4	claude-opus-4.6	4.8%
5	qwen3.6-plus-preview	4.5%
6	nemotron-3-super-120b-a12b	4.3%
7	grok-code-fast-1	3.6%
8	kimi-k2.5	2.7%
9	claude-sonnet-4.6	2.5%
10	minimax-m2.7	1.7%

Ask

#	Model	%
1	mimo-v2-pro	32.7%
2	minimax-m2.5	29.5%
3	grok-code-fast-1	10.0%
4	claude-opus-4.6	4.0%
5	qwen3.6-plus	3.8%
6	kimi-k2.5	3.6%
7	nemotron-3-super-120b-a12b	2.7%
8	claude-sonnet-4.6	2.5%
9	qwen3.6-plus-preview	1.9%
10	minimax-m2.7	1.8%

Debug

#	Model	%
1	mimo-v2-pro	34.3%
2	minimax-m2.5	32.0%
3	qwen3.6-plus	5.0%
4	grok-code-fast-1	5.0%
5	qwen3.6-plus-preview	3.7%
6	nemotron-3-super-120b-a12b	3.4%
7	kimi-k2.5	3.4%
8	minimax-m2.7	2.5%
9	claude-opus-4.6	2.3%
10	claude-sonnet-4.6	2.2%

Review

#	Model	%
1	mimo-v2-pro	52.7%
2	minimax-m2.5	22.9%
3	minimax-m2.7	6.6%
4	nemotron-3-super-120b-a12b	4.6%
5	grok-code-fast-1	2.2%
6	claude-opus-4.6	2.0%
7	qwen3.6-plus-preview	1.8%
8	claude-sonnet-4.6	1.5%
9	kimi-k2.5	1.0%
10	qwen3.6-plus	0.7%

View all models →

Top Models Today

View all models →

Daily Top Models

View all models →

All Models

Browse and compare all available AI coding models

MiniMax: MiniMax M2.5 (free)

Recommended

minimax

MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...

#2 code mode ‡$0.00/1M †

Qwen: Qwen3.6 Plus Preview (free)

qwen

Qwen 3.6 Plus Preview is the next-generation evolution of the Qwen Plus series, featuring an advanced hybrid architecture that improves efficiency and scalability. It delivers stronger reasoning and more reliable agentic behavior compared to the 3.5 series. In benchmarks, it performs at or above leading state-of-the-art models. Designed as a flagship preview, it excels in agentic coding, front-end development, and complex problem-solving. Note: The model collects prompt and completion data that can be used to improve the model.

#4 code mode ‡$0.00/1M †

Qwen: Qwen3.6 Plus (free)

qwen

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

#6 code mode ‡$0.00/1M †

Anthropic: Claude Sonnet 4.5

anthropic

Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...

#14 code mode ‡38.6 coding index *$3.00/1M †

NVIDIA: Nemotron 3 Super (free)

nvidia

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

#20 code mode ‡$0.00/1M †

Anthropic: Claude Opus 4.5

anthropic

Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and...

#22 code mode ‡42.9 coding index *$5.00/1M †

OpenAI: GPT-5.4

Recommended

openai

GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...

#29 code mode ‡$2.50/1M †

Qwen: Qwen3 Coder Plus

Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...

#33 code mode ‡24.6 coding index *$0.65/1M †

OpenAI: GPT-5.3-Codex

GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...

#36 code mode ‡$1.75/1M †

Google: Gemini 3 Flash Preview

google

Gemini 3 Flash Preview is a high speed, high value thinking model designed for agentic workflows, multi turn chat, and coding assistance. It delivers near Pro level reasoning and tool...

#37 code mode ‡$0.50/1M †

MoonshotAI: Kimi K2.5

Recommended

moonshotai

Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...

#40 code mode ‡$0.38/1M †

OpenAI: GPT-5.2

openai

GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long context perfomance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...

#42 code mode ‡$1.75/1M †

Anthropic: Claude Sonnet 4.6

Recommended

Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...

#48 code mode ‡$3.00/1M †

Z.ai: GLM 4.6

z-ai

Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...

#49 code mode ‡29.5 coding index *$0.39/1M †

Anthropic: Claude Opus 4.6

Recommended

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...

#51 code mode ‡47.6 coding index *$5.00/1M †

Z.ai: GLM 5

Recommended

GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...

#55 code mode ‡$0.72/1M †

xAI: Grok Code Fast 1

x-ai

Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality...

#57 code mode ‡23.7 coding index *$0.20/1M †

Google: Gemini 3.1 Pro Preview

Recommended

google

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

#58 code mode ‡$2.00/1M †

Google: Gemini 2.5 Flash

google

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

#65 code mode ‡24.6 coding index *$0.30/1M †

OpenAI: GPT-5.1-Codex

openai

GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

#69 code mode ‡38.9 coding index *$1.25/1M †

Anthropic: Claude Haiku 4.5

anthropic

Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...

#72 code mode ‡29.6 coding index *$1.00/1M †

Qwen: Qwen3 Coder 480B A35B

qwen

Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...

#74 code mode ‡24.6 coding index *$0.22/1M †

OpenAI: GPT-5.2-Codex

openai

GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....

#76 code mode ‡$1.75/1M †

Arcee AI: Trinity Large Preview (free)

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels in creative writing,...

#90 code mode ‡$0.00/1M †

Mistral: Devstral 2 2512

Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...

#93 code mode ‡$0.40/1M †

MiniMax: MiniMax M2.5

#94 code mode ‡$0.12/1M †

DeepSeek: DeepSeek V3.1 Terminus

DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...

33.7 coding index *$0.21/1M †

Google: Gemini 3 Pro Preview

google

Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.

46.5 coding index *$2.00/1M †

Kwaipilot: KAT-Coder-Pro V1

kwaipilot

KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.

$0.21/1M †

MiniMax: MiniMax M2.1

minimax

MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...

$0.27/1M †

MiniMax: MiniMax M2.7

Recommended

MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...

$0.30/1M †

MoonshotAI: Kimi K2 0905

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

25.9 coding index *$0.40/1M †

OpenAI: GPT-5.1

openai

GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...

44.7 coding index *$1.25/1M †

Qwen: Qwen3 Coder Next

Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...

$0.12/1M †

Z.ai: GLM 4.7

z-ai

GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...

$0.39/1M †

Z.ai: GLM 4.7 Flash

z-ai

As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...

$0.06/1M †

‡ Real-world usage from Kilo Code Leaderboard

* Performance metrics from Artificial Analysis

† Pricing from OpenRouter

View all models →

Kilo Gateway Leaderboard

Top Coding Models in Kilo This Week

Top Models by Mode

KiloClaw

Code

Plan

Ask

Debug

Review

Top Models Today

Daily Top Models

All Models

MiniMax: MiniMax M2.5 (free)

Qwen: Qwen3.6 Plus Preview (free)

Qwen: Qwen3.6 Plus (free)

Anthropic: Claude Sonnet 4.5

NVIDIA: Nemotron 3 Super (free)

Anthropic: Claude Opus 4.5

OpenAI: GPT-5.4

Qwen: Qwen3 Coder Plus

OpenAI: GPT-5.3-Codex

Google: Gemini 3 Flash Preview

MoonshotAI: Kimi K2.5

OpenAI: GPT-5.2

Anthropic: Claude Sonnet 4.6

Z.ai: GLM 4.6

Anthropic: Claude Opus 4.6

Z.ai: GLM 5

xAI: Grok Code Fast 1

Google: Gemini 3.1 Pro Preview

Google: Gemini 2.5 Flash

OpenAI: GPT-5.1-Codex

Anthropic: Claude Haiku 4.5

Qwen: Qwen3 Coder 480B A35B

OpenAI: GPT-5.2-Codex

Arcee AI: Trinity Large Preview (free)

Mistral: Devstral 2 2512

MiniMax: MiniMax M2.5

DeepSeek: DeepSeek V3.1 Terminus

Google: Gemini 3 Pro Preview

Kwaipilot: KAT-Coder-Pro V1

MiniMax: MiniMax M2.1

MiniMax: MiniMax M2.7

MoonshotAI: Kimi K2 0905

OpenAI: GPT-5.1

Qwen: Qwen3 Coder Next

Z.ai: GLM 4.7

Z.ai: GLM 4.7 Flash

Recent posts

Trinity-Large-Thinking is Free in Kilo for a Limited Time

I Was Running OpenClaw With My Claude Max Subscription. Now What?

Dola Seed 2.0 Pro is Here: The Multimodal Leap