Top Coding Models in Kilo This Week
Our picks based on real-world testing
by Anthropic
The most capable model for complex planning and orchestration
by BytePlus
Powerful multimodal model from BytePlus — free to use in Kilo
Top Models by Mode
KiloClaw
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 42.2% |
| 2 | minimax-m2.5 | 31.1% |
| 3 | qwen3.6-plus | 4.6% |
| 4 | claude-opus-4.6 | 4.2% |
| 5 | nemotron-3-super-120b-a12b | 3.3% |
| 6 | kimi-k2.5 | 2.5% |
| 7 | minimax-m2.7 | 2.5% |
| 8 | mimo-v2-omni | 2.2% |
| 9 | claude-sonnet-4.6 | 1.2% |
| 10 | step-3.5-flash | 1.2% |
Code
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 35.0% |
| 2 | minimax-m2.5 | 33.9% |
| 3 | qwen3.6-plus | 5.7% |
| 4 | grok-code-fast-1 | 5.0% |
| 5 | minimax-m2.7 | 3.9% |
| 6 | qwen3.6-plus-preview | 3.4% |
| 7 | claude-sonnet-4.6 | 2.8% |
| 8 | nemotron-3-super-120b-a12b | 2.6% |
| 9 | claude-opus-4.6 | 1.8% |
| 10 | mimo-v2-omni | 1.2% |
Plan
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 40.1% |
| 2 | minimax-m2.5 | 20.5% |
| 3 | qwen3.6-plus | 9.5% |
| 4 | claude-opus-4.6 | 4.8% |
| 5 | qwen3.6-plus-preview | 4.5% |
| 6 | nemotron-3-super-120b-a12b | 4.3% |
| 7 | grok-code-fast-1 | 3.6% |
| 8 | kimi-k2.5 | 2.7% |
| 9 | claude-sonnet-4.6 | 2.5% |
| 10 | minimax-m2.7 | 1.7% |
Ask
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 32.7% |
| 2 | minimax-m2.5 | 29.5% |
| 3 | grok-code-fast-1 | 10.0% |
| 4 | claude-opus-4.6 | 4.0% |
| 5 | qwen3.6-plus | 3.8% |
| 6 | kimi-k2.5 | 3.6% |
| 7 | nemotron-3-super-120b-a12b | 2.7% |
| 8 | claude-sonnet-4.6 | 2.5% |
| 9 | qwen3.6-plus-preview | 1.9% |
| 10 | minimax-m2.7 | 1.8% |
Debug
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 34.3% |
| 2 | minimax-m2.5 | 32.0% |
| 3 | qwen3.6-plus | 5.0% |
| 4 | grok-code-fast-1 | 5.0% |
| 5 | qwen3.6-plus-preview | 3.7% |
| 6 | nemotron-3-super-120b-a12b | 3.4% |
| 7 | kimi-k2.5 | 3.4% |
| 8 | minimax-m2.7 | 2.5% |
| 9 | claude-opus-4.6 | 2.3% |
| 10 | claude-sonnet-4.6 | 2.2% |
Review
| # | Model | % |
|---|---|---|
| 1 | mimo-v2-pro | 52.7% |
| 2 | minimax-m2.5 | 22.9% |
| 3 | minimax-m2.7 | 6.6% |
| 4 | nemotron-3-super-120b-a12b | 4.6% |
| 5 | grok-code-fast-1 | 2.2% |
| 6 | claude-opus-4.6 | 2.0% |
| 7 | qwen3.6-plus-preview | 1.8% |
| 8 | claude-sonnet-4.6 | 1.5% |
| 9 | kimi-k2.5 | 1.0% |
| 10 | qwen3.6-plus | 0.7% |
Top Models Today
- 1. minimax-m2.5: 99.2B
- 2. qwen3.6-plus: 50.0B
- 3. grok-code-fast-1: 19.4B
- 4. minimax-m2.7: 7.0B
- 5. nemotron-3-super-120b-a12b: 5.8B
- 6. claude-opus-4.6: 5.3B
- 7. dola-seed-2.0-pro: 5.3B
- 8. claude-sonnet-4.6: 4.2B
- 9. step-3.5-flash: 3.3B
- 10. kimi-k2.5: 1.8B
Daily Top Models
All Models
Browse and compare all available AI coding models
MiniMax: MiniMax M2.5 (free)
minimax
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
Qwen: Qwen3.6 Plus Preview (free)
qwen
Qwen 3.6 Plus Preview is the next-generation evolution of the Qwen Plus series, featuring an advanced hybrid architecture that improves efficiency and scalability. It delivers stronger reasoning and more reliable agentic behavior compared to the 3.5 series. In benchmarks, it performs at or above leading state-of-the-art models. Designed as a flagship preview, it excels in agentic coding, front-end development, and complex problem-solving. Note: The model collects prompt and completion data that can be used to improve the model.
Qwen: Qwen3.6 Plus (free)
qwen
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
Anthropic: Claude Sonnet 4.5
anthropic
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...
NVIDIA: Nemotron 3 Super (free)
nvidia
NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...
Anthropic: Claude Opus 4.5
anthropic
Claude Opus 4.5 is Anthropic’s frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and...
OpenAI: GPT-5.4
openai
GPT-5.4 is OpenAI’s latest frontier model, unifying the Codex and GPT lines into a single system. It features a 1M+ token context window (922K input, 128K output) with support for...
Qwen: Qwen3 Coder Plus
Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...
OpenAI: GPT-5.3-Codex
GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results...
Google: Gemini 3 Flash Preview
Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...
MoonshotAI: Kimi K2.5
moonshotai
Kimi K2.5 is Moonshot AI's native multimodal model, delivering state-of-the-art visual coding capability and a self-directed agent swarm paradigm. Built on Kimi K2 with continued pretraining over approximately 15T mixed...
OpenAI: GPT-5.2
openai
GPT-5.2 is the latest frontier-grade model in the GPT-5 series, offering stronger agentic and long-context performance compared to GPT-5.1. It uses adaptive reasoning to allocate computation dynamically, responding quickly...
Anthropic: Claude Sonnet 4.6
Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with...
Z.ai: GLM 4.6
z-ai
Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...
Anthropic: Claude Opus 4.6
Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective...
Z.ai: GLM 5
GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading...
xAI: Grok Code Fast 1
x-ai
Grok Code Fast 1 is a speedy and economical reasoning model that excels at agentic coding. With reasoning traces visible in the response, developers can steer Grok Code for high-quality...
Google: Gemini 3.1 Pro Preview
Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...
Google: Gemini 2.5 Flash
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...
OpenAI: GPT-5.1-Codex
openai
GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....
Anthropic: Claude Haiku 4.5
anthropic
Claude Haiku 4.5 is Anthropic’s fastest and most efficient model, delivering near-frontier intelligence at a fraction of the cost and latency of larger Claude models. Matching Claude Sonnet 4’s performance...
Qwen: Qwen3 Coder 480B A35B
qwen
Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
OpenAI: GPT-5.2-Codex
openai
GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks....
Arcee AI: Trinity Large Preview (free)
Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing. It excels in creative writing,...
Mistral: Devstral 2 2512
Devstral 2 is a state-of-the-art open-source model by Mistral AI specializing in agentic coding. It is a 123B-parameter dense transformer model supporting a 256K context window. Devstral 2 supports exploring...
MiniMax: MiniMax M2.5
MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
DeepSeek: DeepSeek V3.1 Terminus
DeepSeek-V3.1 Terminus is an update to [DeepSeek V3.1](/deepseek/deepseek-chat-v3.1) that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's...
Google: Gemini 3 Pro Preview
Gemini 3 Pro is Google’s flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling; see the docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks. It delivers state-of-the-art benchmark results in general reasoning, STEM problem solving, factual QA, and multimodal understanding, including leading scores on LMArena, GPQA Diamond, MathArena Apex, MMMU-Pro, and Video-MMMU. Interactions emphasize depth and interpretability: the model is designed to infer intent with minimal prompting and produce direct, insight-focused responses. Built for advanced development and agentic workflows, Gemini 3 Pro provides robust tool-calling, long-horizon planning stability, and strong zero-shot generation for complex UI, visualization, and coding tasks. It excels at agentic coding (SWE-Bench Verified, Terminal-Bench 2.0), multimodal analysis, and structured long-form tasks such as research synthesis, planning, and interactive learning experiences. Suitable applications include autonomous agents, coding assistants, multimodal analytics, scientific reasoning, and high-context information processing.
Kwaipilot: KAT-Coder-Pro V1
kwaipilot
KAT-Coder-Pro V1 is KwaiKAT's most advanced agentic coding model in the KAT-Coder series. Designed specifically for agentic coding tasks, it excels in real-world software engineering scenarios, achieving 73.4% solve rate on the SWE-Bench Verified benchmark. The model has been optimized for tool-use capability, multi-turn interaction, instruction following, generalization, and comprehensive capabilities through a multi-stage training process, including mid-training, supervised fine-tuning (SFT), reinforcement fine-tuning (RFT), and scalable agentic RL.
MiniMax: MiniMax M2.1
minimax
MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world...
MiniMax: MiniMax M2.7
MiniMax-M2.7 is a next-generation large language model designed for autonomous, real-world productivity and continuous improvement. Built to actively participate in its own evolution, M2.7 integrates advanced agentic capabilities through multi-agent...
MoonshotAI: Kimi K2 0905
Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...
OpenAI: GPT-5.1
openai
GPT-5.1 is the latest frontier-grade model in the GPT-5 series, offering stronger general-purpose reasoning, improved instruction adherence, and a more natural conversational style compared to GPT-5. It uses adaptive reasoning...
Qwen: Qwen3 Coder Next
Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...
Z.ai: GLM 4.7
z-ai
GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while...
Z.ai: GLM 4.7 Flash
z-ai
As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...
‡ Real-world usage from Kilo Code Leaderboard
* Performance metrics from Artificial Analysis
† Pricing from OpenRouter