Kimi K2.6

Overview

Kimi K2.6 is Moonshot AI's open-weight multimodal model, released on April 20, 2026. It is the third K2-class model in nine months, following K2 and K2.5. Built on a 1-trillion-parameter Mixture-of-Experts architecture with 32 billion active parameters per token, K2.6 combines native multimodal input, advanced agent swarm orchestration, and strong coding performance.

Key Features

  • Native Multimodal Architecture: Supports text, image, and video input through the custom MoonViT vision encoder. Video input is new in K2.6 and supports mp4, mov, avi, and webm formats.
  • Agent Swarm Orchestration: Supports up to 300 concurrent sub-agents per task and 4,000 coordinated steps, with a 96.6% tool-invocation success rate, up from 91% on K2.5.
  • Coding Performance: Scores 58.6% on SWE-Bench Pro, 80.2% on SWE-bench Verified, 89.6% on LiveCodeBench v6, and 66.7% on Terminal-Bench 2.0.
  • Modified MIT License: Open weights are available on Hugging Face and are free for commercial use below 100M MAU or $20M monthly revenue.

Best Use Cases

  • End-to-End Coding & UI Generation: Well suited for transforming prompts and visual inputs into production-ready interfaces and lightweight full-stack workflows across Python, Rust, and Go.
  • Multi-Agent Systems: The 300-agent swarm capacity with 4,000-step coordination makes it effective for complex autonomous workflows that require long-context stability.
  • Cost-Effective Multimodal Processing: Offers strong multimodal and agentic performance at a lower cost than many proprietary multimodal alternatives.
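
The swarm limits quoted above (300 concurrent sub-agents, 4,000 coordinated steps) can be mirrored on the client side when dispatching work. The following is a minimal sketch, not Moonshot AI's orchestration mechanism: `run_swarm` and `worker` are hypothetical names, and counting each dispatched task as one step is a simplifying assumption.

```python
import asyncio

# Limits quoted in this document; how the service enforces them is not described.
MAX_CONCURRENT_AGENTS = 300
MAX_TOTAL_STEPS = 4_000


async def run_swarm(tasks, worker, max_agents=MAX_CONCURRENT_AGENTS,
                    max_steps=MAX_TOTAL_STEPS):
    """Run worker(task) for each task with bounded concurrency and a shared step budget.

    `worker` is a hypothetical coroutine function standing in for one sub-agent call.
    Tasks that arrive after the step budget is spent are skipped and yield None.
    """
    gate = asyncio.Semaphore(max_agents)  # caps how many workers run at once
    steps_used = 0

    async def run_one(task):
        nonlocal steps_used
        async with gate:
            if steps_used >= max_steps:
                return None  # budget exhausted: skip this task
            steps_used += 1  # assumption: one dispatched task counts as one step
            return await worker(task)

    # gather preserves the order of the input tasks in its result list
    return await asyncio.gather(*(run_one(t) for t in tasks))
```

A semaphore keeps the cap local to the dispatcher, so the worker code does not need to know about the limit.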

Capabilities and Limitations

| Capability | Description |
| --- | --- |
| Reasoning | AIME 2026: 96.4%, GPQA-Diamond: 90.5%, HLE with tools: 54.0% |
| Coding | SWE-Bench Pro 58.6%, SWE-bench Verified 80.2%, LiveCodeBench v6 89.6%, Terminal-Bench 2.0 66.7% |
| Multimodal | Text, image (png, jpeg, webp, gif), and video (mp4, mov, avi, webm) input through the MoonViT vision encoder |
| Response Speed | Optimized for throughput in agentic workflows; specific tokens-per-second metrics vary by deployment |
| Context Window | 262K tokens |
| Max Output | 16K tokens, up to 98K in extended mode |
| Tool Use | 96.6% tool-invocation success, 4,000+ tool calls per session, and multi-agent handoffs |
| Multilingual | 160K vocabulary optimized for code and non-English text; SWE-bench Multilingual 76.7% |
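
The context and output limits in the table interact when sizing a request. The sketch below estimates how many output tokens can be requested for a given prompt size. It assumes that prompt and output share the context window and reads "262K", "16K", and "98K" as round decimal figures; neither assumption is confirmed by this page, and `max_output_budget` is a hypothetical helper.

```python
# Nominal limits from the table above; exact tokenizer-level values are assumptions.
CONTEXT_WINDOW = 262_000
MAX_OUTPUT_STANDARD = 16_000
MAX_OUTPUT_EXTENDED = 98_000


def max_output_budget(prompt_tokens: int, extended: bool = False) -> int:
    """Return the largest output-token request that fits alongside the prompt.

    Assumes prompt and output tokens are counted against the same context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    cap = MAX_OUTPUT_EXTENDED if extended else MAX_OUTPUT_STANDARD
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the mode's cap, and never go below zero when the prompt overflows.
    return max(0, min(cap, remaining))
```

For example, under these assumptions a 250,000-token prompt leaves room for at most 12,000 output tokens, even though the standard cap is 16K.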

Known Limitations

  • Multimodal benchmark performance is weaker than top proprietary models on some vision tasks such as MMMU-Pro and MathVision.
  • URL-based image input is not supported through the API; only base64-encoded content or file upload is supported.
  • Image resolution is capped at 4K, video at 2K, and the full request body must remain under 100MB.
  • Math and science reasoning trails some higher-end proprietary models on benchmarks such as AIME 2026 and GPQA-Diamond.
  • The 262K context window is smaller than some proprietary alternatives offering 1M+ tokens.
  • Independent reviews note only marginal improvement over K2.5 on day-to-day tasks and weaker performance on some domain-specific workloads.
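
The input constraints listed above (base64 only, a fixed set of formats, request body under 100MB) can be checked before a request is sent. This is a rough sketch under stated assumptions: `encode_media` and the returned dictionary keys are hypothetical and do not reflect the actual API schema, "100MB" is read as 100 MiB, and `.jpg` is treated as JPEG. Resolution caps are not checked because that would require decoding the media.

```python
import base64
from pathlib import Path

# Formats listed in this document, keyed by file extension.
IMAGE_FORMATS = {".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg",
                 ".webp": "webp", ".gif": "gif"}
VIDEO_FORMATS = {".mp4": "mp4", ".mov": "mov", ".avi": "avi", ".webm": "webm"}
MAX_REQUEST_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB


def encode_media(filename: str, data: bytes,
                 max_request_bytes: int = MAX_REQUEST_BYTES) -> dict:
    """Base64-encode an image or video and reject inputs that cannot fit.

    The returned dict uses hypothetical field names, not the real request schema.
    """
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_FORMATS:
        kind, fmt = "image", IMAGE_FORMATS[ext]
    elif ext in VIDEO_FORMATS:
        kind, fmt = "video", VIDEO_FORMATS[ext]
    else:
        raise ValueError(f"unsupported media format: {ext or filename!r}")

    encoded = base64.b64encode(data).decode("ascii")
    # Base64 inflates size by about a third, so test the encoded length.
    # This is a necessary check only: the rest of the request body adds more bytes.
    if len(encoded) >= max_request_bytes:
        raise ValueError("encoded media alone reaches the request size limit")
    return {"kind": kind, "format": fmt, "data": encoded}
```

Checking the encoded length rather than the raw file size matters because a file of roughly 75MB already approaches the limit once encoded.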

Credits Usage

| Model | Input (Credits/Token) | Cache Write (Credits/Token) | Cache Read (Credits/Token) | Output (Credits/Token) | Web Search (Credits/Use) | Billing Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Kimi K2.6 | 0.95 | 0.95 | 0.16 | 4.00 | - | - |
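
The rates above can be combined into a per-request estimate. The sketch below is illustrative only: `estimate_credits` is a hypothetical helper, the rates are taken at face value in the units stated in the table headers, and it assumes the four token categories are billed separately and additively, which this page does not state.

```python
from decimal import Decimal

# Rates copied from the Credits Usage table, in the units its headers state.
RATES = {
    "input": Decimal("0.95"),
    "cache_write": Decimal("0.95"),
    "cache_read": Decimal("0.16"),
    "output": Decimal("4.00"),
}


def estimate_credits(input_tokens: int = 0, cache_write_tokens: int = 0,
                     cache_read_tokens: int = 0, output_tokens: int = 0) -> Decimal:
    """Estimate credits for one request, assuming the categories are summed."""
    counts = {
        "input": input_tokens,
        "cache_write": cache_write_tokens,
        "cache_read": cache_read_tokens,
        "output": output_tokens,
    }
    if any(n < 0 for n in counts.values()):
        raise ValueError("token counts must be non-negative")
    # Decimal keeps the arithmetic exact, which matters for billing figures.
    return sum((RATES[name] * n for name, n in counts.items()), Decimal("0"))
```

With these rates, an output token costs a little over four times as much as an uncached input token, and a cache read costs roughly one sixth of an uncached input token, so long prompts that are reused benefit most from caching.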