Kimi K2.5 API: Access Moonshot AI’s Kimi Models
Kimi K2.5 is Moonshot AI’s flagship model: a 1-trillion-parameter mixture-of-experts architecture that activates only 32 billion parameters per request, delivering top-tier reasoning, visual coding, and agentic tool-calling at a fraction of the compute cost of a dense model its size. Through AIsa, you access Kimi K2.5 at approximately 80% of Moonshot AI’s official pricing, under a formal enterprise data agreement that guarantees your data is never retained or stored by Moonshot after processing. One AIsa key. No Moonshot account required.Kimi K2.5 at a glance
| Spec | Value |
|---|---|
| Total parameters | 1 trillion |
| Active parameters per request | 32 billion |
| Architecture | Mixture-of-Experts (MoE) |
| Context window | 256,000 tokens |
| Release date | January 27, 2026 |
| Input pricing (via AIsa) | ~0.60/M official) |
| Output pricing (via AIsa) | ~2.50/M official) |
| Cache hit pricing | $0.10/M input tokens |
See aisa.one/models for exact current rates.
Quickstart
Python
Node.js
Streaming
Why Kimi K2.5?
Mixture-of-Experts efficiency
The 1T/32B MoE architecture means Kimi K2.5 has the knowledge capacity of a 1-trillion-parameter model but the inference speed and cost of a 32-billion-parameter model. Only ~3% of the network activates per token, which translates directly to faster responses and lower cost compared to a dense model of equivalent capability.Built for agents
Kimi K2.5 was specifically designed for agentic use cases. It supports:- Tool calling — natively compatible with OpenAI function calling schema
- JSON mode — structured output for downstream parsing
- Partial mode — stream structured data before the full response completes
- Internet search — built-in search capability (available through AIsa’s web search tools)
- Extended context — 256K tokens with automatic caching for repeated prefixes
Visual coding
Kimi K2.5 excels at reading and reasoning about visual content: UI screenshots, architecture diagrams, database schemas, and code rendered as images. This makes it particularly powerful for:- Reviewing UI/UX mockups and generating the corresponding code
- Describing and debugging rendered output from data visualisations
- Extracting structured data from screenshots of tables or dashboards
Agentic tool calling
Kimi K2.5 handles complex multi-step agentic workflows reliably. Here’s an example with multiple tools:JSON mode and structured output
Using Kimi K2.5 with the 256K context window
The 256K context window covers approximately 200,000 words — enough for most novels, large codebases, or extended research documents:Caching: reduce cost on repeated context
Kimi K2.5 supports prompt caching. When the same prefix (e.g., a system prompt, document, or codebase) appears across multiple requests, cache hits cost $0.10/M input instead of the full rate.Enterprise data privacy
AIsa holds a Supplemental Enterprise Service Agreement with Moonshot AI (effective February 10, 2026) with the following guarantees:- Customer data is not retained by Moonshot AI after processing
- Generated outputs are not stored on Moonshot’s infrastructure
- Data is not used for model training or fine-tuning
- Processing occurs within the boundaries of the enterprise agreement
What’s next
- All Chinese AI models — full comparison table
- Qwen models — Alibaba’s 1M-context flagship
- DeepSeek V4 — 81% SWE-bench at frontier-beating price
- ByteDance Seed & Seedream — Seed 1.6, 1.8, Flash, and Seedream 4.5 image generation