Qwen API: Access Alibaba Cloud’s Qwen Models

AIsa is an Alibaba Cloud Qwen Key Account Partner, giving you production access to the complete Qwen model family — from the 1M-context Qwen3.6 Plus flagship to three specialised coder variants — through a single OpenAI-compatible API key at discounted partner pricing. No Alibaba Cloud account. No Aliyun registration. No RMB billing. One AIsa key covers every Qwen model alongside every other LLM in the AIsa catalogue.

Supported Qwen models

| Model | Context window | Best for | Input price* | Output price* |
|---|---|---|---|---|
| qwen3.6-plus | 1,000,000 tokens | Ultra-long context, frontier reasoning | $0.276/M | $1.651/M |
| qwen3-max | 262,144 tokens | Balanced capability and cost | $0.72/M | $3.60/M |
| qwen3-coder-plus | 262,144 tokens | Code generation and completion | $0.70/M | $3.50/M |
| qwen3-coder-flash | 262,144 tokens | Fast code tasks, high-throughput coding | $0.21/M | $1.05/M |
| qwen3-coder-480b-a35b-instruct | 262,144 tokens | Maximum coding capability (480B MoE) | $1.05/M | $5.25/M |
\* Prices shown are reference market rates. AIsa Key Account partner pricing may be lower; see aisa.one/models for your actual rate.
Qwen3.6 Plus is currently in preview and available at no cost through AIsa during the preview period.
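At the reference rates above, per-request cost is straightforward to estimate. The sketch below uses the table's reference rates, not a quote; your actual partner pricing may be lower:

```python
# Rough cost estimate at the reference rates in the table above.
# USD per 1M tokens: (input, output). Partner pricing may be lower.
PRICES = {
    "qwen3.6-plus": (0.276, 1.651),
    "qwen3-max": (0.72, 3.60),
    "qwen3-coder-plus": (0.70, 3.50),
    "qwen3-coder-flash": (0.21, 1.05),
    "qwen3-coder-480b-a35b-instruct": (1.05, 5.25),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a 200K-token document summarised into 2K output tokens
print(round(estimate_cost("qwen3.6-plus", 200_000, 2_000), 4))  # prints 0.0585
```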

Quickstart

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_AISA_API_KEY",
    base_url="https://api.aisa.one/v1"
)

# Standard chat completion
response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise this 500-page contract in bullet points."}
    ]
)
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AISA_API_KEY,
  baseURL: "https://api.aisa.one/v1",
});

const response = await client.chat.completions.create({
  model: "qwen3-coder-plus",
  messages: [
    { role: "user", content: "Refactor this Python function to be async." }
  ],
});
console.log(response.choices[0].message.content);

Streaming

stream = client.chat.completions.create(
    model="qwen3-max",
    messages=[{"role": "user", "content": "Write a detailed market analysis."}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Model guide

Qwen3.6 Plus — 1M context flagship

Qwen3.6 Plus is Alibaba Cloud’s latest frontier model, combining hybrid linear attention with sparse mixture-of-experts routing. Its 1,000,000-token context window makes it practical for entire codebases, book-length documents, or hours-long conversation history. Use when you need:
  • Context windows beyond 128K (legal documents, large repos, research corpora)
  • Strong general reasoning with low latency relative to model capability
  • The absolute latest Qwen release with ongoing improvements
# Processing a large document
with open("annual_report.txt") as f:
    document = f.read()

response = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[
        {"role": "user", "content": f"Extract all financial metrics from this report:\n\n{document}"}
    ]
)

Qwen3 Max — balanced workhorse

Qwen3 Max offers 262K context with strong performance across reasoning, writing, and multilingual tasks. It hits the sweet spot between capability and cost for production workloads. Use when you need:
  • Reliable general-purpose performance at predictable cost
  • 262K context for most enterprise document tasks
  • Stable, production-hardened model behaviour
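The two context windows suggest a simple routing rule: prefer the cheaper qwen3-max unless the prompt needs the 1M window. The model names and limits below come from the table above; the threshold logic and the output-token reserve are our own illustration, and token counts are assumed to come from your tokenizer of choice:

```python
# Route to the cheaper qwen3-max unless the prompt exceeds its 262K window.
CONTEXT_LIMITS = {"qwen3-max": 262_144, "qwen3.6-plus": 1_000_000}

def pick_model(prompt_tokens: int, reserve_for_output: int = 4_096) -> str:
    """Choose the smallest model whose context fits prompt plus reply."""
    needed = prompt_tokens + reserve_for_output
    if needed <= CONTEXT_LIMITS["qwen3-max"]:
        return "qwen3-max"
    if needed <= CONTEXT_LIMITS["qwen3.6-plus"]:
        return "qwen3.6-plus"
    raise ValueError(f"Prompt too large: {needed} tokens")

print(pick_model(50_000))   # prints qwen3-max
print(pick_model(500_000))  # prints qwen3.6-plus
```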

Qwen3 Coder — three variants for every coding workload

AIsa supports three Qwen3 Coder variants, each targeting a different point on the speed/capability curve:
| Model | Best for |
|---|---|
| qwen3-coder-plus | Best quality-per-dollar for everyday coding tasks |
| qwen3-coder-flash | Fast, high-throughput coding with lower latency than Plus |
| qwen3-coder-480b-a35b-instruct | Maximum capability: full 480B MoE, activating 35B per request |
All three are purpose-built for software engineering: code generation, completion, review, and agentic coding loops.
# Everyday coding — qwen3-coder-plus
response = client.chat.completions.create(
    model="qwen3-coder-plus",
    messages=[
        {"role": "system", "content": "You are an expert software engineer. Write clean, tested code."},
        {"role": "user", "content": "Write a Python class for a rate-limited HTTP client with exponential backoff."}
    ]
)

# Maximum capability — 480B MoE for complex, multi-file tasks
response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",
    messages=[
        {"role": "user", "content": "Refactor this entire Express.js app to use async/await throughout, adding proper error boundaries."}
    ]
)

# Fast throughput — qwen3-coder-flash for high-volume coding pipelines
response = client.chat.completions.create(
    model="qwen3-coder-flash",
    messages=[
        {"role": "user", "content": "Add JSDoc comments to this function."}
    ]
)

Switching from OpenAI

If you already use the OpenAI SDK, switching to Qwen takes two small changes: point the client at AIsa's base URL and swap the model name:
# Before
client = OpenAI(api_key="sk-...")

# After — access Qwen (and 49+ other models) through AIsa
client = OpenAI(
    api_key="YOUR_AISA_API_KEY",
    base_url="https://api.aisa.one/v1"
)

# Everything else stays the same
response = client.chat.completions.create(
    model="qwen3-max",   # ← change just this
    messages=[...]
)
All OpenAI SDK features work: streaming, function calling, JSON mode, async clients, and retry logic.
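The SDK's built-in `max_retries` option covers most transient failures; if you want explicit control over backoff, a client-side wrapper is a few lines. This helper is our own sketch, not part of the SDK:

```python
import time

def with_retries(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Usage (assumes `client` configured as in the snippet above):
# response = with_retries(lambda: client.chat.completions.create(
#     model="qwen3-max", messages=[{"role": "user", "content": "Hi"}]))
```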

Function calling with Qwen

Qwen3 models support OpenAI-compatible function/tool calling:
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a given ticker",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "Stock ticker symbol, e.g. BABA"},
                },
                "required": ["ticker"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="qwen3-max",
    messages=[{"role": "user", "content": "What is Alibaba's stock price?"}],
    tools=tools,
    tool_choice="auto"
)
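When the model decides to call the tool, the response carries `tool_calls` with a JSON-encoded `arguments` string; your code runs the function and sends the result back as a `tool` message. A minimal dispatch sketch follows; the `get_stock_price` body here is a hypothetical stand-in for your real data source:

```python
import json

def get_stock_price(ticker: str) -> str:
    # Hypothetical implementation: swap in a real market-data lookup.
    return json.dumps({"ticker": ticker, "price": 123.45})

TOOLS = {"get_stock_price": get_stock_price}

def run_tool_call(name: str, arguments: str) -> str:
    """Execute one model-requested call. `arguments` is the JSON string
    found at response.choices[0].message.tool_calls[i].function.arguments."""
    return TOOLS[name](**json.loads(arguments))

# The result goes back to the model as a tool message, e.g.
# {"role": "tool", "tool_call_id": call.id, "content": run_tool_call(...)}
print(run_tool_call("get_stock_price", '{"ticker": "BABA"}'))
```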

Data privacy

All Qwen requests through AIsa are processed under AIsa’s Alibaba Cloud Key Account enterprise agreement. Customer data is not used for model training or shared outside the processing pipeline. For compliance documentation, contact us.

What’s next