This page outlines the pricing structure for all AI models available through AISA’s unified LLM inference API. For model specifications (context windows, modalities, and developers), see the supported models catalog.

LLM usage is billed based on token consumption. Each request is charged separately for:
  • Input tokens: the tokens included in your prompt
  • Output tokens: the tokens generated by the model
All prices listed on this page are in USD per 1 million tokens (1M tokens).

How Token-Based Billing Works

When you send a request to an AI model:
  1. Your prompt is converted into input tokens.
  2. The model generates output tokens.
  3. Both input and output tokens are counted separately.
  4. The total cost is calculated using the model’s pricing.
The billing formula is:

Total Cost = (Input tokens ÷ 1,000,000 × Input price) + (Output tokens ÷ 1,000,000 × Output price)

For example:
  • If a model charges $1.00 per 1M input tokens
  • And you send 2,000 input tokens
  • The input cost is:
2,000 ÷ 1,000,000 × $1.00 = $0.002

The same calculation applies to output tokens.
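The formula above can be sketched as a small helper function (the function name is illustrative, not part of the AISA API):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the cost in USD; prices are USD per 1 million tokens."""
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)

# The worked example from the text: 2,000 input tokens at $1.00 / 1M tokens.
cost = request_cost(2_000, 0, 1.00, 0.00)
print(f"${cost:.3f}")  # prints $0.002
```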

What Counts as Tokens?

Tokens represent fragments of text processed by the model. They may include:
  • Words
  • Punctuation
  • Numbers
  • Formatting characters
Longer prompts and longer outputs consume more tokens and therefore increase cost. Streaming responses are billed the same way as non-streaming responses, based on total tokens generated.
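Exact token counts come from the model’s own tokenizer, so they vary by model. For rough cost planning, a common rule of thumb is about four characters per token for English text — an assumption, not a guarantee:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts come from the model's tokenizer."""
    return max(1, round(len(text) / chars_per_token))

prompt = "Summarize the quarterly sales report in three bullet points."
print(estimate_tokens(prompt))  # prints 15
```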

Model Versions and Naming

Some models include version identifiers such as:
  • Date-based versions (e.g., -2025-12-11)
  • “thinking” variants
  • “mini” or “flash” variants
These represent distinct models and may have different pricing. If a model is updated or replaced, pricing may differ between versions.

Group-Based Pricing

If your workspace uses multiple groups, pricing may vary by group. Group-level pricing rules and ratios are applied automatically during billing. You can view the final calculated cost for each request in the Usage Logs page.
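The exact group-pricing rule engine is not documented here; one way to picture it is a per-group ratio multiplied into the base request cost (the group names and ratios below are hypothetical):

```python
# Hypothetical group ratios; actual values are configured per workspace
# and applied automatically during billing.
GROUP_RATIOS = {"default": 1.0, "research": 0.8, "production": 1.2}

def group_cost(base_cost_usd: float, group: str) -> float:
    """Apply a group-level pricing ratio to a base request cost."""
    return base_cost_usd * GROUP_RATIOS.get(group, 1.0)

print(f"{group_cost(0.002, 'research'):.4f}")  # prints 0.0016
```

The final calculated cost for each request, with any group rules applied, is what appears in the Usage Logs page.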

AI Model Pricing Table

AISA supports multiple types of AI models. Pricing is categorized based on how the model consumes compute:
  • Token-based pricing: used for text and multimodal LLM inference.
  • Media-based pricing: used for image generation and video generation models.
All token-based prices are listed per 1 million tokens (1M tokens). Media models are priced per generated asset or per processing duration.

OpenAI

Model Name           Input (USD / 1M tokens)  Output (USD / 1M tokens)
gpt-4.1              1.4000                   5.6000
gpt-4.1-mini         0.2800                   1.1200
gpt-4o               1.7500                   7.0000
gpt-4o-mini          0.1050                   0.4200
gpt-5                0.8750                   7.0000
gpt-5-mini           0.1750                   1.4000
gpt-5.2              1.2250                   9.8000
gpt-5.2-2025-12-11   1.2250                   9.8000
gpt-5.2-chat-latest  1.2250                   9.8000
gpt-5.3-codex        1.2250                   9.8000
gpt-5.4              1.7500                   10.5000
gpt-oss-120b         0.0280                   0.1330

Anthropic

Model Name                           Input (USD / 1M tokens)  Output (USD / 1M tokens)
claude-3-7-sonnet-20250219           3.0000                   15.0000
claude-3-7-sonnet-20250219-thinking  3.0000                   15.0000
claude-haiku-4-5-20251001            1.0000                   5.0000
claude-sonnet-4-20250514             3.0000                   15.0000
claude-sonnet-4-20250514-thinking    3.0000                   15.0000
claude-sonnet-4-5-20250929           3.0000                   15.0000
claude-sonnet-4-6                    3.0000                   15.0000
claude-sonnet-4-6-thinking           3.0000                   15.0000
claude-opus-4-20250514               15.0000                  75.0000
claude-opus-4-20250514-thinking      15.0000                  75.0000
claude-opus-4-1-20250805             15.0000                  75.0000
claude-opus-4-1-20250805-thinking    15.0000                  75.0000
claude-opus-4-5-20251101             5.0000                   25.0000
claude-opus-4-6                      5.0000                   25.0000
claude-opus-4-7                      5.0000                   25.0000

Google

Model Name              Input (USD / 1M tokens)  Output (USD / 1M tokens)
gemini-2.5-flash        0.2100                   1.7500
gemini-2.5-flash-lite   0.0700                   0.2800
gemini-2.5-pro          0.8750                   7.0000
gemini-3-pro-preview    1.4000                   8.4000
gemini-3.1-pro-preview  1.4000                   8.4000

DeepSeek

Model Name     Input (USD / 1M tokens)  Output (USD / 1M tokens)
deepseek-r1    0.4018                   1.6058
deepseek-v3.1  0.4018                   1.2047
deepseek-v3.2  0.2009                   0.3017

Qwen (Alibaba)

Model Name                      Input (USD / 1M tokens)  Output (USD / 1M tokens)
qwen-flash                      0.0220                   0.1800
qwen-mt-flash                   0.0720                   0.2200
qwen-mt-lite                    0.0840                   0.2520
qwen-plus-2025-12-01            0.2800                   0.8400
qwen3-coder-480b-a35b-instruct  1.0500                   5.2500
qwen3-coder-flash               0.2100                   1.0500
qwen3-coder-plus                0.7000                   3.5000
qwen3-max                       0.7200                   3.6000
qwen3-vl-flash                  0.0350                   0.2800
qwen3-vl-flash-2025-10-15       0.0350                   0.2800
qwen3-vl-plus                   0.1400                   1.1200
qwen3.6-plus                    0.2760                   1.6510

Moonshot AI

Model Name        Input (USD / 1M tokens)  Output (USD / 1M tokens)
kimi-k2-thinking  0.4020                   1.6060
kimi-k2.5         0.4020                   2.1080

MiniMax

Model Name    Input (USD / 1M tokens)  Output (USD / 1M tokens)
MiniMax-M2.5  0.2100                   0.8400

Zhipu GLM

Model Name  Input (USD / 1M tokens)  Output (USD / 1M tokens)
glm-5       0.4010                   1.8060

ByteDance (Seed)

Model Name             Input (USD / 1M tokens)  Output (USD / 1M tokens)
seed-1-6-250915        0.2250                   0.9000
seed-1-6-flash-250715  0.0680                   0.2700
seed-1-8-251228        0.2250                   1.8000
seed-2-0-mini-260215   0.1000                   0.4000
seed-2-0-lite-260228   0.2500                   2.0000
seed-2-0-pro-260328    0.5000                   3.0000

Image & Video Generation Pricing

Some models generate media rather than tokens. These models are billed per generated asset (for example, per image or per video) rather than per token.
Model Name                  Provider   Pricing
gemini-3-pro-image-preview  Google     $0.100 per image
seedream-4-5-251128         ByteDance  $0.040 per image
wan2.7-image                Qwen       $0.030 per image
wan2.7-image-pro            Qwen       $0.075 per image
wan2.7-i2v                  Qwen       $1.836 per video (i2v)
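Per-asset billing is simpler than token billing: the cost is the per-image rate times the number of images generated. A minimal sketch using the per-image prices from the table above (the lookup-table approach itself is illustrative, not part of the AISA API):

```python
# Per-image prices (USD) taken from the pricing table above.
IMAGE_PRICES = {
    "gemini-3-pro-image-preview": 0.100,
    "seedream-4-5-251128": 0.040,
    "wan2.7-image": 0.030,
    "wan2.7-image-pro": 0.075,
}

def image_batch_cost(model: str, n_images: int) -> float:
    """Total USD for a batch of generated images."""
    return IMAGE_PRICES[model] * n_images

print(f"${image_batch_cost('wan2.7-image', 10):.2f}")  # prints $0.30
```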

Important Notes

  • All prices are listed in USD.
  • Text-based models are billed per input and output token.
  • Image generation models are billed per generated image.
  • Video generation models are billed per second of generated video.
  • Pricing is usage-based and calculated per request.
  • Model availability and pricing may change over time.
  • Always refer to the Marketplace for the most up-to-date pricing information.
  • The final billed amount for each request can be verified in Usage Logs.