One gateway for many LLMs. Route agent requests across OpenAI-compatible models through AIsa with a single API key.
Install
aisa skills install llm-router
What can agents do with it?
Model matching: Pick a model based on task type and constraints.
Provider coverage: Route across GPT, Claude, Gemini, Qwen, DeepSeek, Grok, and more.
Cost-aware routing: Choose cheaper models when quality needs allow it.
Fallback planning: Suggest alternates when a model is unavailable.
🔥 What Can You Do?
Multi-Model Chat
"Chat with GPT-4 for reasoning, switch to Claude for creative writing"
Model Comparison
"Compare responses from GPT-4, Claude, and Gemini for the same question"
Vision Analysis
"Analyze this image with GPT-4o - what objects are in it?"
Cost Optimization
"Route simple queries to fast/cheap models, complex queries to GPT-4"
Fallback Strategy
"If GPT-4 fails, automatically try Claude, then Gemini"
Why LLM Router?
| Feature | LLM Router | Direct APIs |
|---|---|---|
| API Keys | 1 | 10+ |
| SDK Compatibility | OpenAI SDK | Multiple SDKs |
| Billing | Unified | Per-provider |
| Model Switching | Change string | Code rewrite |
| Fallback Routing | Built-in | DIY |
| Cost Tracking | Unified | Fragmented |
Supported Model Families
| Family | Developer | Example Models |
|---|---|---|
| GPT | OpenAI | gpt-4.1, gpt-4o, gpt-4o-mini, o1, o1-mini, o3-mini |
| Claude | Anthropic | claude-3-5-sonnet, claude-3-opus, claude-3-sonnet |
| Gemini | Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Qwen | Alibaba | qwen-max, qwen-plus, qwen2.5-72b-instruct |
| DeepSeek | DeepSeek | deepseek-chat, deepseek-coder, deepseek-v3, deepseek-r1 |
| Grok | xAI | grok-2, grok-beta |
Note: Model availability may vary. Check console.aisa.one/pricing for the full list of currently available models and pricing.
Quick Start
export AISA_API_KEY="your-key"
API Endpoints
OpenAI-Compatible Chat Completions
POST https://api.aisa.one/v1/chat/completions
Request
curl -X POST "https://api.aisa.one/v1/chat/completions" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 1000
}'
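For callers without an SDK, the same request can be built with Python's standard library. This is a minimal sketch that mirrors the curl call above; build_chat_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.aisa.one/v1/chat/completions"


# build_chat_request is a hypothetical helper, not part of the LLM Router client.
def build_chat_request(model: str, messages: list, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    # Empty if unset; the gateway would then answer 401.
    api_key = os.environ.get("AISA_API_KEY", "")
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "gpt-4.1",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    temperature=0.7,
    max_tokens=1000,
)
# To actually send it:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     data = json.load(resp)
```

Uncomment the urlopen lines to send it; json.load(resp) then yields a dict shaped like the response below.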
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g., gpt-4.1, claude-3-sonnet) |
| messages | array | Yes | Conversation messages (format below) |
| temperature | number | No | Sampling temperature; higher values are more random (0-2, default: 1) |
| max_tokens | integer | No | Maximum number of tokens in the response |
| stream | boolean | No | Enable streaming (default: false) |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | string/array | No | Stop sequences |
Each entry in messages has this form:
{
  "role": "user|assistant|system",
  "content": "message text or array for multimodal"
}
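Checking parameters on the client side catches typos and out-of-range values before a request is spent. A minimal sketch based on the ranges in the parameter table above; check_params is a hypothetical helper, it only knows the three roles listed, and the gateway remains the authority on what each model accepts.

```python
# check_params is a hypothetical client-side helper; the gateway has the final say.
# Inclusive ranges taken from the parameter table.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
VALID_ROLES = {"system", "user", "assistant"}


def check_params(messages: list, **params) -> list:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not messages:
        problems.append("messages must not be empty")
    for i, msg in enumerate(messages):
        if msg.get("role") not in VALID_ROLES:
            problems.append(f"messages[{i}] has invalid role {msg.get('role')!r}")
        if "content" not in msg:
            problems.append(f"messages[{i}] is missing content")
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        problems.append("max_tokens must be a positive integer")
    return problems
```

For example, check_params([{"role": "user", "content": "Hi"}], temperature=2.5) reports the temperature as out of range, while the same call with temperature=0.7 returns an empty list.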
Response
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 200,
    "total_tokens": 250,
    "cost": 0.0025
  }
}
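A small helper can pull the reply text and usage figures out of a parsed response. This sketch assumes the response shape shown above; summarize_response is a hypothetical name, and cost is read with .get() because it is an AIsa addition that is not part of the standard OpenAI usage object.

```python
# Hypothetical helper; assumes the OpenAI-style response shape shown above.
def summarize_response(response: dict) -> dict:
    """Extract the first choice's text plus usage figures from a chat completion."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "model": response.get("model"),
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # AIsa-specific field; may be absent from other OpenAI-compatible servers.
        "cost": usage.get("cost"),
    }


example = {
    "id": "chatcmpl-xxx",
    "object": "chat.completion",
    "created": 1234567890,
    "model": "gpt-4.1",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Quantum computing uses..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250, "cost": 0.0025},
}
summary = summarize_response(example)
```

In OpenAI-compatible APIs, a finish_reason of "length" means the reply was cut off by max_tokens, which is worth checking before trusting the text.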
Streaming Response
curl -X POST "https://api.aisa.one/v1/chat/completions" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-sonnet",
"messages": [{"role": "user", "content": "Write a poem about AI."}],
"stream": true
}'
Streaming returns Server-Sent Events (SSE):
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"In"}}]}
data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":" circuits"}}]}
...
data: [DONE]
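Without an SDK, the stream can be consumed by reading the body line by line, keeping lines that start with data:, stopping at data: [DONE], and concatenating each delta.content. A minimal sketch over already-decoded lines; iter_sse_text is a hypothetical name, and chunks without content (such as an OpenAI-style role-only first delta) are skipped.

```python
import json
from typing import Iterable, Iterator


# Hypothetical helper for OpenAI-style SSE chunks like the ones shown above.
def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":"In"}}]}',
    "",
    'data: {"id":"chatcmpl-xxx","choices":[{"delta":{"content":" circuits"}}]}',
    "data: [DONE]",
]
poem = "".join(iter_sse_text(sample))
```

With urllib, iterate over the open response and decode each line, for example iter_sse_text(raw.decode("utf-8") for raw in resp).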
Vision / Image Analysis
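Analyze images by passing an image URL (or base64-encoded image data) in a content array alongside the text prompt, using a vision-capable model such as gpt-4o: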
curl -X POST "https://api.aisa.one/v1/chat/completions" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
]
}
]
}'
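The example above passes a public URL. For a local file, OpenAI-compatible APIs conventionally accept a base64 data: URL in the same image_url.url field; this sketch assumes AIsa handles it the same way. image_message is a hypothetical helper.

```python
import base64
import mimetypes


# Hypothetical helper; assumes AIsa accepts base64 data URLs as OpenAI does.
def image_message(prompt: str, image_bytes: bytes, filename: str) -> dict:
    """Build a user message that embeds an image as a base64 data URL."""
    # Guess the MIME type from the file name; fall back to JPEG.
    mime = mimetypes.guess_type(filename)[0] or "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


# In practice the bytes would come from open("photo.png", "rb").read().
msg = image_message("What is in this image?", b"\x89PNG fake bytes", "photo.png")
```

The resulting dict goes straight into the messages array with model set to gpt-4o.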
Function Calling
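Pass function definitions so the model can return structured arguments for a function call instead of free text. This example uses OpenAI's legacy functions and function_call fields (OpenAI's current API names them tools and tool_choice):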
curl -X POST "https://api.aisa.one/v1/chat/completions" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
"functions": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
],
"function_call": "auto"
}'
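When the model chooses to call a function, the legacy OpenAI format returns message.function_call, with a name and an arguments field that is a JSON-encoded string, instead of text content. A sketch of dispatching that call to local code, assuming AIsa passes this response shape through unchanged; the get_weather body, LOCAL_FUNCTIONS, run_function_call, and the sample response are all hypothetical.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "temperature": 18, "unit": unit}


# Hypothetical registry mapping function names the model may request to local code.
LOCAL_FUNCTIONS = {"get_weather": get_weather}


def run_function_call(response: dict):
    """Execute the function the model asked for, or return None if it answered in text."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        return None
    func = LOCAL_FUNCTIONS[call["name"]]
    # arguments arrives as a JSON string, not a dict, and may be malformed.
    args = json.loads(call["arguments"])
    return func(**args)


sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
            },
            "finish_reason": "function_call",
        }
    ]
}
result = run_function_call(sample_response)
```

To let the model finish its answer, OpenAI's legacy convention is to append a message with role "function", the function's name, and json.dumps(result) as content, then call the endpoint again.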
For Gemini models, you can also use the native format:
POST https://api.aisa.one/v1/models/{model}:generateContent
curl -X POST "https://api.aisa.one/v1/models/gemini-2.0-flash:generateContent" \
-H "Authorization: Bearer $AISA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [
{
"role": "user",
"parts": [{"text": "Explain machine learning."}]
}
],
"generationConfig": {
"temperature": 0.7,
"maxOutputTokens": 1000
}
}'
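Native Gemini responses do not use the choices array: text sits under candidates[].content.parts[].text and token counts under usageMetadata. A sketch, assuming AIsa returns Google's generateContent response shape unchanged; gemini_text is a hypothetical helper and the sample response is abbreviated and made up.

```python
# Hypothetical helper; assumes Google's native generateContent response shape.
def gemini_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate."""
    candidates = response.get("candidates", [])
    if not candidates:
        return ""  # e.g. the prompt was blocked and no candidate was returned
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)


sample = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [{"text": "Machine learning is "}, {"text": "learning from data."}],
            },
            "finishReason": "STOP",
        }
    ],
    "usageMetadata": {"promptTokenCount": 4, "candidatesTokenCount": 8, "totalTokenCount": 12},
}
text = gemini_text(sample)
```

Code that switches between the two endpoints needs both extraction paths, since the OpenAI-compatible endpoint returns choices instead.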
Python Client
Installation
No installation required; the client uses only the Python standard library.
CLI Usage
# Basic completion
python3 scripts/llm_router_client.py chat --model gpt-4.1 --message "Hello, world!"
# With system prompt
python3 scripts/llm_router_client.py chat --model claude-3-sonnet --system "You are a poet" --message "Write about the moon"
# Streaming
python3 scripts/llm_router_client.py chat --model gpt-4o --message "Tell me a story" --stream
# Multi-turn conversation
python3 scripts/llm_router_client.py chat --model qwen-max --messages '[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"},{"role":"user","content":"How are you?"}]'
# Vision analysis
python3 scripts/llm_router_client.py vision --model gpt-4o --image "https://example.com/image.jpg" --prompt "Describe this image"
# List supported models
python3 scripts/llm_router_client.py models
# Compare models
python3 scripts/llm_router_client.py compare --models "gpt-4.1,claude-3-sonnet,gemini-2.0-flash" --message "What is 2+2?"
Python SDK Usage
from llm_router_client import LLMRouterClient
client = LLMRouterClient()  # Uses AISA_API_KEY env var
# Simple chat
response = client.chat(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response["choices"][0]["message"]["content"])
# With options
response = client.chat(
    model="claude-3-sonnet",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain relativity."}
    ],
    temperature=0.7,
    max_tokens=500
)
# Streaming
for chunk in client.chat_stream(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story."}]
):
    print(chunk, end="", flush=True)
# Vision
response = client.vision(
    model="gpt-4o",
    image_url="https://example.com/image.jpg",
    prompt="What's in this image?"
)
# Compare models
results = client.compare_models(
    models=["gpt-4.1", "claude-3-sonnet", "gemini-2.0-flash"],
    message="Explain quantum computing"
)
for model, result in results.items():
    print(f"{model}: {result['response'][:100]}...")
Use Cases
1. Cost-Optimized Routing
Use cheaper models for simple tasks:
def smart_route(message: str) -> dict:
    # Simple queries -> fast/cheap model (length is only a rough proxy for complexity)
    if len(message) < 50:
        model = "gpt-3.5-turbo"
    # Complex reasoning -> powerful model
    else:
        model = "gpt-4.1"
    return client.chat(model=model, messages=[{"role": "user", "content": message}])
2. Fallback Strategy
Automatic fallback on failure:
def chat_with_fallback(message: str) -> dict:
    models = ["gpt-4.1", "claude-3-sonnet", "gemini-2.0-flash"]
    for model in models:
        try:
            return client.chat(model=model, messages=[{"role": "user", "content": message}])
        except Exception:
            continue  # any failure: try the next model in the list
    raise RuntimeError("All models failed")
3. Model A/B Testing
Compare model outputs:
results = client.compare_models(
    models=["gpt-4.1", "claude-3-opus"],
    message="Analyze this quarterly report..."
)
# Log for analysis (log_response is a placeholder for your own logging function)
for model, result in results.items():
    log_response(model=model, latency=result["latency"], cost=result["cost"])
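To turn repeated comparisons into a decision, average latency and cost per model across runs. A sketch assuming each result carries the latency and cost fields used above (their units are not specified here); summarize_runs is hypothetical and the run data is illustrative, not measured.

```python
from statistics import mean


# Hypothetical aggregator over several compare_models() result dicts.
def summarize_runs(runs: list) -> dict:
    """Average latency and cost per model across runs."""
    per_model = {}
    for results in runs:
        for model, result in results.items():
            stats = per_model.setdefault(model, {"latency": [], "cost": []})
            stats["latency"].append(result["latency"])
            stats["cost"].append(result["cost"])
    return {
        model: {
            "avg_latency": mean(s["latency"]),
            "avg_cost": mean(s["cost"]),
            "runs": len(s["cost"]),
        }
        for model, s in per_model.items()
    }


# Illustrative numbers only, not measured results.
runs = [
    {"gpt-4.1": {"latency": 2.0, "cost": 0.004}, "claude-3-opus": {"latency": 3.0, "cost": 0.010}},
    {"gpt-4.1": {"latency": 1.0, "cost": 0.002}, "claude-3-opus": {"latency": 4.0, "cost": 0.012}},
]
summary = summarize_runs(runs)
```

Averages over a handful of runs are noisy; collecting more prompts per comparison gives a steadier picture before switching production traffic.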
4. Specialized Model Selection
Choose the best model for each task:
MODEL_MAP = {
    "code": "deepseek-coder",
    "creative": "claude-3-opus",
    "fast": "gpt-3.5-turbo",
    "vision": "gpt-4o",
    "chinese": "qwen-max",
    "reasoning": "gpt-4.1"
}
def route_by_task(task_type: str, message: str) -> dict:
    # Unknown task types fall back to the general-purpose model
    model = MODEL_MAP.get(task_type, "gpt-4.1")
    return client.chat(model=model, messages=[{"role": "user", "content": message}])
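route_by_task still needs a task_type. One lightweight option is a keyword heuristic that maps a message to a MODEL_MAP key. The sketch below is hypothetical: classify_task, the keyword lists, and the 50-character cutoff (borrowed from smart_route) are illustrative choices, not part of AIsa.

```python
import re

# Hypothetical keyword lists; tune them against your own traffic.
KEYWORDS = {
    "code": ("def ", "function", "bug", "stack trace", "compile"),
    "creative": ("poem", "story", "lyrics", "slogan"),
    "reasoning": ("prove", "step by step", "analyze", "why"),
}
CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block


def classify_task(message: str, has_image: bool = False) -> str:
    """Map a request to one of the MODEL_MAP task types (hypothetical heuristic)."""
    if has_image:
        return "vision"
    if CJK.search(message):
        return "chinese"
    lowered = message.lower()
    for task, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return task
    # No keyword matched: short messages go to the fast tier.
    return "fast" if len(message) < 50 else "reasoning"
```

Combined with the code above: route_by_task(classify_task(message), message). A cheap model can also serve as the classifier when keywords prove too blunt.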
Error Handling
Errors return a JSON body with an error object containing a machine-readable code and a human-readable message:
{
  "error": {
    "code": "model_not_found",
    "message": "Model 'xyz' is not available"
  }
}
Common error codes:
401 - Invalid or missing API key
402 - Insufficient credits
404 - Model not found
429 - Rate limit exceeded
500 - Server error
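The status codes above split into two groups: 429 and 500 are usually transient and worth retrying after a pause, while 401 and 402 will fail the same way on every model, and 404 only calls for a different model. A sketch that combines backoff with model fallback; it assumes errors surface as urllib.error.HTTPError carrying the JSON body shown above, and send, chat_resilient, error_code, and fake_send are hypothetical names.

```python
import io
import json
import time
import urllib.error

# Transient statuses from the list above that are worth retrying on the same model.
RETRYABLE = {429, 500}


def error_code(err: urllib.error.HTTPError) -> str:
    """Return error.code from an AIsa error body, or "" if it is missing or not JSON."""
    try:
        return json.loads(err.read().decode("utf-8"))["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return ""


def chat_resilient(send, models, messages, retries=2, base_delay=1.0):
    """Try each model in order, retrying 429/500 with exponential backoff.

    send(model, messages) is a hypothetical callable that performs the request
    and raises urllib.error.HTTPError on a non-2xx status.
    """
    failures = []  # (model, HTTP status, AIsa error code) per failed attempt
    for model in models:
        for attempt in range(retries + 1):
            try:
                return send(model, messages)
            except urllib.error.HTTPError as err:
                failures.append((model, err.code, error_code(err)))
                if err.code in (401, 402):
                    raise  # key or credit problem: switching models will not help
                if err.code not in RETRYABLE or attempt == retries:
                    break  # e.g. 404 model_not_found: move on to the next model
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"All models failed: {failures}")


# Offline demo: the first model is reported as not found.
def fake_send(model, messages):
    if model == "gpt-4.1":
        body = io.BytesIO(b'{"error": {"code": "model_not_found", "message": "not available"}}')
        raise urllib.error.HTTPError(
            "https://api.aisa.one/v1/chat/completions", 404, "Not Found", None, body
        )
    return {"model": model}


reply = chat_resilient(fake_send, ["gpt-4.1", "claude-3-sonnet"], [], base_delay=0)
```

Unlike the catch-all fallback in the Use Cases section, this version stops immediately on key or credit problems instead of burning through every model.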
Best Practices
Use streaming for long responses to improve UX
Set max_tokens to control costs
Implement fallback for production reliability
Cache responses for repeated queries
Monitor spend via the usage object (including usage.cost) in each response
Use appropriate models: don't use GPT-4 for simple tasks
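For the caching advice above, a minimal in-memory sketch keys each entry on the model, messages, and parameters, so any change triggers a fresh call. It assumes reusing an answer is acceptable for identical requests (for example at temperature 0); CachedChat and fake_chat are hypothetical, not part of the LLM Router client.

```python
import hashlib
import json


class CachedChat:
    """Hypothetical wrapper adding an in-memory cache to any chat(model=..., messages=..., **params)."""

    def __init__(self, chat):
        self._chat = chat
        self._cache = {}
        self.hits = 0

    def _key(self, model, messages, params):
        # sort_keys makes logically identical requests serialize identically.
        blob = json.dumps({"model": model, "messages": messages, "params": params}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def __call__(self, model, messages, **params):
        key = self._key(model, messages, params)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        response = self._chat(model=model, messages=messages, **params)
        self._cache[key] = response
        return response


# Offline demo with a stand-in for client.chat that records real calls.
calls = []


def fake_chat(model, messages, **params):
    calls.append(model)
    return {"choices": [{"message": {"role": "assistant", "content": "4"}}]}


cached = CachedChat(fake_chat)
msgs = [{"role": "user", "content": "What is 2+2?"}]
first = cached("gpt-4o-mini", msgs, temperature=0)
second = cached("gpt-4o-mini", msgs, temperature=0)
```

Wrap the real client with CachedChat(client.chat). The dict grows without bound, so a long-running service would want a size limit or expiry.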
OpenAI SDK Compatibility
To use the official OpenAI Python SDK, change only the base URL and API key:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["AISA_API_KEY"],
    base_url="https://api.aisa.one/v1"
)
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
Pricing
Token-based pricing varies by model. Check console.aisa.one/pricing for current rates.
| Model Family | Approximate Cost |
|---|---|
| GPT-4.1 / GPT-4o | ~$0.01 / 1K tokens |
| Claude-3-Sonnet | ~$0.01 / 1K tokens |
| Gemini-2.0-Flash | ~$0.001 / 1K tokens |
| Qwen-Max | ~$0.005 / 1K tokens |
| DeepSeek-V3 | ~$0.002 / 1K tokens |
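For budgeting before a call, multiply the token count by the per-1K rate. A sketch using the approximate figures from the pricing table above; estimate_cost and APPROX_RATE_PER_1K are hypothetical, and because real prices vary by model and typically differ between input and output tokens, usage.cost in the response remains the number to trust for actual spend.

```python
# Hypothetical table: approximate USD per 1K tokens, copied from the pricing table.
APPROX_RATE_PER_1K = {
    "gpt-4.1": 0.01,
    "gpt-4o": 0.01,
    "claude-3-sonnet": 0.01,
    "gemini-2.0-flash": 0.001,
    "qwen-max": 0.005,
    "deepseek-v3": 0.002,
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD estimate for one call; raises KeyError for models not in the table."""
    total = prompt_tokens + completion_tokens
    return total / 1000 * APPROX_RATE_PER_1K[model]


# 50 prompt + 200 completion tokens, as in the sample response.
for model in ("gpt-4.1", "gemini-2.0-flash", "deepseek-v3"):
    print(f"{model}: ~${estimate_cost(model, 50, 200):.5f}")
```

For the 250-token sample response, gpt-4.1 works out to about $0.0025, in line with that response's usage.cost.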
Every response includes usage.cost and usage.credits_remaining.
Get Started
Sign up at aisa.one
Get your API key from the dashboard
Add credits (pay-as-you-go)
Set environment variable: export AISA_API_KEY="your-key"
Full API Reference
See API Reference for complete endpoint documentation.
Get started
Sign up at aisa.one (new accounts start with $2 free credit).
Generate an API key from the console.
Set your key and install the skill:
export AISA_API_KEY="your-key"
aisa skills install llm-router
Start a new agent session so the runtime loads the updated skill instructions.
Model Catalog: Browse supported model IDs and families.
Compare models: Compare models before routing production traffic.
AIsa CN-LLM Route: Chinese-language model routing.