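Multi-target model failover
This page walks through configuring a multi-target model so that a single model alias retries and fails over across multiple upstream targets.
This tutorial creates a multi-target model that fails over from a primary model to a secondary model. Applications call a single model alias, and AISIX selects the upstream model for each request.
The multi-target model is named chat-prod. It points to two target models, each with its own provider key. After the first request succeeds, you point the primary provider key at an unreachable host and confirm that AISIX serves the next request from the secondary target.
Prerequisites
Before you begin, make sure you have:
- Completed the self-hosted quickstart.
- Installed jq, which the commands use to capture resource IDs.
- An OpenAI API key for the primary upstream and one for the secondary upstream. Both upstreams can share the same OpenAI API key, because the failover check works by pointing the primary provider key at an unreachable host.
Set variables
Export the variables used by the commands: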
export AISIX_ADMIN_KEY="admin-local-only-change-me"
export PRIMARY_OPENAI_API_KEY="YOUR_PRIMARY_PROVIDER_KEY"
export SECONDARY_OPENAI_API_KEY="YOUR_SECONDARY_PROVIDER_KEY"
export CALLER_KEY="sk-failover-demo"
If both upstreams use the same OpenAI account, set PRIMARY_OPENAI_API_KEY and SECONDARY_OPENAI_API_KEY to the same value.
Compute the SHA-256 hash that AISIX stores for the caller API key:
CALLER_KEY_HASH=$(printf '%s' "${CALLER_KEY}" | shasum -a 256 | awk '{print $1}')
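The shasum command is not installed on every system. As a minimal sketch, the hypothetical helper sha256_hex below (not part of AISIX) tries sha256sum, shasum, and openssl in turn and prints the same lowercase hex digest whichever tool it finds:

```shell
# Hypothetical helper, not part of AISIX: print the lowercase hex SHA-256
# digest of the first argument using whichever common tool is installed.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'
  elif command -v openssl >/dev/null 2>&1; then
    # openssl prefixes the digest with a label, so keep only the last field.
    printf '%s' "$1" | openssl dgst -sha256 | awk '{print $NF}'
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}
```

With the helper defined, CALLER_KEY_HASH=$(sha256_hex "${CALLER_KEY}") should produce the same value as the shasum command above.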
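Configure the multi-target model
To build chat-prod, create the provider keys, the target models, the multi-target model, and the caller API key, in that order.
Create the provider keys
Create the primary provider key: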
PRIMARY_PK_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "openai-primary",
"provider": "openai",
"adapter": "openai",
"secret": "'"${PRIMARY_OPENAI_API_KEY}"'",
"api_base": "https://api.openai.com/v1"
}' | jq -r .id)
Create the secondary provider key:
SECONDARY_PK_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/provider_keys" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "openai-secondary",
"provider": "openai",
"adapter": "openai",
"secret": "'"${SECONDARY_OPENAI_API_KEY}"'",
"api_base": "https://api.openai.com/v1"
}' | jq -r .id)
Confirm that both IDs were captured:
printf 'Primary provider key: %s\nSecondary provider key: %s\n' \
"${PRIMARY_PK_ID}" "${SECONDARY_PK_ID}"
You should see two non-empty IDs:
Primary provider key: 7fd2d8ce-f79d-49cc-b742-d32fda7b7d5a
Secondary provider key: 04573e6c-6319-477e-b2a4-a67a1911c727
The IDs are stored in PRIMARY_PK_ID and SECONDARY_PK_ID for use in later commands.
Create the target models
Create the primary model:
PRIMARY_MODEL_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "gpt-4o-primary",
"provider": "openai",
"model_name": "gpt-4o-mini",
"provider_key_id": "'"${PRIMARY_PK_ID}"'"
}' | jq -r .id)
Create the secondary model:
SECONDARY_MODEL_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "gpt-4o-secondary",
"provider": "openai",
"model_name": "gpt-4o-mini",
"provider_key_id": "'"${SECONDARY_PK_ID}"'"
}' | jq -r .id)
Create the multi-target model
Create a multi-target model named chat-prod. The proxy tries gpt-4o-primary first, retries once on a retryable failure, and then fails over to gpt-4o-secondary.
CHAT_PROD_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/models" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "chat-prod",
"routing": {
"strategy": "failover",
"targets": [
{"model": "gpt-4o-primary"},
{"model": "gpt-4o-secondary"}
],
"retries": 1,
"max_fallbacks": 1
}
}' | jq -r .id)
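To make the effect of retries and max_fallbacks concrete, the sketch below prints the order of attempts in the worst case. It is a conceptual model only, not AISIX code. The helper name attempt_order is hypothetical, and the sketch assumes that every attempt fails with a retryable error and that each fallback target is tried once. The real proxy may behave differently.

```shell
# Conceptual sketch only, not AISIX code. Assumes every attempt fails with a
# retryable error and that each fallback target is tried exactly once.
# Usage: attempt_order RETRIES MAX_FALLBACKS PRIMARY [FALLBACK...]
attempt_order() {
  retries=$1
  max_fallbacks=$2
  shift 2
  primary=$1
  shift
  echo "$primary (initial attempt)"
  i=1
  while [ "$i" -le "$retries" ]; do
    echo "$primary (retry $i)"
    i=$((i + 1))
  done
  used=0
  for target in "$@"; do
    # Stop once the fallback budget is spent.
    if [ "$used" -ge "$max_fallbacks" ]; then
      break
    fi
    used=$((used + 1))
    echo "$target (fallback $used)"
  done
}

# Same values as the chat-prod configuration: retries 1, max_fallbacks 1.
attempt_order 1 1 gpt-4o-primary gpt-4o-secondary
```

Under these assumptions, the call prints the primary target twice (the initial attempt and one retry), followed by the secondary target once.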
Create the caller API key
Create a caller API key that is allowed to call the multi-target model:
APIKEY_ID=$(curl -sS -X POST "http://127.0.0.1:3001/admin/v1/apikeys" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"key_hash": "'"${CALLER_KEY_HASH}"'",
"allowed_models": ["chat-prod"]
}' | jq -r .id)
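Confirm that the model and caller API key IDs were captured:
printf 'Primary model: %s\nSecondary model: %s\nMulti-target model: %s\nCaller API key: %s\n' \
"${PRIMARY_MODEL_ID}" "${SECONDARY_MODEL_ID}" "${CHAT_PROD_ID}" "${APIKEY_ID}"
Every created resource should have an ID:
Primary model: 3b909841-c0a7-4ad8-8f1b-a9df8f10b581
Secondary model: 4f49a654-b03d-4f19-a40c-9a6dc3210a55
Multi-target model: 247d7dc4-e943-42f8-a841-6a3758e6d34d
Caller API key: 7d3b710e-9f47-4ab8-90a3-028b5572f686
The IDs are stored in PRIMARY_MODEL_ID, SECONDARY_MODEL_ID, CHAT_PROD_ID, and APIKEY_ID for use in later verification and cleanup commands.
If any value is empty or null, check the error_msg in the output of the preceding command before continuing.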
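The empty-or-null check can also be scripted rather than read off the printed output. The helper below is a minimal sketch. The name require_ids is hypothetical and not part of AISIX. It treats the literal string null as missing, because that is what jq -r prints when a response has no id field.

```shell
# Hypothetical helper, not part of AISIX: return non-zero if any named
# variable is empty or holds the literal string "null" (what `jq -r .id`
# prints when the response carries no id).
require_ids() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "missing ID: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, running require_ids PRIMARY_PK_ID SECONDARY_PK_ID PRIMARY_MODEL_ID SECONDARY_MODEL_ID CHAT_PROD_ID APIKEY_ID before the verification steps reports every variable that still needs attention.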
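Verify failover behavior
First send a request through the normal path, then make the primary upstream unreachable and confirm that AISIX serves the next request from the secondary model.
Verify the multi-target model
Send a request to chat-prod: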
curl -sS -X POST "http://127.0.0.1:3000/v1/chat/completions" \
-H "Authorization: Bearer ${CALLER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "chat-prod",
"messages": [
{"role": "user", "content": "Say hello."}
]
}'
A successful request returns an OpenAI-compatible Chat Completions response similar to the following:
{
"id": "chatcmpl-***",
"object": "chat.completion",
"created": **********,
"model": "chat-prod",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 9,
"total_tokens": 19
}
}
Trigger failover
Update the primary provider key so that it points to an unreachable host:
curl -sS -X PUT "http://127.0.0.1:3001/admin/v1/provider_keys/${PRIMARY_PK_ID}" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}" \
-H "Content-Type: application/json" \
-d '{
"display_name": "openai-primary",
"provider": "openai",
"adapter": "openai",
"secret": "'"${PRIMARY_OPENAI_API_KEY}"'",
"api_base": "https://api.openai.invalid/v1"
}'
Send the request again, this time including the response headers:
curl -sSi -X POST "http://127.0.0.1:3000/v1/chat/completions" \
-H "Authorization: Bearer ${CALLER_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "chat-prod",
"messages": [
{"role": "user", "content": "Say hello."}
]
}'
A successful failover response returns HTTP/1.1 200 OK and names the secondary target in x-aisix-served-by:
HTTP/1.1 200 OK
content-type: application/json
x-aisix-call-id: ***
x-aisix-served-by: gpt-4o-secondary
server: AISIX/0.1.0
{
"id": "chatcmpl-***",
"object": "chat.completion",
"created": **********,
"model": "chat-prod",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I assist you today?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 9,
"total_tokens": 19
}
}
The request still succeeds because AISIX retries the primary target and then forwards the request to the secondary target.
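To check the serving target from a script instead of reading the headers by eye, the response can be piped through a small filter. This is a minimal sketch. The function name served_by is hypothetical, and the only AISIX-specific detail it relies on is the x-aisix-served-by header shown in the response above.

```shell
# Hypothetical helper, not part of AISIX: read a `curl -i` style response on
# stdin and print the value of the first x-aisix-served-by header.
served_by() {
  tr -d '\r' | awk '
    {
      n = index($0, ":")
      if (n == 0) next                      # status line, blank line, etc.
      name = tolower(substr($0, 1, n - 1))  # header names are case-insensitive
      if (name == "x-aisix-served-by") {
        value = substr($0, n + 1)
        sub(/^[ \t]+/, "", value)           # drop whitespace after the colon
        print value
        exit
      }
    }'
}
```

Appending "| served_by" to the curl -sSi command above should print gpt-4o-secondary once failover has taken place.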
Verify runtime status
Check the runtime status of the target models:
curl -sS "http://127.0.0.1:3001/admin/v1/models/status" \
-H "Authorization: Bearer ${AISIX_ADMIN_KEY}"
A successful status response includes the target models and the multi-target model:
[
{
"id": "3b909841-c0a7-4ad8-8f1b-a9df8f10b581",
"display_name": "gpt-4o-primary",
"kind": "direct",
"status": "cooldown",
"cooldown_until": {
"secs_since_epoch": **********,
"nanos_since_epoch": 0
},
"status_reason": "transport_error"
},
{
"id": "4f49a654-b03d-4f19-a40c-9a6dc3210a55",
"display_name": "gpt-4o-secondary",
"kind": "direct",
"status": "healthy"
},
{
"id": "247d7dc4-e943-42f8-a841-6a3758e6d34d",
"display_name": "chat-prod",
"kind": "routing",
"status": "not_applicable"
}
]
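When the status list grows, a jq filter can pick out only the models in cooldown. This is a minimal sketch that assumes the response is a JSON array shaped like the example above. The function name models_in_cooldown is hypothetical.

```shell
# Hypothetical helper, not part of AISIX: read the models/status JSON array on
# stdin and print "display_name<TAB>status_reason" for each model in cooldown.
models_in_cooldown() {
  jq -r '.[]
    | select(.status == "cooldown")
    | "\(.display_name)\t\(.status_reason // "unknown")"'
}
```

Piping the status request above into models_in_cooldown should list gpt-4o-primary with the reason transport_error while the cooldown is active, and print nothing once the model is healthy again.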