ai-rate-limiting

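The ai-rate-limiting plugin enforces token-based rate limiting on requests sent to LLM services. It helps manage API usage by controlling the number of tokens consumed within a specified time window, which ensures fair resource allocation and prevents service overload. The plugin is often used together with the ai-proxy-multi plugin.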

Examples

The following examples demonstrate how to configure ai-rate-limiting in different scenarios.

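Rate Limit One Instance

The following example demonstrates how to use ai-proxy-multi to configure two models for load balancing, forwarding 80% of the traffic to one instance and 20% to the other. In addition, it uses ai-rate-limiting to configure token-based rate limiting on the instance that receives 80% of the traffic, so that once the configured quota is exhausted, additional traffic is forwarded to the other instance.

Create a route and update it with your LLM providers, models, API keys, and endpoints: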

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "deepseek-instance-1",
"provider": "deepseek",
"weight": 8,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
},
{
"name": "deepseek-instance-2",
"provider": "deepseek",
"weight": 2,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
]
},
"ai-rate-limiting": {
"policy": "local",
"instances": [
{
"name": "deepseek-instance-1",
"limit_strategy": "total_tokens",
"limit": 100,
"time_window": 30
}
]
}
}
}'

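❶ Apply rate limiting to the deepseek-instance-1 instance.

❷ Apply rate limiting based on total_tokens.

❸ Configure a quota of 100 tokens.

❹ Configure a time window of 30 seconds.

Send a POST request to the route with a system prompt and a sample user question in the request body: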

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1 + 1 equals 2. This is a fundamental arithmetic operation where adding one unit to another results in a total of two units."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

If the 100-token quota of deepseek-instance-1 is exhausted within the 30-second window, all additional requests are forwarded to deepseek-instance-2, which is not rate limited.
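The quota behavior can be pictured with a small shell sketch. It is a simplified model made up for illustration, not the plugin's implementation: it assumes a request is admitted while the remaining quota is positive and that its total_tokens are deducted only after the response, which matches the later examples where a single response uses more tokens than the configured limit. The function name simulate_window is hypothetical.

```shell
# Simplified model of a token quota within one time window (an assumption
# for illustration only): a request is admitted while the remaining quota is
# positive, and its total_tokens are deducted afterwards.
# Usage: simulate_window LIMIT TOKENS_REQ1 TOKENS_REQ2 ...
simulate_window() {
  remaining=$1
  shift
  for tokens in "$@"; do
    if [ "$remaining" -gt 0 ]; then
      echo "allow"
      remaining=$((remaining - tokens))
    else
      # Quota exhausted: the request would go to the other instance.
      echo "fallback"
    fi
  done
}

# Four requests against a 100-token quota, each with its own token usage.
simulate_window 100 40 45 30 20
```

Under this model, the request that crosses the limit is still served, and only the requests after it are diverted until the window resets.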

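Apply the Same Quota to All Instances

The following example demonstrates how to apply the same rate limiting quota to all LLM upstream instances in ai-rate-limiting.

For demonstration and easier differentiation, you will configure one OpenAI instance and one DeepSeek instance as the upstream LLM services.

Create a route and update it with your LLM providers, models, API keys, and endpoints: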

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "openai-instance",
"provider": "openai",
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
},
{
"name": "deepseek-instance",
"provider": "deepseek",
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
]
},
"ai-rate-limiting": {
"policy": "local",
"limit": 100,
"time_window": 60,
"rejected_code": 429,
"limit_strategy": "total_tokens"
}
}
}'

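❶ Configure a rate limiting quota of 100 tokens for all instances.

❷ Configure a time window of 60 seconds.

❸ Set the HTTP status code of rejected responses to 429.

❹ Apply rate limiting based on total_tokens.

Send a POST request to the route with a system prompt and a sample user question in the request body: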

curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newtons laws" }
]
}'

You should receive a response from either LLM instance, similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Sure! Sir Isaac Newton formulated three laws of motion that describe the motion of objects. These laws are widely used in physics and engineering for studying and understanding how things move. Here they are:\n\n1. Newton's First Law - Law of Inertia: An object at rest tends to stay at rest and an object in motion tends to stay in motion with the same speed and in the same direction unless acted upon by an unbalanced force. This is also known as the principle of inertia.\n\n2. Newton's Second Law of Motion - Force and Acceleration: The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. This is usually formulated as F=ma where F is the force applied, m is the mass of the object and a is the acceleration produced.\n\n3. Newton's Third Law - Action and Reaction: For every action, there is an equal and opposite reaction. This means that any force exerted on a body will create a force of equal magnitude but in the opposite direction on the object that exerted the first force.\n\nIn simple terms: \n1. If you slide a book on a table and let go, it will stop because of the friction (or force) between it and the table.\n2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 256,
"total_tokens": 279,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": null
}

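Since the total_tokens value exceeds the configured quota of 100, the next request within the 60-second window is expected to be forwarded to the other instance.

Within the same 60-second window, send another POST request to the route: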

curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newtons laws" }
]
}'

You should receive a response from the other LLM instance, similar to the following:

{
...
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Sure! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics. Here's an explanation of each law:\n\n---\n\n### **1. Newton's First Law (Law of Inertia)**\n- **Statement**: An object will remain at rest or in uniform motion in a straight line unless acted upon by an external force.\n- **What it means**: This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion. If no net force acts on an object, its velocity (speed and direction) will not change.\n- **Example**: A book lying on a table will stay at rest unless you push it. Similarly, a hockey puck sliding on ice will keep moving at a constant speed unless friction or another force slows it down.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration)**\n- **Statement**: The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n \\[\n F = ma\n \\]\n"
},
"logprobs": null,
"finish_reason": "length"
}
],
"usage": {
"prompt_tokens": 13,
"completion_tokens": 256,
"total_tokens": 269,
"prompt_tokens_details": {
"cached_tokens": 0
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 13
},
"system_fingerprint": "fp_3a5770e1b4_prod0225"
}

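Since the total_tokens value exceeds the configured quota of 100, the next request within the 60-second window is expected to be rejected.

Within the same 60-second window, send a third POST request to the route: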

curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newtons laws" }
]
}'

You should receive an HTTP 429 Too Many Requests response and observe the following response headers:

X-AI-RateLimit-Limit-openai-instance: 100
X-AI-RateLimit-Remaining-openai-instance: 0
X-AI-RateLimit-Reset-openai-instance: 0
X-AI-RateLimit-Limit-deepseek-instance: 100
X-AI-RateLimit-Remaining-deepseek-instance: 0
X-AI-RateLimit-Reset-deepseek-instance: 0
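When scripting against these headers, it can help to extract the remaining quota for a single instance. The following is a minimal sketch, assuming the header naming shown above; the function name remaining_for is hypothetical. It reads raw `curl -i` output from standard input, so the same function works on a live response or on saved output.

```shell
# Print the remaining quota reported for one instance.
# Reads raw `curl -i` output on stdin; $1 is the instance name.
remaining_for() {
  tr -d '\r' | awk -v want="x-ai-ratelimit-remaining-$1" -F': *' '
    tolower($1) == tolower(want) { print $2; exit }
  '
}

# Sample headers taken from the 429 response shown above.
printf '%s\r\n' \
  'HTTP/1.1 429 Too Many Requests' \
  'X-AI-RateLimit-Limit-openai-instance: 100' \
  'X-AI-RateLimit-Remaining-openai-instance: 0' \
  'X-AI-RateLimit-Remaining-deepseek-instance: 0' |
  remaining_for openai-instance
```

Against a running gateway, the output of `curl -si` for the request above could be piped into `remaining_for openai-instance` in the same way.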

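Configure Instance Priority and Rate Limiting

The following example demonstrates how to configure two models with different priorities and apply rate limiting to the instance with the higher priority. With fallback_strategy set to ["rate_limiting"], the plugin should continue to forward requests to the low-priority instance once the rate limiting quota of the high-priority instance is exhausted.

Create a route and update it with your LLM providers, models, API keys, and endpoints: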

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"fallback_strategy": ["rate_limiting"],
"instances": [
{
"name": "openai-instance",
"provider": "openai",
"priority": 1,
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
},
{
"name": "deepseek-instance",
"provider": "deepseek",
"priority": 0,
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
]
},
"ai-rate-limiting": {
"policy": "local",
"instances": [
{
"name": "openai-instance",
"limit": 10,
"time_window": 60
}
],
"limit_strategy": "total_tokens"
}
}
}'

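❶ Set fallback_strategy to ["rate_limiting"].

❷ Set a higher priority on the openai-instance instance.

❸ Set a lower priority on the deepseek-instance instance.

❹ Apply rate limiting to the openai-instance instance.

❺ Configure a quota of 10 tokens.

❻ Configure a time window of 60 seconds.

❼ Apply rate limiting based on total_tokens.

Send a POST request to the route with a system prompt and a sample user question in the request body: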

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 8,
"total_tokens": 31,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": null
}

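Since the total_tokens value exceeds the configured quota of 10 tokens, the next request within the 60-second window is expected to be forwarded to the other instance.

Within the same 60-second window, send another POST request to the route: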

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newton law" }
]
}'

You should see a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Certainly! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics.\n\n---\n\n### **1. Newton's First Law (Law of Inertia):**\n- **Statement:** An object at rest will remain at rest, and an object in motion will continue moving at a constant velocity (in a straight line at a constant speed), unless acted upon by an external force.\n- **Key Idea:** This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion.\n- **Example:** If you slide a book across a table, it eventually stops because of the force of friction acting on it. Without friction, the book would keep moving indefinitely.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration):**\n- **Statement:** The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n \\[\n F = ma\n \\]\n where:\n - \\( F \\) = net force applied (in Newtons),\n -"
},
...
}
],
...
}
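Because the two instances use different models, the model field of a response shows which instance served it. The following is a minimal sketch for pulling that field out of a response body; the function name served_model is hypothetical, and the grep/sed approach is chosen only to avoid extra dependencies (a JSON parser would be more robust).

```shell
# Print the value of the first "model" field found in a JSON response on stdin.
# A grep/sed sketch that avoids extra dependencies; a JSON parser is more robust.
served_model() {
  grep -o '"model"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Sample body shaped like the responses in this example.
printf '%s\n' '{ "id": "example", "model": "deepseek-chat", "choices": [] }' | served_model
```

Piping the output of the curl commands above into served_model should make the switch from the high-priority instance to the fallback instance easy to spot.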

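Load Balance and Rate Limit by Consumers

The following example demonstrates how to configure two models for load balancing and apply rate limiting by consumer.

Create a consumer johndoe and set a rate limiting quota of 10 tokens in a 60-second window on the openai-instance instance: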

curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"username": "johndoe",
"plugins": {
"ai-rate-limiting": {
"policy": "local",
"instances": [
{
"name": "openai-instance",
"limit": 10,
"time_window": 60
}
],
"rejected_code": 429,
"limit_strategy": "total_tokens"
}
}
}'

Configure the key-auth credential for johndoe:

curl "http://127.0.0.1:9180/apisix/admin/consumers/johndoe/credentials" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "cred-john-key-auth",
"plugins": {
"key-auth": {
"key": "john-key"
}
}
}'

Create another consumer janedoe and set a rate limiting quota of 10 tokens in a 60-second window on the deepseek-instance instance:

curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"username": "janedoe",
"plugins": {
"ai-rate-limiting": {
"policy": "local",
"instances": [
{
"name": "deepseek-instance",
"limit": 10,
"time_window": 60
}
],
"rejected_code": 429,
"limit_strategy": "total_tokens"
}
}
}'

Configure the key-auth credential for janedoe:

curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "cred-jane-key-auth",
"plugins": {
"key-auth": {
"key": "jane-key"
}
}
}'

Create a route and update it with your LLM providers, models, API keys, and endpoints:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"key-auth": {},
"ai-proxy-multi": {
"fallback_strategy": ["rate_limiting"],
"instances": [
{
"name": "openai-instance",
"provider": "openai",
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
},
{
"name": "deepseek-instance",
"provider": "deepseek",
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
]
}
}
}'

❶ Enable key-auth on the route.

❷ Configure the openai instance.

❸ Configure the deepseek instance.

Send a POST request to the route without any consumer key:

curl -i "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive an HTTP/1.1 401 Unauthorized response.

Send a POST request to the route with johndoe's key:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H 'apikey: john-key' \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 8,
"total_tokens": 31,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 0,
"audio_tokens": 0,
"accepted_prediction_tokens": 0,
"rejected_prediction_tokens": 0
}
},
"service_tier": "default",
"system_fingerprint": null
}

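Since the total_tokens value exceeds the quota configured for johndoe on the openai instance, the next request from johndoe within the 60-second window is expected to be forwarded to the deepseek instance.

Within the same 60-second window, send another POST request to the route with johndoe's key: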

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H 'apikey: john-key' \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newtons laws to me" }
]
}'

You should see a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Certainly! Newton's laws of motion are three fundamental principles that describe the relationship between the motion of an object and the forces acting on it. They were formulated by Sir Isaac Newton in the late 17th century and are foundational to classical mechanics.\n\n---\n\n### **1. Newton's First Law (Law of Inertia):**\n- **Statement:** An object at rest will remain at rest, and an object in motion will continue moving at a constant velocity (in a straight line at a constant speed), unless acted upon by an external force.\n- **Key Idea:** This law introduces the concept of **inertia**, which is the tendency of an object to resist changes in its state of motion.\n- **Example:** If you slide a book across a table, it eventually stops because of the force of friction acting on it. Without friction, the book would keep moving indefinitely.\n\n---\n\n### **2. Newton's Second Law (Law of Acceleration):**\n- **Statement:** The acceleration of an object is directly proportional to the net force acting on it and inversely proportional to its mass. Mathematically, this is expressed as:\n \\[\n F = ma\n \\]\n where:\n - \\( F \\) = net force applied (in Newtons),\n -"
},
...
}
],
...
}

Send a POST request to the route with janedoe's key:

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H 'apikey: jane-key' \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 31,
"total_tokens": 45,
"prompt_tokens_details": {
"cached_tokens": 0
},
"prompt_cache_hit_tokens": 0,
"prompt_cache_miss_tokens": 14
},
"system_fingerprint": "fp_3a5770e1b4_prod0225"
}

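Since the total_tokens value exceeds the quota configured for janedoe on the deepseek instance, the next request from janedoe within the 60-second window is expected to be forwarded to the openai instance.

Within the same 60-second window, send another POST request to the route with janedoe's key: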

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H 'apikey: jane-key' \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "Explain Newtons laws to me" }
]
}'

You should see a response similar to the following:

{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Sure, here are Newton's three laws of motion:\n\n1) Newton's First Law, also known as the Law of Inertia, states that an object at rest will stay at rest, and an object in motion will stay in motion, unless acted on by an external force. In simple words, this law suggests that an object will keep doing whatever it is doing until something causes it to do otherwise. \n\n2) Newton's Second Law states that the force acting on an object is equal to the mass of that object times its acceleration (F=ma). This means that force is directly proportional to mass and acceleration. The heavier the object and the faster it accelerates, the greater the force.\n\n3) Newton's Third Law, also known as the law of action and reaction, states that for every action, there is an equal and opposite reaction. Essentially, any force exerted onto a body will create a force of equal magnitude but in the opposite direction on the object that exerted the first force.\n\nRemember, these laws become less accurate when considering speeds near the speed of light (where Einstein's theory of relativity becomes more appropriate) or objects very small or very large. However, for everyday situations, they provide a good model of how things move.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

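This demonstrates how ai-proxy-multi load balances traffic according to the per-consumer rate limiting rules in ai-rate-limiting.

Rate Limit by Rules

The following example demonstrates how to configure the plugin to apply different rate limiting rules based on request attributes (available from API7 Enterprise 3.8.17). In this example, rate limits are applied based on the values of HTTP headers that represent the caller's access tier.

Note that all rules are applied in order. If a configured key does not exist, the corresponding rule is skipped.

Tip

In addition to HTTP headers, you can also base rules on other built-in variables to implement more flexible and fine-grained rate limiting strategies.

Create a route with the ai-rate-limiting plugin that applies different rate limits based on request headers, limiting requests per subscription (X-Subscription-ID) and enforcing a stricter limit for trial users (X-Trial-ID):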

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"fallback_strategy": ["rate_limiting"],
"instances": [
{
"name": "openai-instance",
"provider": "openai",
"priority": 1,
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
},
{
"name": "deepseek-instance",
"provider": "deepseek",
"priority": 0,
"weight": 0,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
]
},
"ai-rate-limiting": {
"policy": "local",
"rejected_code": 429,
"rules": [
{
"key": "${http_x_subscription_id}",
"count": "${http_x_custom_count ?? 500}",
"time_window": 60
},
{
"key": "${http_x_trial_id}",
"count": 50,
"time_window": 60
}
]
}
},
"upstream": {
"type": "roundrobin",
"nodes": {
"httpbin.org:80": 1
}
}
}'

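❶ Use the value of the X-Subscription-ID request header as the rate limiting key.

❷ Set the limit dynamically based on the X-Custom-Count header. If the header is not provided, a default count of 500 tokens applies.

❸ Use the value of the X-Trial-ID request header as the rate limiting key.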
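The rule selection described above can be sketched in shell. This is a simplified model made up for illustration, not the plugin's code: the function name applied_rules is hypothetical, its three arguments stand in for the X-Subscription-ID, X-Custom-Count and X-Trial-ID header values, and the shell default expansion `${var:-500}` plays the role of `?? 500` (it also treats an empty value as missing, which is an assumption).

```shell
# Simplified model of how the two rules in this example are selected.
# $1, $2, $3 stand for the X-Subscription-ID, X-Custom-Count and X-Trial-ID
# header values; pass an empty string for a header that is absent.
applied_rules() {
  subscription_id=$1
  custom_count=$2
  trial_id=$3
  # Rule 1 is skipped when its key (X-Subscription-ID) is missing.
  if [ -n "$subscription_id" ]; then
    echo "rule 1: key=$subscription_id count=${custom_count:-500}"
  fi
  # Rule 2 is skipped when its key (X-Trial-ID) is missing.
  if [ -n "$trial_id" ]; then
    echo "rule 2: key=$trial_id count=50"
  fi
}

applied_rules "sub-123456789" "" ""
applied_rules "sub-123456789" "10" ""
applied_rules "" "" "trial-123456789"
```

The three calls correspond to the three verification requests in this example: the default count, a custom count of 10, and the trial rule.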

To verify the rate limiting, send several of the following requests to the route with the same subscription ID:

curl "http://127.0.0.1:9080/anything" -i -X POST \
-H "X-Subscription-ID: sub-123456789" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

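These requests should match the first rule with the default count of 500 tokens. You should see requests within the quota return HTTP/1.1 200 OK and requests beyond the quota return HTTP/1.1 429 Too Many Requests: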

HTTP/1.1 200 OK
...
X-AI-1-RateLimit-Limit: 500
X-AI-1-RateLimit-Remaining: 499
X-AI-1-RateLimit-Reset: 60

HTTP/1.1 200 OK
...
X-AI-1-RateLimit-Limit: 500
X-AI-1-RateLimit-Remaining: 344
X-AI-1-RateLimit-Reset: 57.989000082016

HTTP/1.1 429 Too Many Requests
...
X-AI-1-RateLimit-Limit: 500
X-AI-1-RateLimit-Remaining: 0
X-AI-1-RateLimit-Reset: 5.871000051498
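To avoid guessing when the window resets, the Reset header can be turned into a sleep duration. The following is a minimal sketch, assuming the Reset value is expressed in seconds, as the sample values above suggest; the function name reset_seconds is hypothetical.

```shell
# Read `curl -i` output on stdin and print the first RateLimit-Reset value,
# rounded up to whole seconds, so it can be passed to `sleep`.
reset_seconds() {
  tr -d '\r' | awk -F': *' '
    tolower($1) ~ /ratelimit-reset/ {
      v = $2 + 0
      s = int(v)
      if (s < v) s++
      print s
      exit
    }
  '
}

# Sample headers taken from the 429 response shown above.
printf '%s\r\n' 'HTTP/1.1 429 Too Many Requests' \
  'X-AI-1-RateLimit-Remaining: 0' \
  'X-AI-1-RateLimit-Reset: 5.871000051498' | reset_seconds
```

A script could capture this value from a rejected response and pass it to `sleep` before sending the next batch of requests.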

Wait for the time window to reset. Send several of the following requests to the route with the same subscription ID and the X-Custom-Count header set to 10:

curl "http://127.0.0.1:9080/anything" -i -X POST \
-H "X-Subscription-ID: sub-123456789" \
-H "X-Custom-Count: 10" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

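These requests should match the first rule with the custom count of 10 tokens. You should see requests within the quota return HTTP/1.1 200 OK and requests beyond the quota return HTTP/1.1 429 Too Many Requests: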

HTTP/1.1 200 OK
...
X-AI-1-RateLimit-Limit: 10
X-AI-1-RateLimit-Remaining: 9
X-AI-1-RateLimit-Reset: 60

HTTP/1.1 429 Too Many Requests
...
X-AI-1-RateLimit-Limit: 10
X-AI-1-RateLimit-Remaining: 0
X-AI-1-RateLimit-Reset: 40.422000169754

Finally, send several requests to the route with only the X-Trial-ID header and no subscription headers:

curl "http://127.0.0.1:9080/anything" -i -X POST \
-H "X-Trial-ID: trial-123456789" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

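These requests should match the second rule with the count of 50 tokens. You should see requests within the quota return HTTP/1.1 200 OK and requests beyond the quota return HTTP/1.1 429 Too Many Requests: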

HTTP/1.1 200 OK
...
X-AI-2-RateLimit-Limit: 50
X-AI-2-RateLimit-Remaining: 49
X-AI-2-RateLimit-Reset: 60

HTTP/1.1 429 Too Many Requests
...
X-AI-2-RateLimit-Limit: 50
X-AI-2-RateLimit-Remaining: 0
X-AI-2-RateLimit-Reset: 44

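Share Quota among Gateways with a Redis Server

The following example demonstrates how to configure distributed rate limiting across multiple gateway instances. This is particularly useful in production environments that require consistent rate limiting across the cluster.

This example applies to API7 Enterprise 3.8.19 and later. It does not apply to APISIX, which does not yet support the policy option.

Prerequisites

Before configuring Redis-based rate limiting, start a Redis instance.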

docker run -d --name redis-standalone \
-p 6379:6379 \
-e REDIS_PASSWORD=p@ssw0rd \
redis:7-alpine redis-server --requirepass p@ssw0rd

Then verify the Redis connection.

docker exec -it redis-standalone redis-cli -a p@ssw0rd ping

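You should receive a PONG response, which indicates a successful connection.

Create a Route and Configure Rate Limiting

Create a route with the following configuration in the gateway group: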

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-redis-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "deepseek-instance",
"provider": "deepseek",
"weight": 8,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
},
{
"name": "openai-instance",
"override": {
"endpoint": "https://openrouter.ai/api/v1/chat/completions"
},
"provider": "openai-compatible",
"weight": 2,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
]
},
"ai-rate-limiting": {
"instances": [
{
"name": "deepseek-instance",
"limit_strategy": "total_tokens",
"limit": 100,
"time_window": 30
},
{
"name": "openai-instance",
"limit_strategy": "total_tokens",
"limit": 50,
"time_window": 30
}
],
"policy": "redis",
"redis_host": "127.0.0.1",
"redis_port": 6379,
"redis_password": "p@ssw0rd",
"allow_degradation": false,
"rejected_code": 429
}
}
}'

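policy: set to redis to use a Redis instance for rate limiting.

redis_host: set to the IP address of the Redis instance.

redis_port: set to the listening port of the Redis instance.

redis_password: set to the password of the Redis instance, if any.

allow_degradation: set to false to reject requests if Redis is unavailable.

Verify

Send a POST request to the route with a system prompt and a sample user question in the request body.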

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should see a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "In mathematics, under the usual rules of arithmetic, **1 + 1 = 2**.\n\nThis follows from the definition of natural numbers and addition in systems like Peano arithmetic, where:\n\n- 1 is the successor of 0.\n- 2 is the successor of 1.\n- Addition is defined recursively so that 1 + 1 = S(0) + S(0) = S(S(0)) = 2.\n\nIn different contexts, the answer might vary (e.g., in Boolean algebra, 1 + 1 = 1 for logical OR; in modular arithmetic mod 2, 1 + 1 = 0), but in standard arithmetic, the answer is **2**."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Generate 3 requests to consume the quota:

for i in {1..3}; do
curl -i "http://127.0.0.1:9080/anything" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}' &
sleep 1
done
wait

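You should receive HTTP/1.1 200 OK responses while quota remains. Once the rate limiting quota is consumed, subsequent requests are rejected with a 429 Too Many Requests response.

Share Quota among Gateway Nodes with a Redis Cluster

The following example demonstrates how to make multiple gateway nodes share the same rate limiting quota through a Redis cluster.

This example applies to API7 Enterprise 3.8.19 and later. It does not apply to APISIX, which does not yet support the policy option.

Prerequisites

  1. Create a Docker network: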

    docker network create redis-cluster-network

    Make sure your gateway instances run in the same network as the Redis cluster.

  2. Start 6 Redis nodes and wait for them to start:

    for port in $(seq 7000 7005); do
    docker run -d \
    --name redis-node-$port \
    --network redis-cluster-network \
    -p $port:$port \
    redis:7.2-alpine \
    redis-server \
    --port $port \
    --cluster-enabled yes \
    --cluster-config-file nodes.conf \
    --cluster-node-timeout 5000 \
    --appendonly yes \
    --requirepass redis-cluster-password \
    --masterauth redis-cluster-password
    done && sleep 5
  3. Create the cluster:

    docker run --rm \
    --network redis-cluster-network \
    redis:7.2-alpine \
    sh -c "
    redis-cli \
    --cluster create \
    $(for port in $(seq 7000 7005); do echo -n \"redis-node-$port:$port \"; done) \
    --cluster-replicas 1 \
    --cluster-yes \
    -a redis-cluster-password
    "

    The expected output should be similar to the following:

    ...,
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
  4. Verify the cluster nodes:

    docker exec -it redis-node-7000 redis-cli -c -a redis-cluster-password -p 7000 cluster nodes

    The expected output should be similar to the following:

    node-id-1 172.XX.0.2:7000@17000 myself,master - 0 0 1 connected 0-5460
    node-id-2 172.XX.0.3:7001@17001 master - 0 0 2 connected 5461-10922
    node-id-3 172.XX.0.4:7002@17002 master - 0 0 3 connected 10923-16383
    node-id-4 172.XX.0.5:7003@17003 slave node-id-1 0 0 1 connected
    node-id-5 172.XX.0.6:7004@17004 slave node-id-2 0 0 2 connected
    node-id-6 172.XX.0.7:7005@17005 slave node-id-3 0 0 3 connected
  5. Check the cluster health (optional):

    docker exec redis-node-7000 redis-cli -c -a redis-cluster-password -p 7000 cluster info

    You should see a response similar to the following:

    cluster_state:ok
    cluster_slots_assigned:16384
    cluster_slots_ok:16384
    cluster_known_nodes:6
    cluster_size:3
    ...

Create a Route and Configure Rate Limiting

Create a route with the following configuration in the gateway group:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-redis-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "deepseek-instance",
"provider": "deepseek",
"weight": 8,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
},
{
"name": "openai-instance",
"override": {
"endpoint": "https://openrouter.ai/api/v1/chat/completions"
},
"provider": "openai-compatible",
"weight": 2,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
]
},
"ai-rate-limiting": {
"instances": [
{
"name": "deepseek-instance",
"limit_strategy": "total_tokens",
"limit": 200,
"time_window": 60
},
{
"name": "openai-instance",
"limit_strategy": "total_tokens",
"limit": 100,
"time_window": 60
}
],
"policy": "redis-cluster",
"redis_cluster_nodes": [
"172.XX.0.2:7000",
"172.XX.0.3:7001",
"172.XX.0.4:7002",
"172.XX.0.5:7003",
"172.XX.0.6:7004",
"172.XX.0.7:7005"
],
"redis_password": "redis-cluster-password",
"redis_cluster_name": "redis-cluster-1",
"redis_timeout": 1000,
"redis_connect_timeout": 1000,
"allow_degradation": false,
"rejected_code": 429
}
}
}'

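policy: set to redis-cluster to use a Redis cluster for rate limiting.

redis_cluster_nodes: set to the addresses of the Redis nodes in the Redis cluster.

redis_password: set to the password of the Redis cluster, if any.

redis_cluster_name: set to the name of the Redis cluster.

Verify

Send a POST request to the route with a system prompt and a sample user question in the request body.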

curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should see a response similar to the following:

{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "In mathematics, under the usual rules of arithmetic, **1 + 1 = 2**.\n\nThis follows from the definition of natural numbers and addition in systems like Peano arithmetic, where:\n\n- 1 is the successor of 0.\n- 2 is the successor of 1.\n- Addition is defined recursively so that 1 + 1 = S(0) + S(0) = S(S(0)) = 2.\n\nIn different contexts, the answer might vary (e.g., in Boolean algebra, 1 + 1 = 1 for logical OR; in modular arithmetic mod 2, 1 + 1 = 0), but in standard arithmetic, the answer is **2**."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Generate 3 requests to consume the quota:

for i in {1..3}; do
curl -i "http://127.0.0.1:9080/anything" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}' &
sleep 1
done
wait

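You should receive HTTP/1.1 200 OK responses for most of the requests and HTTP 429 Too Many Requests responses for the rest. This verifies that routes configured on different gateway nodes share the same quota.

Verify that the rate limiting counters are stored in the Redis cluster by checking the rate limiting keys on all nodes: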

for port in {7000..7005}; do
echo "Checking node redis-node-$port:"
docker exec redis-node-$port redis-cli -c -a redis-cluster-password -p $port keys "plugin-ai-rate-limiting*" 2>/dev/null | grep -v "^$" || echo "No related keys found"
done

You should see output similar to the following:

Checking node redis-node-7000:
No related keys found
Checking node redis-node-7001:
No related keys found
Checking node redis-node-7002:
plugin-ai-rate-limitingroute&service<route-id>&<service-id>:<timestamp>:<client-ip>:<rate-limit-instance>
Checking node redis-node-7003:
plugin-ai-rate-limitingroute&service<route-id>&<service-id>:<timestamp>:<client-ip>:<rate-limit-instance>
Checking node redis-node-7004:
No related keys found
Checking node redis-node-7005:
No related keys found

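Share Quota among Gateway Nodes with Redis Sentinel

This example applies to API7 Enterprise 3.9.2 and later. It does not apply to APISIX, which does not yet support the policy option.

Use Redis Sentinel when you need automatic failover and high availability but do not need data sharding. This mode is easier to manage and suits most high-availability requirements.

Make sure your Redis instances are running in Sentinel mode.

Prerequisites

  1. Create a Docker network: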

    docker network create redis-sentinel-network

    Make sure your gateway instances run in the same network as the Redis Sentinel cluster.

  2. Start a Redis master node:

    docker run -d --name redis-master --network redis-sentinel-network \
    -p 6379:6379 \
    redis:7.2-alpine \
    redis-server --requirepass StrongP@ss123 --appendonly yes
  3. Start the Redis replica nodes:

    for i in 1 2; do
    PORT=$((6380 + i - 1))
    docker run -d --name redis-slave-$i --network redis-sentinel-network \
    -p $PORT:6379 \
    redis:7.2-alpine \
    redis-server --slaveof redis-master 6379 \
    --requirepass StrongP@ss123 \
    --masterauth StrongP@ss123 \
    --appendonly yes
    done
  4. Get the master node IP address for the next step:

    MASTER_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis-master)
    echo "Redis master node IP: $MASTER_IP"
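  5. Start the Sentinel cluster. $MASTER_IP is expanded from the variable set in the previous step; replace it with your master node IP if you run this command in a different shell: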

    for i in 1 2 3; do
    SENTINEL_PORT=$((26379 + i - 1))
    echo "Starting redis-sentinel-$i (port:$SENTINEL_PORT)..."
    docker run -d --name redis-sentinel-$i --network redis-sentinel-network -p $SENTINEL_PORT:26379 \
    redis:7.2-alpine \
    sh -c "
    cat << 'EOF' > /sentinel.conf
    port 26379
    sentinel monitor mymaster $MASTER_IP 6379 2
    sentinel auth-pass mymaster StrongP@ss123
    requirepass admin-password
    sentinel down-after-milliseconds mymaster 5000
    sentinel failover-timeout mymaster 10000
    sentinel parallel-syncs mymaster 1
    protected-mode no
    EOF
    redis-sentinel /sentinel.conf
    "
    done
    echo "✅ Sentinel cluster started successfully."

    You should see a response similar to the following:

    Starting redis-sentinel-1 (port:26379)...
    eb9efacb629d0cfdfaa48856f42ba8c67642baa79f1589df5b251c11d3ec6e1a
    Starting redis-sentinel-2 (port:26380)...
    7f23f4b6e63c9b6be4c5e1903a244f078d481952a1465a9650c743ea2ee4600f
    Starting redis-sentinel-3 (port:26381)...
    1df087502124e3903df7ae665ef597bf735669c5ce3f9d87696c4acd82526626
    ✅ Sentinel cluster started successfully.
  6. Confirm that the Sentinel environment is running properly:

    echo "Waiting for Sentinel cluster establishment (10 seconds)..."
    sleep 10

    echo -e "\nVerifying Sentinel cluster status:"
    for i in 1 2 3; do
    echo "--- Sentinel $i status ---"
    if docker ps | grep -q "redis-sentinel-$i"; then
    echo "Container: ✅ Running"
    docker exec redis-sentinel-$i redis-cli -p 26379 -a admin-password --no-auth-warning SENTINEL master mymaster 2>&1 | grep -E "(flags|num-slaves|num-other-sentinels)"
    else
    echo "Container: ❌ Not running (run 'docker logs redis-sentinel-$i' to check)"
    fi
    echo ""
    done

    You should see a response similar to the following, with each container reported as running:

    Verifying Sentinel cluster status:
    --- Sentinel 1 status ---
    Container: ✅ Running

    --- Sentinel 2 status ---
    Container: ✅ Running

    --- Sentinel 3 status ---
    Container: ✅ Running
  7. Get the Sentinel IP addresses for the plugin configuration:

    echo -e "Getting Sentinel container IP addresses:"
    for i in 1 2 3; do
    IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' redis-sentinel-$i)
    echo " redis-sentinel-$i : $IP"
    done

    You should see a response similar to the following:

    Getting Sentinel container IP addresses:
    redis-sentinel-1 : 172.22.0.4
    redis-sentinel-2 : 172.22.0.5
    redis-sentinel-3 : 172.22.0.6
  8. Run a detailed status check:

    echo "Checking detailed Sentinel cluster status..."
    for i in 1 2 3; do
    echo "--- Sentinel $i detailed info ---"
    docker exec redis-sentinel-$i redis-cli -p 26379 -a admin-password --no-auth-warning SENTINEL masters
    echo "-----------------------------------"
    done
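The Sentinel addresses collected in step 7 have to be copied into the redis_sentinels field of the plugin configuration. The following sketch, which is an illustration rather than an official tool, converts lines of the form "redis-sentinel-1 : 172.22.0.4" into that JSON array. The helper name sentinels_to_json and the default port 26379 (the container port configured in step 5) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "name : ip" lines (the output format of step 7)
# into a JSON array suitable for the redis_sentinels field.
# $1 is the Sentinel port; it defaults to 26379, the container port above.
sentinels_to_json() {
  local port="${1:-26379}"
  awk -v port="$port" '
    # Only consider lines whose last field looks like an IPv4 address.
    $NF ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      entries[++n] = sprintf("  { \"host\": \"%s\", \"port\": %s }", $NF, port)
    }
    END {
      print "["
      for (i = 1; i <= n; i++) print entries[i] (i < n ? "," : "")
      print "]"
    }
  '
}

# Example usage: pipe the output of the step 7 loop into the helper, then
# paste the resulting array into the route configuration.
```

Header lines such as "Getting Sentinel container IP addresses:" are skipped because their last field is not an IPv4 address.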

Create a Route and Configure Rate Limiting

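Create a route in the gateway group with the following configuration, replacing the Sentinel addresses with the ones obtained in the prerequisites: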

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${ADMIN_API_KEY}" \
-d '{
"id": "ai-rate-limiting-redis-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
"instances": [
{
"name": "deepseek-instance",
"provider": "deepseek",
"weight": 8,
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
},
{
"name": "openai-instance",
"override": {
"endpoint": "https://openrouter.ai/api/v1/chat/completions"
},
"provider": "openai-compatible",
"weight": 2,
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
]
},
"ai-rate-limiting": {
"instances": [
{
"name": "deepseek-instance",
"limit_strategy": "total_tokens",
"limit": 200,
"time_window": 60
},
{
"name": "openai-instance",
"limit_strategy": "total_tokens",
"limit": 100,
"time_window": 60
}
],
"policy": "redis-sentinel",
"redis_sentinels": [
{ "host": "172.22.0.4", "port": 26379 },
{ "host": "172.22.0.5", "port": 26379 },
{ "host": "172.22.0.6", "port": 26379 }
],
"redis_master_name": "mymaster",
"sentinel_password": "admin-password",
"redis_password": "StrongP@ss123",
"redis_timeout": 1000,
"redis_connect_timeout": 1000,
"allow_degradation": false,
"rejected_code": 429
}
}
}'

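policy: set to redis-sentinel to use Redis Sentinel for rate limiting.

redis_sentinels: set to the addresses of the Redis Sentinel nodes.

redis_master_name: set to the name of the Redis master monitored by Sentinel.

sentinel_password: set to the password of the Sentinel instances, if one is configured.

redis_password: set to the password of the Redis master node, if one is configured.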

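Verification

The verification steps are similar to those in the previous example: send requests to consume the quota and observe the rate limiting behavior.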
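As one way to observe the limit taking effect, the sketch below reports the position of the first rejected request in a run. first_rejection is a hypothetical helper, not part of the gateway; it reads one HTTP status code per line and assumes the rejected_code of 429 configured above unless another code is passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one HTTP status code per line and print the
# 1-based position of the first rejected request, or "none" if every
# request was accepted. $1 is the rejection code (default: 429).
first_rejection() {
  local rejected="${1:-429}"
  awk -v rejected="$rejected" '
    $1 == rejected { print NR; found = 1; exit }
    END { if (!found) print "none" }
  '
}

# Example usage (assumes the gateway and the route above are running):
#   for i in $(seq 1 10); do
#     curl -s -o /dev/null -w '%{http_code}\n' "http://127.0.0.1:9080/anything" \
#       -H "Content-Type: application/json" \
#       -d '{"messages":[{"role":"user","content":"What is 1+1?"}]}'
#   done | first_rejection
```

Running the same loop against different gateway nodes within one time window should yield an earlier rejection on the later runs if the nodes share counters through Sentinel.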