ai-aliyun-content-moderation

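The ai-aliyun-content-moderation plugin integrates with Alibaba Cloud Content Moderation (Enhanced Edition). When proxying LLM requests, it checks the risk level of the request body for categories such as pornography, political sensitivity, abuse, and violence. If the assessed risk level reaches or exceeds the configured threshold, the plugin rejects the request.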

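Make sure access_key_secret is configured correctly in the plugin. If it is wrong, every request bypasses the plugin and is forwarded directly to the LLM upstream, and the gateway error log shows the error Specified signature is not matched with our calculation.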
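To detect this silent bypass, you can search the gateway error log for that exact message. The sketch below is a hypothetical bash helper (the name has_signature_error is not part of APISIX); it assumes you pass it the path of your error log, which APISIX writes to logs/error.log under its installation prefix by default.

```shell
# Hypothetical helper (not part of APISIX): returns 0 if the given log file
# contains the Aliyun signature-mismatch error, 1 otherwise.
has_signature_error() {
  local log_file="$1"
  # -F: treat the message as a fixed string; -q: print nothing, only set status
  grep -Fq "Specified signature is not matched with our calculation" "$log_file"
}

# Usage (the log path is an assumption; adjust for your installation):
# if has_signature_error logs/error.log; then
#   echo "access_key_secret is probably wrong; requests are bypassing moderation"
# fi
```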

The ai-aliyun-content-moderation plugin should be used together with the ai-proxy or ai-proxy-multi plugin, which proxies the LLM requests.

Examples

The following examples use OpenAI as the upstream service provider.

Before you begin, create an OpenAI account and obtain an API key. If you use a different LLM provider, refer to that provider's documentation to obtain an API key.

In addition, create an Alibaba Cloud account, activate the Content Moderation (Enhanced Edition) service, and obtain the endpoint, region ID, access key ID, and access key secret.

Optionally, save these values to environment variables:

# Replace with your own values
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
export ALIYUN_ENDPOINT=<YOUR_ALIYUN_ENDPOINT>
export ALIYUN_REGION_ID=<YOUR_ALIYUN_REGION_ID>
export ALIYUN_ACCESS_KEY_ID=<YOUR_ALIYUN_ACCESS_KEY_ID>
export ALIYUN_ACCESS_KEY_SECRET=<YOUR_ALIYUN_ACCESS_KEY_SECRET>
# Admin API key used by the route configuration commands
export ADMIN_API_KEY=<YOUR_ADMIN_API_KEY>
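Because an empty variable silently produces an invalid or incomplete route configuration, it can help to check them before running the commands. This is a hypothetical bash helper (check_env is an illustrative name); it also checks ADMIN_API_KEY, which the Admin API commands in these examples read.

```shell
# Hypothetical helper: report any variable used in these examples that is
# unset or empty. Returns non-zero if at least one is missing. Requires bash.
check_env() {
  local missing=0 name
  for name in OPENAI_API_KEY ALIYUN_ENDPOINT ALIYUN_REGION_ID \
              ALIYUN_ACCESS_KEY_ID ALIYUN_ACCESS_KEY_SECRET ADMIN_API_KEY; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_env && echo "all variables set"
```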

Moderate toxic request content

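Create a route to the LLM chat completions endpoint with the ai-proxy plugin, and configure the integration details, together with the deny code and deny message, in the ai-aliyun-content-moderation plugin: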

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-aliyun-content-moderation-route",
    "uri": "/anything",
    "plugins": {
      "ai-aliyun-content-moderation": {
        "endpoint": "'"$ALIYUN_ENDPOINT"'",
        "region_id": "'"$ALIYUN_REGION_ID"'",
        "access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
        "access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
        "deny_code": 400,
        "deny_message": "Request contains forbidden content, such as hate speech or violence."
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        }
      }
    }
  }'

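deny_code: the HTTP status code returned when a request is rejected.

deny_message: the message returned when a request is rejected.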
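The route payload is wrapped in single quotes so that the JSON double quotes pass through unchanged. Each '"$VAR"' sequence closes the single-quoted string, inserts the variable's value inside double quotes, and reopens the single quote; the shell joins the pieces into one argument. The sketch below shows this expansion on its own, using a placeholder variable EXAMPLE_REGION so that your real ALIYUN_REGION_ID is not overwritten.

```shell
# Placeholder value for illustration only
EXAMPLE_REGION="cn-shanghai"

# '{"region_id": "' + "$EXAMPLE_REGION" + '"}' are concatenated into one word:
# the variable is expanded while the literal JSON quotes are preserved.
payload='{"region_id": "'"$EXAMPLE_REGION"'"}'
printf '%s\n' "$payload"   # prints {"region_id": "cn-shanghai"}
```

Note that this technique does not escape the value: if a secret contains a double quote or backslash, the resulting JSON is invalid.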

Send a POST request to the route with a system prompt and a user question containing an insult:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 400 Bad Request response with the following message:

{
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 124,
    "prompt_tokens": 31,
    "total_tokens": 155
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "gpt-4",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}

Send another request to the route, this time with a harmless question:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 200 OK response with the model output:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

Adjust the risk level threshold

The following example shows how to adjust the risk level threshold, which determines whether a request or response is allowed through.
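Judging from the behavior shown in this section (a high-risk request is rejected when risk_level_bar is high but passes when it is max), the decision can be modeled as "reject when the reported risk level is at or above the bar". The sketch below assumes the ordering none < low < medium < high < max; it is a mental model, not the plugin's source code, and risk_rank and decide are hypothetical names.

```shell
# Hypothetical mapping of risk levels to ranks (assumed ordering).
risk_rank() {
  case "$1" in
    none)   echo 0 ;;
    low)    echo 1 ;;
    medium) echo 2 ;;
    high)   echo 3 ;;
    max)    echo 4 ;;
    *)      echo "unknown risk level: $1" >&2; return 1 ;;
  esac
}

# decide <risk_level> <risk_level_bar>: prints "deny" when the level
# reaches the bar and "allow" otherwise; fails on unknown levels.
decide() {
  local level bar
  level=$(risk_rank "$1") || return 1
  bar=$(risk_rank "$2") || return 1
  if [ "$level" -ge "$bar" ]; then
    echo deny
  else
    echo allow
  fi
}

decide high high   # prints deny  (risk_level_bar set to high)
decide high max    # prints allow (risk_level_bar set to max)
```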

Create a route to the LLM chat completions endpoint with the ai-proxy plugin, and set risk_level_bar to high in the ai-aliyun-content-moderation plugin:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "ai-aliyun-content-moderation-route",
    "uri": "/anything",
    "plugins": {
      "ai-aliyun-content-moderation": {
        "endpoint": "'"$ALIYUN_ENDPOINT"'",
        "region_id": "'"$ALIYUN_REGION_ID"'",
        "access_key_id": "'"$ALIYUN_ACCESS_KEY_ID"'",
        "access_key_secret": "'"$ALIYUN_ACCESS_KEY_SECRET"'",
        "deny_code": 400,
        "deny_message": "Request contains forbidden content, such as hate speech or violence.",
        "risk_level_bar": "high"
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "model": "gpt-4"
      }
    }
  }'

Send a POST request to the route with a system prompt and a user question containing an insult:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 400 Bad Request response with the following message:

{
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 124,
    "prompt_tokens": 31,
    "total_tokens": 155
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Request contains forbidden content, such as hate speech or violence."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "model": "gpt-4",
  "id": "c9466bbf-e010-469d-949a-a10f25525964"
}

Update risk_level_bar in the plugin to max:

curl "http://127.0.0.1:9180/apisix/admin/routes/ai-aliyun-content-moderation-route" -X PATCH \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "plugins": {
      "ai-aliyun-content-moderation": {
        "risk_level_bar": "max"
      }
    }
  }'

Send the same request to the route:

curl -i "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'

You should receive an HTTP/1.1 200 OK response with the model output:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

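This is because the word "stupid" has a risk level of high, which is below the configured threshold of max. To see the moderation results returned by Alibaba Cloud, set the gateway's log level to debug: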

conf/config.yaml
nginx_config:
  error_log_level: debug

Reload the gateway for the configuration change to take effect.

For the request above, you should see a debug log entry similar to the following:

{
  "RequestId": "29F7AD19-074B-54AC-B240-B297AD96883F",
  "Message": "OK",
  "Data": {
    ...,
    "RiskLevel": "high",
    "Result": [
      {
        "RiskWords": "are&you&stupid",
        ...
      }
    ]
  },
  "Code": 200
}
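To pull just the risk levels out of a busy debug log, you can match the RiskLevel field. This hypothetical bash helper (extract_risk_levels is an illustrative name) assumes the moderation response appears in the log as plain, unescaped JSON like the entry above; if your log escapes the quotes, the pattern needs adjusting.

```shell
# Hypothetical helper: print every RiskLevel value found in log text read
# from stdin, one per line.
extract_risk_levels() {
  # grep -o keeps only the matched field; sed then keeps only the quoted value
  grep -o '"RiskLevel": *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}

# Usage (the log path is an assumption; adjust for your installation):
# extract_risk_levels < logs/error.log
```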
