Prometheus

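The prometheus plugin provides the capability to integrate APISIX with Prometheus.

Once the plugin is enabled, APISIX starts collecting relevant metrics, such as API requests and latencies, and exports them to Prometheus in a text-based exposition format. You can then create event monitoring and alerting in Prometheus to monitor the health of your API gateway and APIs.

Metrics

Prometheus has different types of metrics. To understand their differences, see metric types.

The prometheus plugin exports the following metrics by default. For an example, see Get APISIX Metrics. Note that some metrics, such as apisix_batch_process_entries, may not appear immediately if there is no data.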

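Name	Type	Description
apisix_bandwidth	counter	Total amount of traffic flowing through APISIX, in bytes.
apisix_etcd_modify_indexes	gauge	Number of changes made to etcd by APISIX keys.
apisix_batch_process_entries	gauge	Number of entries remaining in a batch when data is sent in batches, such as with `http logger` and other logging plugins.
apisix_etcd_reachable	gauge	Whether APISIX can reach etcd. A value of `1` means reachable and `0` means unreachable.
apisix_http_status	counter	HTTP status codes returned from upstream services.
apisix_http_requests_total	gauge	Number of HTTP requests from clients.
apisix_nginx_http_current_connections	gauge	Number of current connections with clients.
apisix_nginx_metric_errors_total	counter	Total number of `nginx-lua-prometheus` errors.
apisix_http_latency	histogram	HTTP request latency, in milliseconds.
apisix_node_info	gauge	Information about the APISIX node, such as the hostname and the APISIX version.
apisix_shared_dict_capacity_bytes	gauge	Total capacity of an NGINX shared dictionary.
apisix_shared_dict_free_space_bytes	gauge	Remaining space in an NGINX shared dictionary.
apisix_upstream_status	gauge	Health check status of upstream nodes, available if health checks are configured on the upstream. A value of `1` means healthy and `0` means unhealthy.
apisix_stream_connection_total	counter	Total number of connections handled per stream route.
apisix_llm_prompt_tokens	counter	Available only in Enterprise (since 3.9.7). Number of prompt tokens. Exported only for AI request types.
apisix_llm_completion_tokens	counter	Available only in Enterprise (since 3.9.7). Number of completion tokens. Exported only for AI request types.
apisix_llm_latency	histogram	Available only in Enterprise (since 3.9.7). Duration from when the request is sent until the first token is received from the LLM service, in milliseconds. Exported only for AI request types.
apisix_llm_active_connections	gauge	Available only in Enterprise (since 3.9.7). Number of active connections with the LLM service. Exported only for AI request types.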

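Note

The LLM metrics (apisix_llm_prompt_tokens, apisix_llm_completion_tokens, and apisix_llm_latency) are exported only when a request is handled by an AI plugin, such as AI Proxy. Routes without an AI plugin configured do not produce these metrics. apisix_llm_active_connections is managed directly by the AI plugins and likewise appears only on AI-enabled routes.

To reduce high-cardinality labels on LLM metrics, use disabled_labels in the plugin metadata to selectively disable labels, such as consumer or node.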

Labels

Labels for `apisix_http_status`

The following labels are used to differentiate apisix_http_status metrics.

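Name	Description
code	HTTP response code returned by the upstream node.
route	ID of the route that the HTTP status originates from when `prefer_name` is `false` (default), or the name of the route when `prefer_name` is `true`. Defaults to an empty string if the request does not match any route.
route_id	Available only in Enterprise. ID of the route that the HTTP status originates from, regardless of the `prefer_name` setting.
matched_uri	URI of the route that matches the request. Defaults to an empty string if the request does not match any route.
matched_host	Host of the route that matches the request. Defaults to an empty string if the request does not match any route, or if no host is configured on the route.
service	ID of the service that the HTTP status originates from when `prefer_name` is `false` (default), or the name of the service when `prefer_name` is `true`. Defaults to the value of the host configured on the route if the matched route does not belong to any service.
service_id	Available only in Enterprise. ID of the service that the HTTP status originates from, regardless of the `prefer_name` setting.
consumer	Name of the consumer associated with the request. Defaults to an empty string if no consumer is associated with the request.
node	IP address of the upstream node.
gateway_group_id	Available only in Enterprise. ID of the gateway group that the HTTP status originates from.
instance_id	Available only in Enterprise. ID of the gateway instance that the HTTP status originates from.
api_product_id	Available only in Enterprise. ID of the product that the HTTP status originates from.
request_type	Available only in Enterprise. Request type that the HTTP status originates from.
request_llm_model	Available only in Enterprise (since 3.9.7). LLM model specified in the client request.
llm_model	Available only in Enterprise. LLM model that the HTTP status originates from.
response_source	Available only in Enterprise (since 3.9.10). Source of the HTTP response: `apisix` (generated by APISIX, such as a plugin rejection or route not found), `nginx` (NGINX proxy error, such as connection refused or upstream timeout), or `upstream` (a real response from the upstream service). Not yet available in APISIX.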

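Labels for `apisix_bandwidth`

The following labels are used to differentiate apisix_bandwidth metrics.

Name	Description
type	Type of traffic, either `egress` or `ingress`.
route	ID of the route that the bandwidth corresponds to when `prefer_name` is `false` (default), or the name of the route when `prefer_name` is `true`. Defaults to an empty string if the request does not match any route.
route_id	Available only in Enterprise. ID of the route that the bandwidth corresponds to, regardless of the `prefer_name` setting.
service	ID of the service that the bandwidth corresponds to when `prefer_name` is `false` (default), or the name of the service when `prefer_name` is `true`. Defaults to the value of the host configured on the route if the matched route does not belong to any service.
service_id	Available only in Enterprise. ID of the service that the bandwidth corresponds to, regardless of the `prefer_name` setting.
consumer	Name of the consumer associated with the request. Defaults to an empty string if no consumer is associated with the request.
node	IP address of the upstream node.
gateway_group_id	Available only in Enterprise. ID of the gateway group that the bandwidth corresponds to.
instance_id	Available only in Enterprise. ID of the gateway instance that the bandwidth corresponds to.
api_product_id	Available only in Enterprise. ID of the product that the bandwidth corresponds to.
request_type	Available only in Enterprise. Request type that the bandwidth corresponds to.
request_llm_model	Available only in Enterprise (since 3.9.7). LLM model specified in the client request.
llm_model	Available only in Enterprise. LLM model that the bandwidth corresponds to.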

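Labels for `apisix_http_latency`

The following labels are used to differentiate apisix_http_latency metrics.

Name	Description
type	Type of latency. See Latency Types for details.
route	ID of the route that the latency corresponds to when `prefer_name` is `false` (default), or the name of the route when `prefer_name` is `true`. Defaults to an empty string if the request does not match any route.
route_id	Available only in Enterprise. ID of the route that the latency corresponds to, regardless of the `prefer_name` setting.
service	ID of the service that the latency corresponds to when `prefer_name` is `false` (default), or the name of the service when `prefer_name` is `true`. Defaults to the value of the host configured on the route if the matched route does not belong to any service.
service_id	Available only in Enterprise. ID of the service that the latency corresponds to, regardless of the `prefer_name` setting.
consumer	Name of the consumer associated with the latency. Defaults to an empty string if no consumer is associated with the request.
node	IP address of the upstream node associated with the latency.
gateway_group_id	Available only in Enterprise. ID of the gateway group that the latency corresponds to.
instance_id	Available only in Enterprise. ID of the gateway instance that the latency corresponds to.
api_product_id	Available only in Enterprise. ID of the product that the latency corresponds to.
request_type	Available only in Enterprise. Request type that the latency corresponds to.
request_llm_model	Available only in Enterprise (since 3.9.7). LLM model specified in the client request.
llm_model	Available only in Enterprise. LLM model that the latency corresponds to.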

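Latency Types

apisix_http_latency can be labeled with one of the following three types:

request represents the time elapsed between reading the first byte from the client and writing the log after the last byte has been sent to the client.
upstream represents the time elapsed while waiting for a response from the upstream service.
apisix represents the difference between the request latency and the upstream latency.

In other words, the APISIX latency is not attributable to Lua processing alone. It should be understood as follows:

APISIX latency
  = downstream request time - upstream response time
  = downstream traffic latency + NGINX latency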
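
As a worked instance of the relationship above, the sketch below derives the apisix latency from the other two types. The two input values are hypothetical and chosen only for illustration; they are not measurements from a real deployment.

```shell
# Hypothetical sample latencies in milliseconds (illustrative values only).
request_latency_ms=120    # "request" type: total time spent on the client-facing request
upstream_latency_ms=95    # "upstream" type: time spent waiting for the upstream service

# "apisix" type: the difference between the request and upstream latencies.
apisix_latency_ms=$((request_latency_ms - upstream_latency_ms))
echo "apisix latency: ${apisix_latency_ms} ms"
```

With these inputs, the 25 ms remainder covers everything that is not upstream wait time, including downstream traffic latency and NGINX processing.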

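Labels for `apisix_upstream_status`

The following labels are used to differentiate apisix_upstream_status metrics.

Name	Description
name	Resource ID corresponding to the upstream configured with health checks, such as `/apisix/routes/1` and `/apisix/upstreams/1`.
ip	IP address of the upstream node.
port	Port number of the node.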

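Labels for `apisix_llm_latency`

The following labels are used to differentiate apisix_llm_latency metrics.

Name	Description
route	ID of the route that the metric corresponds to when `prefer_name` is `false` (default), or the name of the route when `prefer_name` is `true`. Defaults to an empty string if the request does not match any route.
route_id	ID of the route that the metric corresponds to, regardless of the `prefer_name` setting.
service	ID of the service that the metric corresponds to when `prefer_name` is `false` (default), or the name of the service when `prefer_name` is `true`. Defaults to the value of the host configured on the route if the matched route does not belong to any service.
service_id	ID of the service that the metric corresponds to, regardless of the `prefer_name` setting.
consumer	Name of the consumer associated with the request. Defaults to an empty string if no consumer is associated with the request.
node	IP address of the upstream node.
gateway_group_id	ID of the gateway group that the metric corresponds to.
instance_id	ID of the gateway instance that the metric corresponds to.
api_product_id	ID of the product that the metric corresponds to.
request_type	Request type that the metric corresponds to.
request_llm_model	Available since 3.9.7. LLM model specified in the client request.
llm_model	LLM model that the metric corresponds to.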

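Labels for Other LLM Metrics

The following labels are used to differentiate apisix_llm_prompt_tokens, apisix_llm_completion_tokens, and apisix_llm_active_connections metrics.

Name	Description
route	ID of the route that the metric corresponds to when `prefer_name` is `false` (default), or the name of the route when `prefer_name` is `true`. Defaults to an empty string if the request does not match any route.
route_id	ID of the route that the metric corresponds to, regardless of the `prefer_name` setting.
matched_uri	URI of the route that matches the request. Defaults to an empty string if the request does not match any route.
matched_host	Host of the route that matches the request. Defaults to an empty string if the request does not match any route, or if no host is configured on the route.
service	ID of the service that the metric corresponds to when `prefer_name` is `false` (default), or the name of the service when `prefer_name` is `true`. Defaults to the value of the host configured on the route if the matched route does not belong to any service.
service_id	ID of the service that the metric corresponds to, regardless of the `prefer_name` setting.
consumer	Name of the consumer associated with the request. Defaults to an empty string if no consumer is associated with the request.
node	IP address of the upstream node.
gateway_group_id	ID of the gateway group that the metric corresponds to.
instance_id	ID of the gateway instance that the metric corresponds to.
api_product_id	ID of the product that the metric corresponds to.
request_type	Request type that the metric corresponds to.
request_llm_model	Available since 3.9.7. LLM model specified in the client request.
llm_model	LLM model that the metric corresponds to.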

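Examples

The examples below demonstrate how you can work with the prometheus plugin in different scenarios.

Get APISIX Metrics

The following example demonstrates how you can get metrics from APISIX.

The default Prometheus metrics endpoint and other Prometheus-related configurations can be found in the static configuration. If you would like to customize these configurations, see Configuration Files.

If you deploy APISIX in a containerized environment and would like to access the Prometheus metrics endpoint externally, update the configuration file as follows and reload APISIX: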

conf/config.yaml
plugin_attr:
  prometheus:
    export_addr:
      ip: 0.0.0.0 

Send a request to the APISIX Prometheus metrics endpoint:

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

You should see an output similar to the following:

# HELP apisix_bandwidth Total bandwidth in bytes consumed per service in Apisix
# TYPE apisix_bandwidth counter
apisix_bandwidth{type="egress",route="",service="",consumer="",node=""} 8417
apisix_bandwidth{type="egress",route="1",service="",consumer="",node="127.0.0.1"} 1420
apisix_bandwidth{type="egress",route="2",service="",consumer="",node="127.0.0.1"} 1420
apisix_bandwidth{type="ingress",route="",service="",consumer="",node=""} 189
apisix_bandwidth{type="ingress",route="1",service="",consumer="",node="127.0.0.1"} 332
apisix_bandwidth{type="ingress",route="2",service="",consumer="",node="127.0.0.1"} 332
# HELP apisix_etcd_modify_indexes Etcd modify index for APISIX keys
# TYPE apisix_etcd_modify_indexes gauge
apisix_etcd_modify_indexes{key="consumers"} 0
apisix_etcd_modify_indexes{key="global_rules"} 0
...
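
Because the endpoint returns plain text, individual series can be aggregated with standard command-line tools. The following is a minimal sketch that sums the egress bytes across all apisix_bandwidth series. It uses the sample lines from the output above as inline data; in practice, you would likely pipe the curl output into awk instead. It assumes that the sample value is the last whitespace-separated field of each line, which holds for the output shown here.

```shell
# Sample lines copied from the output above; replace with real curl output in practice.
metrics='apisix_bandwidth{type="egress",route="",service="",consumer="",node=""} 8417
apisix_bandwidth{type="egress",route="1",service="",consumer="",node="127.0.0.1"} 1420
apisix_bandwidth{type="egress",route="2",service="",consumer="",node="127.0.0.1"} 1420
apisix_bandwidth{type="ingress",route="",service="",consumer="",node=""} 189
apisix_bandwidth{type="ingress",route="1",service="",consumer="",node="127.0.0.1"} 332
apisix_bandwidth{type="ingress",route="2",service="",consumer="",node="127.0.0.1"} 332'

# Keep only egress series of apisix_bandwidth and add up the sample values.
egress_total=$(printf '%s\n' "$metrics" | awk '
  index($0, "apisix_bandwidth{type=\"egress\"") == 1 { sum += $NF }
  END { print sum + 0 }')
echo "egress bytes: ${egress_total}"
```

For the sample data, the three egress series add up to 11257 bytes.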

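Expose APISIX Metrics on Public API Endpoint

The following example demonstrates how you can disable the Prometheus export server, which by default exposes an endpoint on port 9091, and instead expose APISIX Prometheus metrics on a new public API endpoint on port 9080, the port APISIX uses to listen for other client requests.

Warning

If a large number of metrics is being collected, the plugin could consume a significant amount of CPU resources for metric computations and negatively impact the processing of regular requests.

To address this issue, APISIX uses a privileged agent and offloads the metric computations to a separate process. This optimization applies automatically if you use the metrics endpoint configured in the configuration file, as shown above. If you expose the metrics endpoint with the public-api plugin, you will not benefit from this optimization.

Disable the Prometheus export server in the configuration file and reload APISIX for changes to take effect: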

conf/config.yaml
plugin_attr:
  prometheus:
    enable_export_server: false

Next, create a route with the public-api plugin to expose a public API endpoint for APISIX metrics:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "prometheus-metrics",
    "uri": "/apisix/prometheus/metrics",
    "plugins": {
      "public-api": {}
    }
  }'

Send a request to the new metrics endpoint to verify:

curl "http://127.0.0.1:9080/apisix/prometheus/metrics"

You should see an output similar to the following:

# HELP apisix_http_requests_total The total number of client requests since APISIX started
# TYPE apisix_http_requests_total gauge
apisix_http_requests_total 1
# HELP apisix_nginx_http_current_connections Number of HTTP connections
# TYPE apisix_nginx_http_current_connections gauge
apisix_nginx_http_current_connections{state="accepted"} 1
apisix_nginx_http_current_connections{state="active"} 1
apisix_nginx_http_current_connections{state="handled"} 1
apisix_nginx_http_current_connections{state="reading"} 0
apisix_nginx_http_current_connections{state="waiting"} 0
apisix_nginx_http_current_connections{state="writing"} 1
...

Integrate APISIX with Prometheus and Grafana

To learn how to collect APISIX metrics with Prometheus and visualize them in Grafana, see the how-to guide.
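
As a starting point before following the guide, the sketch below writes a minimal Prometheus configuration that scrapes the default APISIX metrics endpoint used in the examples on this page. The job name `apisix`, the 15-second scrape interval, and the output file name `prometheus.yml` are assumptions made for illustration, and the target assumes Prometheus can reach APISIX at 127.0.0.1:9091.

```shell
# Write a minimal Prometheus configuration to the current directory.
# The job name, interval, and file name are illustrative assumptions.
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: "apisix"
    scrape_interval: 15s
    metrics_path: "/apisix/prometheus/metrics"
    static_configs:
      - targets: ["127.0.0.1:9091"]
EOF
```

Prometheus can then be started with `--config.file=prometheus.yml`. If APISIX and Prometheus run in separate containers, the target address would need to be adjusted accordingly.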

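Monitor Upstream Health Statuses

The following example demonstrates how to monitor the health status of upstream nodes.

Create a route with the prometheus plugin and configure active and passive health checks on the upstream: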

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "prometheus-route",
    "uri": "/get",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org:80": 1,
        "127.0.0.1:20001": 1
      },
      "checks": {
        "active": {
          "timeout": 5,
          "http_path": "/status",
          "healthy": {
            "interval": 2,
            "successes": 1
          },
          "unhealthy": {
            "interval": 1,
            "http_failures": 2
          }
        },
        "passive": {
          "healthy": {
            "http_statuses": [200, 201],
            "successes": 3
          },
          "unhealthy": {
            "http_statuses": [500],
            "http_failures": 3,
            "tcp_failures": 3
          }
        }
      }
    }
  }'

Send a request to the APISIX Prometheus metrics endpoint:

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

You should see an output similar to the following:

# HELP apisix_upstream_status upstream status from health check
# TYPE apisix_upstream_status gauge
apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0

This shows that the upstream node httpbin.org:80 is healthy and the upstream node 127.0.0.1:20001 is unhealthy.
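
The same check can be scripted. The following is a minimal sketch that lists the upstream nodes whose apisix_upstream_status value is `0`. It uses the two sample lines above as inline data and assumes that label values contain no escaped quotes; with a live gateway, you would likely pipe the curl output into awk instead.

```shell
# Sample lines copied from the output above; replace with real curl output in practice.
metrics='apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0'

# Print ip:port for every apisix_upstream_status series whose value is 0.
unhealthy=$(printf '%s\n' "$metrics" | awk '
  index($0, "apisix_upstream_status{") == 1 && $NF == 0 {
    # Pull the ip and port label values out of the label set.
    ip = $0;   sub(/.*ip="/, "", ip);     sub(/".*/, "", ip)
    port = $0; sub(/.*port="/, "", port); sub(/".*/, "", port)
    print ip ":" port
  }')
echo "unhealthy nodes: ${unhealthy}"
```

For the sample data, this reports 127.0.0.1:20001 as the only unhealthy node.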

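To learn more about how to configure active and passive health checks, see Health Checks.

Add Extra Labels for Metrics

The following example demonstrates how to add extra labels to metrics and use built-in variables in label values.

Currently, only the following metrics support extra labels: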

apisix_http_status
apisix_http_latency
apisix_bandwidth

Include the following configuration in the configuration file to add labels to metrics, and reload APISIX for changes to take effect:

conf/config.yaml
plugin_attr:
  prometheus:                                # Plugin: prometheus
    metrics:                                 # Create extra labels from built-in variables.
      http_status:
        extra_labels:                        # Set the extra labels for http_status metrics.
          - upstream_addr: $upstream_addr    # Add an extra upstream_addr label with value being the NGINX variable $upstream_addr.
          - route_name: $route_name          # Add an extra route_name label with value being the APISIX variable $route_name.

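Note that if you define a variable in a label value and it does not correspond to any existing built-in variable, the label value defaults to an empty string.

Create a route with the prometheus plugin: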

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "prometheus-route",
    "uri": "/get",
    "name": "extra-label",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

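Send a request to the route to verify:

curl -i "http://127.0.0.1:9080/get"

You should see an HTTP/1.1 200 OK response.

Send a request to the APISIX Prometheus metrics endpoint:

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

You should see an output similar to the following: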

# HELP apisix_http_status HTTP status codes per service in APISIX
# TYPE apisix_http_status counter
apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="extra-label"} 1
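
To confirm that the extra labels carry the expected values, they can be read back from the metric line. The sketch below defines a small helper, `label_value`, which is a hypothetical name introduced here for illustration and not part of APISIX or Prometheus. It assumes that label values contain no escaped quotes, which holds for the sample line above.

```shell
# Sample line copied from the output above.
line='apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="extra-label"} 1'

# label_value NAME LINE: print the value of label NAME found in LINE.
# The [{,] prefix keeps "route" from matching inside "route_name".
label_value() {
  printf '%s\n' "$2" | sed -n "s/.*[{,]$1=\"\([^\"]*\)\".*/\1/p"
}

route_name=$(label_value route_name "$line")
upstream_addr=$(label_value upstream_addr "$line")
echo "route_name=${route_name} upstream_addr=${upstream_addr}"
```

For the sample line, the helper returns `extra-label` for route_name and `54.237.103.220:80` for upstream_addr, matching the values configured through extra_labels.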

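Monitor TCP/UDP Traffic with Prometheus

The following example demonstrates how to collect TCP/UDP traffic metrics in APISIX.

Include the following configuration in the configuration file to enable the stream proxy and to enable the prometheus plugin for the stream proxy. Reload APISIX for changes to take effect: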

conf/config.yaml
apisix:
  proxy_mode: http&stream   # Enable both L4 & L7 proxies
  stream_proxy:             # Configure L4 proxy
    tcp:
      - 9100                # Set TCP proxy listening port
    udp:
      - 9200                # Set UDP proxy listening port

stream_plugins:
  - prometheus              # Enable prometheus for stream proxy

Create a stream route with the prometheus plugin:

curl "http://127.0.0.1:9180/apisix/admin/stream_routes" -X PUT \
  -H "X-API-KEY: ${ADMIN_API_KEY}" \
  -d '{
    "id": "prometheus-route",
    "plugins": {
      "prometheus":{}
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

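Send a request to the stream route to verify:

curl -i "http://127.0.0.1:9100"

You should see an HTTP/1.1 200 OK response.

Send a request to the APISIX Prometheus metrics endpoint:

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

You should see an output similar to the following: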

# HELP apisix_stream_connection_total Total number of connections handled per stream route in APISIX
# TYPE apisix_stream_connection_total counter
apisix_stream_connection_total{route="1"} 1
