
Cap Prometheus end-user metric cardinality #27272

Merged
ishaan-berri merged 1 commit into litellm_internal_staging from litellm_cap-prometheus-end-user-cardinality-b255
May 6, 2026

Contributor

@ishaan-berri ishaan-berri commented May 6, 2026

Summary

Prometheus end-user cost tracking could create one in-memory Prometheus child series per unique end_user label tuple with no cleanup path. This adds a generic BoundedPrometheusSeriesTracker helper in its own module (litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py) that can enforce max-series and TTL cleanup for any Prometheus label tuple; the current policy applies it to metrics with resolved end_user labels while keeping end-user tracking available. TTL sweeps run on a bounded interval so emission remains near baseline speed, and the deprecated failure metric keeps its positional .labels(...) call contract for existing tests/callers.
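
As a rough illustration of the mechanism described above (not the code from this PR), the sketch below records a last-seen time per label tuple, evicts the least recently seen tuple once a cap is exceeded, and scans for expired tuples only when a cleanup interval has elapsed. The class name `TinySeriesTracker`, the `on_evict` callback, and the explicit `now` argument are assumptions made for the example; the real helper calls `metric.remove(...)` and reads `time.monotonic()` itself.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple

LabelValues = Tuple[Optional[str], ...]


class TinySeriesTracker:
    """Hypothetical, simplified tracker: LRU cap plus a throttled TTL sweep."""

    def __init__(
        self,
        max_series: int,
        ttl: float,
        sweep_interval: float,
        on_evict: Callable[[LabelValues], None],
    ) -> None:
        self.max_series = max_series
        self.ttl = ttl
        self.sweep_interval = sweep_interval
        self.on_evict = on_evict
        self.last_seen: "OrderedDict[LabelValues, float]" = OrderedDict()
        self.last_sweep: Optional[float] = None

    def touch(self, label_values: LabelValues, now: float) -> None:
        # Record the sighting and move the tuple to the most-recent end.
        self.last_seen[label_values] = now
        self.last_seen.move_to_end(label_values)

        # The TTL sweep is a full scan, so run it at most once per interval.
        if self.last_sweep is None or now - self.last_sweep >= self.sweep_interval:
            self.last_sweep = now
            expired = [k for k, seen in self.last_seen.items() if now - seen > self.ttl]
            for key in expired:
                del self.last_seen[key]
                self.on_evict(key)

        # Cap: drop the least recently seen tuples first.
        while len(self.last_seen) > self.max_series:
            key, _ = self.last_seen.popitem(last=False)
            self.on_evict(key)


evicted = []
tracker = TinySeriesTracker(max_series=2, ttl=10.0, sweep_interval=5.0, on_evict=evicted.append)
tracker.touch(("a",), now=0.0)
tracker.touch(("b",), now=1.0)
tracker.touch(("c",), now=2.0)   # cap exceeded: ("a",) is evicted
tracker.touch(("d",), now=20.0)  # sweep runs: ("b",) and ("c",) have expired
print(evicted, list(tracker.last_seen))
```

Throttling the sweep keeps the common path to a dictionary update plus a length check, which is consistent with the near-baseline emission time reported in the benchmark.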

Repro

On the starting ref, this minimal Prometheus-client reproduction shows the unbounded child map behavior:

python3 - <<'PY'
from prometheus_client import Counter, CollectorRegistry
metric = Counter('litellm_repro_end_user_cardinality_total', 'repro', ['end_user'], registry=CollectorRegistry())
for i in range(1000):
    metric.labels(end_user=f'end-user-{i}').inc()
print('children_in__metrics:', len(metric._metrics))
print('expected_unbounded_growth:', len(metric._metrics) == 1000)
PY

Output:

children_in__metrics: 1000
expected_unbounded_growth: True

Memory graph benchmark used for before/after proof:

LITELLM_REPO=<repo> SERIES_COUNT=50000 SAMPLE_STEP=2500 BENCH_OUT=<csv> PYTHONPATH=<repo> python3 /tmp/prometheus_end_user_memory_timeseries.py

Evidence

RSS growth graph:
[Figure: Prometheus end_user RSS growth, before vs after]

Retained Prometheus child series graph:
[Figure: Prometheus end_user retained series, before vs after]

Final sampled benchmark values:

Before final sample:
{'emitted': 50000.0, 'rss_delta_kb': 92484.0, 'tracemalloc_current_kb': 30267.0, 'tracemalloc_peak_kb': 30271.0, 'prometheus_children': 50000.0, 'elapsed_seconds': 4.931}

After final sample:
{'emitted': 50000.0, 'rss_delta_kb': 24552.0, 'tracemalloc_current_kb': 8073.0, 'tracemalloc_peak_kb': 8650.0, 'prometheus_children': 10000.0, 'elapsed_seconds': 5.155}
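
To make the comparison explicit, the snippet below computes the relative change between the two final samples quoted above. It is plain arithmetic on those numbers; the `pct_change` helper is only a name chosen for this example.

```python
# Final samples copied from the benchmark output above (emitted count omitted,
# since it is identical in both runs).
before = {
    "rss_delta_kb": 92484.0,
    "tracemalloc_current_kb": 30267.0,
    "tracemalloc_peak_kb": 30271.0,
    "prometheus_children": 50000.0,
    "elapsed_seconds": 4.931,
}
after = {
    "rss_delta_kb": 24552.0,
    "tracemalloc_current_kb": 8073.0,
    "tracemalloc_peak_kb": 8650.0,
    "prometheus_children": 10000.0,
    "elapsed_seconds": 5.155,
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


changes = {key: pct_change(before[key], after[key]) for key in before}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

On these samples, retained children drop by 80% (50,000 to 10,000, matching the default cap), RSS delta and tracemalloc usage fall by roughly 71 to 73%, and elapsed time rises by about 4.5%.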

Latest regression run including the reported enterprise failure:

PYTHONPATH=/workspace python3 -m pytest \
  tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py::test_async_log_failure_event \
  tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py::test_async_log_failure_event_litellm_side_rate_limit \
  tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py \
  tests/test_litellm/integrations/test_prometheus_labels.py -q
....................                                                     [100%]
20 passed in 0.41s

Tests

  • Added tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py covering per-metric cap, TTL expiry, default Prometheus end-user disabled behavior, and generic label-agnostic cleanup behavior.
  • Ran python3 -m black litellm/__init__.py litellm/integrations/prometheus.py litellm/integrations/prometheus_helpers/__init__.py litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py.
  • Ran the latest regression command above -> 20 passed.
  • Ran before/after sampled memory benchmarks against origin/litellm_internal_staging and this branch with 50,000 unique end users.

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

See the Evidence section above.

Type

🐛 Bug Fix
✅ Test

Changes

  • Added generic bounded cleanup for Prometheus metric children in a dedicated helper module.
  • Applied the cleanup policy to metrics that include a resolved end_user label.
  • Preserved positional labels for the deprecated failure metric path.
  • Added defaults for per-metric series cap, TTL, and TTL cleanup interval.
  • Added regression tests for cap, TTL expiry, default Prometheus end-user behavior, and generic label-agnostic cleanup.
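
A minimal sketch of how the three new settings appear to be resolved, based on the `getattr` fallbacks in the diff to `litellm/integrations/prometheus.py`. Here `SimpleNamespace` stands in for the `litellm` module so the example runs without LiteLLM installed; `SeriesLimits` and `resolve_limits` are hypothetical names, not part of the PR.

```python
from types import SimpleNamespace
from typing import NamedTuple, Optional


class SeriesLimits(NamedTuple):
    max_series: Optional[int]
    ttl_seconds: Optional[float]
    cleanup_interval_seconds: Optional[float]


def resolve_limits(settings: object) -> Optional[SeriesLimits]:
    """Return the effective limits, or None when bounding is fully disabled."""
    max_series = getattr(
        settings, "prometheus_end_user_metrics_max_series_per_metric", 10000
    )
    ttl_seconds = getattr(settings, "prometheus_end_user_metrics_ttl_seconds", 3600.0)
    interval = getattr(
        settings, "prometheus_end_user_metrics_cleanup_interval_seconds", 60.0
    )
    # Setting both the cap and the TTL to None turns tracking off entirely.
    if max_series is None and ttl_seconds is None:
        return None
    return SeriesLimits(max_series, ttl_seconds, interval)


# Attributes missing from the stand-in fall back to the defaults in the diff.
defaults = resolve_limits(SimpleNamespace())

# Cap only: TTL expiry is disabled, the cap still applies.
cap_only = resolve_limits(
    SimpleNamespace(
        prometheus_end_user_metrics_max_series_per_metric=500,
        prometheus_end_user_metrics_ttl_seconds=None,
    )
)

# Both None: no series are tracked or evicted.
disabled = resolve_limits(
    SimpleNamespace(
        prometheus_end_user_metrics_max_series_per_metric=None,
        prometheus_end_user_metrics_ttl_seconds=None,
    )
)
print(defaults, cap_only, disabled)
```

In a real deployment the same attributes would presumably be set on the `litellm` module itself, as the regression tests in this PR do.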

CLAassistant commented May 6, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 77.55102% with 11 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...theus_helpers/bounded_prometheus_series_tracker.py | 77.55% | 11 Missing ⚠️


@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Failure path duplicates tracking logic, bypassing shared guards
    • Replaced the inlined positional-tuple tracking in async_log_failure_event with a call to _inc_labeled_counter, so the failure path now uses the same label filtering and bounded-series guard as the success path.
Preview (0bfa9b7d5d)
diff --git a/litellm/__init__.py b/litellm/__init__.py
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@@ -414,6 +414,9 @@
 custom_prometheus_tags: List[str] = []
 prometheus_metrics_config: Optional[List] = None
 prometheus_emit_stream_label: bool = False
+prometheus_end_user_metrics_max_series_per_metric: Optional[int] = 10000
+prometheus_end_user_metrics_ttl_seconds: Optional[float] = 3600.0
+prometheus_end_user_metrics_cleanup_interval_seconds: Optional[float] = 60.0
 disable_add_prefix_to_prompt: bool = (
     False  # used by anthropic, to disable adding prefix to prompt
 )

diff --git a/litellm/integrations/prometheus.py b/litellm/integrations/prometheus.py
--- a/litellm/integrations/prometheus.py
+++ b/litellm/integrations/prometheus.py
@@ -25,6 +25,9 @@
 import litellm
 from litellm._logging import print_verbose, verbose_logger
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
 from litellm.integrations.prometheus_helpers import (
     PrometheusLabelFactoryContext,
     _get_cached_end_user_id_for_cost_tracking,
@@ -81,6 +84,7 @@
                 if _custom_buckets is not None
                 else LATENCY_BUCKETS
             )
+            self._bounded_prometheus_series_tracker = BoundedPrometheusSeriesTracker()
 
             # Create metric factory functions
             self._counter_factory = self._create_metric_factory(Counter)
@@ -984,6 +988,54 @@
 
         return filtered_labels
 
+    def _get_labeled_metric(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> Any:
+        labeled_metric = metric.labels(**labels)
+        self._track_bounded_prometheus_metric_series(metric, metric_name, labels)
+        return labeled_metric
+
+    def _track_bounded_prometheus_metric_series(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> None:
+        labelnames = self.get_labels_for_metric(metric_name)
+        if UserAPIKeyLabelNames.END_USER.value not in labelnames:
+            return
+
+        end_user = labels.get(UserAPIKeyLabelNames.END_USER.value)
+        if end_user is None:
+            return
+
+        max_series = getattr(
+            litellm, "prometheus_end_user_metrics_max_series_per_metric", 10000
+        )
+        ttl_seconds = getattr(
+            litellm, "prometheus_end_user_metrics_ttl_seconds", 3600.0
+        )
+        ttl_cleanup_interval_seconds = getattr(
+            litellm,
+            "prometheus_end_user_metrics_cleanup_interval_seconds",
+            60.0,
+        )
+        if max_series is None and ttl_seconds is None:
+            return
+
+        label_values = tuple(labels.get(label) for label in labelnames)
+        self._bounded_prometheus_series_tracker.track_series(
+            metric=metric,
+            metric_name=metric_name,
+            label_values=label_values,
+            max_series=max_series,
+            ttl_seconds=ttl_seconds,
+            cleanup_interval_seconds=ttl_cleanup_interval_seconds,
+        )
+
     def _inc_labeled_counter(
         self,
         counter: Any,
@@ -997,7 +1049,7 @@
             enum_values=enum_values,
             label_context=label_context,
         )
-        counter.labels(**_labels).inc(amount)
+        self._get_labeled_metric(counter, metric_name, _labels).inc(amount)
 
     async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
         # Define prometheus client
@@ -1468,8 +1520,10 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_time_to_first_token_metric.labels(
-                **_ttft_labels
+            self._get_labeled_metric(
+                self.litellm_llm_api_time_to_first_token_metric,
+                "litellm_llm_api_time_to_first_token_metric",
+                _ttft_labels,
             ).observe(time_to_first_token_seconds)
         else:
             verbose_logger.debug(
@@ -1488,9 +1542,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_latency_metric.labels(**_labels).observe(
-                api_call_total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_llm_api_latency_metric,
+                "litellm_llm_api_latency_metric",
+                _labels,
+            ).observe(api_call_total_time_seconds)
 
         # total request latency
         total_time_seconds = self._safe_duration_seconds(
@@ -1505,9 +1561,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_total_latency_metric.labels(**_labels).observe(
-                total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_total_latency_metric,
+                "litellm_request_total_latency_metric",
+                _labels,
+            ).observe(total_time_seconds)
 
         # request queue time (time from arrival to processing start)
         _litellm_params = kwargs.get("litellm_params", {}) or {}
@@ -1522,9 +1580,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_queue_time_metric.labels(**_labels).observe(
-                queue_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_queue_time_metric,
+                "litellm_request_queue_time_seconds",
+                _labels,
+            ).observe(queue_time_seconds)
 
     async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
         verbose_logger.debug(
@@ -1561,18 +1621,20 @@
         )
 
         try:
-            self.litellm_llm_api_failed_requests_metric.labels(
-                _sanitize_prometheus_label_value(end_user_id),
-                _sanitize_prometheus_label_value(user_api_key),
-                _sanitize_prometheus_label_value(user_api_key_alias),
-                _sanitize_prometheus_label_value(model),
-                _sanitize_prometheus_label_value(user_api_team),
-                _sanitize_prometheus_label_value(user_api_team_alias),
-                _sanitize_prometheus_label_value(user_id),
-                _sanitize_prometheus_label_value(
-                    standard_logging_payload.get("model_id", "")
+            self._inc_labeled_counter(
+                counter=self.litellm_llm_api_failed_requests_metric,
+                metric_name="litellm_llm_api_failed_requests_metric",
+                enum_values=UserAPIKeyLabelValues(
+                    end_user=end_user_id,
+                    hashed_api_key=user_api_key,
+                    api_key_alias=user_api_key_alias,
+                    model=model,
+                    team=user_api_team,
+                    team_alias=user_api_team_alias,
+                    user=user_id,
+                    model_id=standard_logging_payload.get("model_id", ""),
                 ),
-            ).inc()
+            )
             self.set_llm_deployment_failure_metrics(kwargs)
             await self._set_org_budget_metrics_after_api_request(
                 org_id=user_api_key_org_id,

diff --git a/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
new file mode 100644
--- /dev/null
+++ b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
@@ -1,0 +1,96 @@
+from __future__ import annotations
+
+import time
+from collections import OrderedDict
+from threading import RLock
+from typing import Any, Dict, Optional
+
+
+class BoundedPrometheusSeriesTracker:
+    """
+    Tracks Prometheus child series and removes stale/excess labelsets.
+
+    The tracker is label-agnostic: callers decide which series should be tracked
+    and pass the full label tuple used by the Prometheus metric.
+    """
+
+    def __init__(self) -> None:
+        self._series: Dict[str, OrderedDict[tuple[Optional[str], ...], float]] = {}
+        self._last_ttl_cleanup: Dict[str, float] = {}
+        self._lock = RLock()
+
+    def track_series(
+        self,
+        metric: Any,
+        metric_name: str,
+        label_values: tuple[Optional[str], ...],
+        max_series: Optional[int],
+        ttl_seconds: Optional[float],
+        cleanup_interval_seconds: Optional[float],
+    ) -> None:
+        if max_series is None and ttl_seconds is None:
+            return
+
+        now = time.monotonic()
+
+        with self._lock:
+            series = self._series.setdefault(metric_name, OrderedDict())
+            series[label_values] = now
+            series.move_to_end(label_values)
+
+            if ttl_seconds is not None and self._should_run_ttl_cleanup(
+                metric_name=metric_name,
+                now=now,
+                cleanup_interval_seconds=cleanup_interval_seconds,
+            ):
+                expired_label_values = [
+                    tracked_label_values
+                    for tracked_label_values, last_seen in series.items()
+                    if now - last_seen > ttl_seconds
+                ]
+                for tracked_label_values in expired_label_values:
+                    self._remove_metric_series(metric, series, tracked_label_values)
+
+            if max_series is not None and max_series > 0:
+                while len(series) > max_series:
+                    tracked_label_values, _ = series.popitem(last=False)
+                    self._remove_metric_child(metric, tracked_label_values)
+            elif max_series is not None:
+                while series:
+                    tracked_label_values, _ = series.popitem(last=False)
+                    self._remove_metric_child(metric, tracked_label_values)
+
+    def _should_run_ttl_cleanup(
+        self,
+        metric_name: str,
+        now: float,
+        cleanup_interval_seconds: Optional[float],
+    ) -> bool:
+        if cleanup_interval_seconds is None or cleanup_interval_seconds <= 0:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+
+        last_cleanup = self._last_ttl_cleanup.get(metric_name)
+        if last_cleanup is None or now - last_cleanup >= cleanup_interval_seconds:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+        return False
+
+    def _remove_metric_series(
+        self,
+        metric: Any,
+        series: OrderedDict[tuple[Optional[str], ...], float],
+        label_values: tuple[Optional[str], ...],
+    ) -> None:
+        if label_values in series:
+            del series[label_values]
+        self._remove_metric_child(metric, label_values)
+
+    @staticmethod
+    def _remove_metric_child(
+        metric: Any, label_values: tuple[Optional[str], ...]
+    ) -> None:
+        try:
+            metric.remove(*label_values)
+        except (AttributeError, KeyError, ValueError):
+            pass

diff --git a/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
@@ -1,0 +1,156 @@
+from time import monotonic
+
+import pytest
+from prometheus_client import REGISTRY
+
+import litellm
+from litellm.integrations.prometheus import PrometheusLogger
+from litellm.integrations.prometheus_helpers import bounded_prometheus_series_tracker
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
+from litellm.types.integrations.prometheus import UserAPIKeyLabelValues
+
+
+@pytest.fixture(autouse=True)
+def cleanup_prometheus_registry():
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+    old_enable_end_user = litellm.enable_end_user_cost_tracking_prometheus_only
+    old_metrics_config = litellm.prometheus_metrics_config
+    old_max_series = litellm.prometheus_end_user_metrics_max_series_per_metric
+    old_ttl_seconds = litellm.prometheus_end_user_metrics_ttl_seconds
+    old_cleanup_interval_seconds = (
+        litellm.prometheus_end_user_metrics_cleanup_interval_seconds
+    )
+
+    yield
+
+    litellm.enable_end_user_cost_tracking_prometheus_only = old_enable_end_user
+    litellm.prometheus_metrics_config = old_metrics_config
+    litellm.prometheus_end_user_metrics_max_series_per_metric = old_max_series
+    litellm.prometheus_end_user_metrics_ttl_seconds = old_ttl_seconds
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = (
+        old_cleanup_interval_seconds
+    )
+
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+
+def test_prometheus_end_user_series_are_capped_per_metric():
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = 3
+    litellm.prometheus_end_user_metrics_ttl_seconds = None
+    logger = PrometheusLogger()
+
+    for index in range(6):
+        PrometheusLogger._inc_labeled_counter(
+            logger,
+            logger.litellm_spend_metric,
+            "litellm_spend_metric",
+            UserAPIKeyLabelValues(end_user=f"end-user-{index}"),
+            amount=0.01,
+        )
+
+    assert len(logger.litellm_spend_metric._metrics) == 3
+    assert set(logger.litellm_spend_metric._metrics) == {
+        ("end-user-3",),
+        ("end-user-4",),
+        ("end-user-5",),
+    }
+
+
+def test_bounded_prometheus_series_tracker_is_label_agnostic():
+    class FakeMetric:
+        def __init__(self):
+            self.removed_label_values = []
+
+        def remove(self, *label_values):
+            self.removed_label_values.append(label_values)
+
+    metric = FakeMetric()
+    tracker = BoundedPrometheusSeriesTracker()
+
+    for index in range(4):
+        tracker.track_series(
+            metric=metric,
+            metric_name="generic_metric",
+            label_values=(f"route-{index}", "200"),
+            max_series=2,
+            ttl_seconds=None,
+            cleanup_interval_seconds=60.0,
+        )
+
+    assert metric.removed_label_values == [
+        ("route-0", "200"),
+        ("route-1", "200"),
+    ]
+
+
+def test_prometheus_end_user_series_expire_by_ttl(monkeypatch):
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = None
+    litellm.prometheus_end_user_metrics_ttl_seconds = 10.0
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = 0.0
+    logger = PrometheusLogger()
+
+    current_time = [monotonic()]
+    monkeypatch.setattr(
+        bounded_prometheus_series_tracker.time,
+        "monotonic",
+        lambda: current_time[0],
+    )
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="stale-end-user"),
+        amount=0.01,
+    )
+
+    current_time[0] += 11.0
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="fresh-end-user"),
+        amount=0.01,
+    )
+
+    assert set(logger.litellm_spend_metric._metrics) == {("fresh-end-user",)}
+
+
+def test_prometheus_end_user_not_tracked_by_default():
+    litellm.enable_end_user_cost_tracking_prometheus_only = None
+    labels = PrometheusLogger().get_labels_for_metric("litellm_spend_metric")
+    assert "end_user" in labels
+
+    label_values = UserAPIKeyLabelValues(end_user="not-exported")
+    from litellm.integrations.prometheus import prometheus_label_factory
+
+    prometheus_labels = prometheus_label_factory(labels, label_values)
+    assert prometheus_labels["end_user"] is None

Comment thread litellm/integrations/prometheus.py Outdated
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Race condition between child creation and tracker eviction
    • Acquired the tracker's RLock around both metric.labels() and track_series() in _get_labeled_metric so a concurrent track_series cannot evict the just-created child before the caller increments/observes it.
  • ✅ Fixed: Tracker-Prometheus divergence on silent removal failure
    • Made _remove_metric_child return whether removal actually succeeded and only delete the entry from the tracker's OrderedDict (in both max_series eviction and TTL cleanup paths) when it did, preventing the tracker from forgetting children that still exist in Prometheus.
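
The second fix can be sketched as follows. This is one possible reading of the description above, not the autofix code itself: `StubbornMetric`, `remove_child`, and `evict_oldest` are hypothetical names, and the stand-in metric simulates a removal that fails while the child remains registered.

```python
from collections import OrderedDict


class StubbornMetric:
    """Stand-in for a Prometheus metric whose remove() can fail."""

    def __init__(self, children, locked):
        self.children = set(children)
        self.locked = set(locked)  # tuples whose removal fails in this simulation

    def remove(self, *label_values):
        if label_values in self.locked:
            raise ValueError("simulated removal failure")
        self.children.remove(label_values)


def remove_child(metric, label_values) -> bool:
    """Report whether the child was actually removed from the metric."""
    try:
        metric.remove(*label_values)
    except (AttributeError, KeyError, ValueError):
        return False
    return True


def evict_oldest(metric, series, max_series) -> None:
    # Walk oldest-first; forget a tuple only if the metric really dropped it,
    # so the tracker never loses sight of a child that still exists.
    for label_values in list(series):
        if len(series) <= max_series:
            break
        if remove_child(metric, label_values):
            del series[label_values]


keys = [("a",), ("b",), ("c",)]
metric = StubbornMetric(children=keys, locked=[("a",)])
series = OrderedDict((key, float(i)) for i, key in enumerate(keys))
evict_oldest(metric, series, max_series=2)
print(list(series), sorted(metric.children))
```

Here the removal of `("a",)` fails, so it stays tracked and `("b",)` is evicted instead; the tracker and the metric end up holding the same two tuples.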
Preview (bede81b2b9)
diff --git a/litellm/__init__.py b/litellm/__init__.py
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@@ -414,6 +414,9 @@
 custom_prometheus_tags: List[str] = []
 prometheus_metrics_config: Optional[List] = None
 prometheus_emit_stream_label: bool = False
+prometheus_end_user_metrics_max_series_per_metric: Optional[int] = 10000
+prometheus_end_user_metrics_ttl_seconds: Optional[float] = 3600.0
+prometheus_end_user_metrics_cleanup_interval_seconds: Optional[float] = 60.0
 disable_add_prefix_to_prompt: bool = (
     False  # used by anthropic, to disable adding prefix to prompt
 )

diff --git a/litellm/integrations/prometheus.py b/litellm/integrations/prometheus.py
--- a/litellm/integrations/prometheus.py
+++ b/litellm/integrations/prometheus.py
@@ -25,6 +25,9 @@
 import litellm
 from litellm._logging import print_verbose, verbose_logger
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
 from litellm.integrations.prometheus_helpers import (
     PrometheusLabelFactoryContext,
     _get_cached_end_user_id_for_cost_tracking,
@@ -81,6 +84,7 @@
                 if _custom_buckets is not None
                 else LATENCY_BUCKETS
             )
+            self._bounded_prometheus_series_tracker = BoundedPrometheusSeriesTracker()
 
             # Create metric factory functions
             self._counter_factory = self._create_metric_factory(Counter)
@@ -984,6 +988,55 @@
 
         return filtered_labels
 
+    def _get_labeled_metric(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> Any:
+        with self._bounded_prometheus_series_tracker.lock:
+            labeled_metric = metric.labels(**labels)
+            self._track_bounded_prometheus_metric_series(metric, metric_name, labels)
+            return labeled_metric
+
+    def _track_bounded_prometheus_metric_series(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> None:
+        labelnames = self.get_labels_for_metric(metric_name)
+        if UserAPIKeyLabelNames.END_USER.value not in labelnames:
+            return
+
+        end_user = labels.get(UserAPIKeyLabelNames.END_USER.value)
+        if end_user is None:
+            return
+
+        max_series = getattr(
+            litellm, "prometheus_end_user_metrics_max_series_per_metric", 10000
+        )
+        ttl_seconds = getattr(
+            litellm, "prometheus_end_user_metrics_ttl_seconds", 3600.0
+        )
+        ttl_cleanup_interval_seconds = getattr(
+            litellm,
+            "prometheus_end_user_metrics_cleanup_interval_seconds",
+            60.0,
+        )
+        if max_series is None and ttl_seconds is None:
+            return
+
+        label_values = tuple(labels.get(label) for label in labelnames)
+        self._bounded_prometheus_series_tracker.track_series(
+            metric=metric,
+            metric_name=metric_name,
+            label_values=label_values,
+            max_series=max_series,
+            ttl_seconds=ttl_seconds,
+            cleanup_interval_seconds=ttl_cleanup_interval_seconds,
+        )
+
     def _inc_labeled_counter(
         self,
         counter: Any,
@@ -997,7 +1050,7 @@
             enum_values=enum_values,
             label_context=label_context,
         )
-        counter.labels(**_labels).inc(amount)
+        self._get_labeled_metric(counter, metric_name, _labels).inc(amount)
 
     async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
         # Define prometheus client
@@ -1468,8 +1521,10 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_time_to_first_token_metric.labels(
-                **_ttft_labels
+            self._get_labeled_metric(
+                self.litellm_llm_api_time_to_first_token_metric,
+                "litellm_llm_api_time_to_first_token_metric",
+                _ttft_labels,
             ).observe(time_to_first_token_seconds)
         else:
             verbose_logger.debug(
@@ -1488,9 +1543,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_latency_metric.labels(**_labels).observe(
-                api_call_total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_llm_api_latency_metric,
+                "litellm_llm_api_latency_metric",
+                _labels,
+            ).observe(api_call_total_time_seconds)
 
         # total request latency
         total_time_seconds = self._safe_duration_seconds(
@@ -1505,9 +1562,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_total_latency_metric.labels(**_labels).observe(
-                total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_total_latency_metric,
+                "litellm_request_total_latency_metric",
+                _labels,
+            ).observe(total_time_seconds)
 
         # request queue time (time from arrival to processing start)
         _litellm_params = kwargs.get("litellm_params", {}) or {}
@@ -1522,9 +1581,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_queue_time_metric.labels(**_labels).observe(
-                queue_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_queue_time_metric,
+                "litellm_request_queue_time_seconds",
+                _labels,
+            ).observe(queue_time_seconds)
 
     async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
         verbose_logger.debug(
@@ -1561,18 +1622,20 @@
         )
 
         try:
-            self.litellm_llm_api_failed_requests_metric.labels(
-                _sanitize_prometheus_label_value(end_user_id),
-                _sanitize_prometheus_label_value(user_api_key),
-                _sanitize_prometheus_label_value(user_api_key_alias),
-                _sanitize_prometheus_label_value(model),
-                _sanitize_prometheus_label_value(user_api_team),
-                _sanitize_prometheus_label_value(user_api_team_alias),
-                _sanitize_prometheus_label_value(user_id),
-                _sanitize_prometheus_label_value(
-                    standard_logging_payload.get("model_id", "")
+            self._inc_labeled_counter(
+                counter=self.litellm_llm_api_failed_requests_metric,
+                metric_name="litellm_llm_api_failed_requests_metric",
+                enum_values=UserAPIKeyLabelValues(
+                    end_user=end_user_id,
+                    hashed_api_key=user_api_key,
+                    api_key_alias=user_api_key_alias,
+                    model=model,
+                    team=user_api_team,
+                    team_alias=user_api_team_alias,
+                    user=user_id,
+                    model_id=standard_logging_payload.get("model_id", ""),
                 ),
-            ).inc()
+            )
             self.set_llm_deployment_failure_metrics(kwargs)
             await self._set_org_budget_metrics_after_api_request(
                 org_id=user_api_key_org_id,

diff --git a/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
new file mode 100644
--- /dev/null
+++ b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
@@ -1,0 +1,111 @@
+from __future__ import annotations
+
+import time
+from collections import OrderedDict
+from threading import RLock
+from typing import Any, Dict, Optional
+
+
+class BoundedPrometheusSeriesTracker:
+    """
+    Tracks Prometheus child series and removes stale/excess labelsets.
+
+    The tracker is label-agnostic: callers decide which series should be tracked
+    and pass the full label tuple used by the Prometheus metric.
+    """
+
+    def __init__(self) -> None:
+        self._series: Dict[str, OrderedDict[tuple[Optional[str], ...], float]] = {}
+        self._last_ttl_cleanup: Dict[str, float] = {}
+        self.lock = RLock()
+
+    def track_series(
+        self,
+        metric: Any,
+        metric_name: str,
+        label_values: tuple[Optional[str], ...],
+        max_series: Optional[int],
+        ttl_seconds: Optional[float],
+        cleanup_interval_seconds: Optional[float],
+    ) -> None:
+        if max_series is None and ttl_seconds is None:
+            return
+
+        now = time.monotonic()
+
+        with self.lock:
+            series = self._series.setdefault(metric_name, OrderedDict())
+            series[label_values] = now
+            series.move_to_end(label_values)
+
+            if ttl_seconds is not None and self._should_run_ttl_cleanup(
+                metric_name=metric_name,
+                now=now,
+                cleanup_interval_seconds=cleanup_interval_seconds,
+            ):
+                expired_label_values = [
+                    tracked_label_values
+                    for tracked_label_values, last_seen in series.items()
+                    if now - last_seen > ttl_seconds
+                ]
+                for tracked_label_values in expired_label_values:
+                    self._remove_metric_series(metric, series, tracked_label_values)
+
+            if max_series is not None and max_series > 0:
+                while len(series) > max_series:
+                    tracked_label_values = next(iter(series))
+                    if not self._remove_metric_child(metric, tracked_label_values):
+                        break
+                    del series[tracked_label_values]
+            elif max_series is not None:
+                while series:
+                    tracked_label_values = next(iter(series))
+                    if not self._remove_metric_child(metric, tracked_label_values):
+                        break
+                    del series[tracked_label_values]
+
+    def _should_run_ttl_cleanup(
+        self,
+        metric_name: str,
+        now: float,
+        cleanup_interval_seconds: Optional[float],
+    ) -> bool:
+        if cleanup_interval_seconds is None or cleanup_interval_seconds <= 0:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+
+        last_cleanup = self._last_ttl_cleanup.get(metric_name)
+        if last_cleanup is None or now - last_cleanup >= cleanup_interval_seconds:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+        return False
+
+    def _remove_metric_series(
+        self,
+        metric: Any,
+        series: OrderedDict[tuple[Optional[str], ...], float],
+        label_values: tuple[Optional[str], ...],
+    ) -> None:
+        if self._remove_metric_child(metric, label_values):
+            series.pop(label_values, None)
+
+    @staticmethod
+    def _remove_metric_child(
+        metric: Any, label_values: tuple[Optional[str], ...]
+    ) -> bool:
+        """
+        Remove the Prometheus child for ``label_values`` and report whether the
+        tracker should commit the matching state change.
+
+        Returns ``True`` when the child is no longer present in Prometheus
+        (either it was just removed or it was already gone), and ``False`` when
+        ``metric.remove()`` raised an unexpected error and the child likely
+        still exists.
+        """
+        try:
+            metric.remove(*label_values)
+            return True
+        except KeyError:
+            return True
+        except (AttributeError, ValueError):
+            return False

diff --git a/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
@@ -1,0 +1,156 @@
+from time import monotonic
+
+import pytest
+from prometheus_client import REGISTRY
+
+import litellm
+from litellm.integrations.prometheus import PrometheusLogger
+from litellm.integrations.prometheus_helpers import bounded_prometheus_series_tracker
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
+from litellm.types.integrations.prometheus import UserAPIKeyLabelValues
+
+
+@pytest.fixture(autouse=True)
+def cleanup_prometheus_registry():
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+    old_enable_end_user = litellm.enable_end_user_cost_tracking_prometheus_only
+    old_metrics_config = litellm.prometheus_metrics_config
+    old_max_series = litellm.prometheus_end_user_metrics_max_series_per_metric
+    old_ttl_seconds = litellm.prometheus_end_user_metrics_ttl_seconds
+    old_cleanup_interval_seconds = (
+        litellm.prometheus_end_user_metrics_cleanup_interval_seconds
+    )
+
+    yield
+
+    litellm.enable_end_user_cost_tracking_prometheus_only = old_enable_end_user
+    litellm.prometheus_metrics_config = old_metrics_config
+    litellm.prometheus_end_user_metrics_max_series_per_metric = old_max_series
+    litellm.prometheus_end_user_metrics_ttl_seconds = old_ttl_seconds
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = (
+        old_cleanup_interval_seconds
+    )
+
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+
+def test_prometheus_end_user_series_are_capped_per_metric():
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = 3
+    litellm.prometheus_end_user_metrics_ttl_seconds = None
+    logger = PrometheusLogger()
+
+    for index in range(6):
+        PrometheusLogger._inc_labeled_counter(
+            logger,
+            logger.litellm_spend_metric,
+            "litellm_spend_metric",
+            UserAPIKeyLabelValues(end_user=f"end-user-{index}"),
+            amount=0.01,
+        )
+
+    assert len(logger.litellm_spend_metric._metrics) == 3
+    assert set(logger.litellm_spend_metric._metrics) == {
+        ("end-user-3",),
+        ("end-user-4",),
+        ("end-user-5",),
+    }
+
+
+def test_bounded_prometheus_series_tracker_is_label_agnostic():
+    class FakeMetric:
+        def __init__(self):
+            self.removed_label_values = []
+
+        def remove(self, *label_values):
+            self.removed_label_values.append(label_values)
+
+    metric = FakeMetric()
+    tracker = BoundedPrometheusSeriesTracker()
+
+    for index in range(4):
+        tracker.track_series(
+            metric=metric,
+            metric_name="generic_metric",
+            label_values=(f"route-{index}", "200"),
+            max_series=2,
+            ttl_seconds=None,
+            cleanup_interval_seconds=60.0,
+        )
+
+    assert metric.removed_label_values == [
+        ("route-0", "200"),
+        ("route-1", "200"),
+    ]
+
+
+def test_prometheus_end_user_series_expire_by_ttl(monkeypatch):
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = None
+    litellm.prometheus_end_user_metrics_ttl_seconds = 10.0
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = 0.0
+    logger = PrometheusLogger()
+
+    current_time = [monotonic()]
+    monkeypatch.setattr(
+        bounded_prometheus_series_tracker.time,
+        "monotonic",
+        lambda: current_time[0],
+    )
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="stale-end-user"),
+        amount=0.01,
+    )
+
+    current_time[0] += 11.0
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="fresh-end-user"),
+        amount=0.01,
+    )
+
+    assert set(logger.litellm_spend_metric._metrics) == {("fresh-end-user",)}
+
+
+def test_prometheus_end_user_not_tracked_by_default():
+    litellm.enable_end_user_cost_tracking_prometheus_only = None
+    labels = PrometheusLogger().get_labels_for_metric("litellm_spend_metric")
+    assert "end_user" in labels
+
+    label_values = UserAPIKeyLabelValues(end_user="not-exported")
+    from litellm.integrations.prometheus import prometheus_label_factory
+
+    prometheus_labels = prometheus_label_factory(labels, label_values)
+    assert prometheus_labels["end_user"] is None


Reviewed by Cursor Bugbot for commit 0bfa9b7.

Comment thread litellm/integrations/prometheus.py Outdated
Comment thread litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py Outdated
Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request May 6, 2026
opencode/codex/litellm coverage:
- anomalyco/opencode#25962 desktop utilityProcess split (man)
- openai/codex#21287 skills watcher hoisted to app-server (man)
- BerriAI/litellm#27272 bounded prometheus end-user series tracker (man)
- BerriAI/litellm#27278 GCS extensionless URI MIME resolution (man)
@ishaan-berri ishaan-berri marked this pull request as ready for review May 6, 2026 18:32
@greptile-apps
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR introduces a BoundedPrometheusSeriesTracker helper that caps Prometheus child-series cardinality for metrics carrying an end_user label, replacing unbounded in-memory growth with LRU eviction and an optional TTL sweep.

  • Adds litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py with an OrderedDict-backed LRU + TTL eviction strategy; eviction uses a throttled cleanup interval to keep hot paths fast.
  • Wires the tracker into _inc_labeled_counter (counters) and into four histogram observe() call-sites (TTFT, LLM API latency, total request latency, queue time); three new litellm.* configuration knobs control the cap, TTL, and cleanup interval.
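
As a rough illustration of the cap described above, here is a minimal sketch of OrderedDict-based LRU eviction. It is not the PR's implementation: `touch_and_evict` is a hypothetical helper name, and the sketch only returns the evicted labelsets instead of calling `metric.remove()` on a real Prometheus metric.

```python
from collections import OrderedDict
from typing import List, Tuple

LabelValues = Tuple[str, ...]


def touch_and_evict(
    series: "OrderedDict[LabelValues, float]",
    label_values: LabelValues,
    now: float,
    max_series: int,
) -> List[LabelValues]:
    """Record a sighting of label_values and return the labelsets evicted.

    Hypothetical helper for illustration only.
    """
    series[label_values] = now
    series.move_to_end(label_values)  # most recently seen goes last
    evicted: List[LabelValues] = []
    while len(series) > max_series:
        oldest, _ = series.popitem(last=False)  # least recently seen first
        evicted.append(oldest)
    return evicted


series: "OrderedDict[LabelValues, float]" = OrderedDict()
evicted_all: List[LabelValues] = []
for i in range(4):
    evicted_all += touch_and_evict(
        series, (f"route-{i}", "200"), float(i), max_series=2
    )

print(evicted_all)   # the two oldest labelsets
print(list(series))  # the two most recently seen labelsets
```

With a cap of 2 and four distinct labelsets, the two oldest are evicted, which matches the expectation in `test_bounded_prometheus_series_tracker_is_label_agnostic`.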

Confidence Score: 5/5

The change adds an opt-out cardinality cap with sensible defaults and all existing counters flow through the updated _inc_labeled_counter path; the four new histogram call-sites are consistent with their registration names and the new code has no backward-incompatible effect on callers that don't set end_user.

The LRU/TTL eviction logic is internally consistent: the current series is refreshed before TTL cleanup runs so it is never self-expired, the max_series <= 0 guard prevents silent eviction on a zero cap, AttributeError/ValueError from remove() are correctly handled without panicking, and the known trade-off of losing updates for an evicted series before the next scrape is explicitly documented. All previous-thread concerns are addressed in this revision and no new issues were found.
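
The eviction trade-off mentioned above can be sketched with a stand-in counter. This is an assumption-laden illustration, not litellm or prometheus_client code: `FakeCounter` and `record` are hypothetical names, and the cap of 100 is chosen only to make the numbers easy to follow.

```python
from collections import OrderedDict
from typing import Dict, Tuple

LabelValues = Tuple[str, ...]


class FakeCounter:
    """Hypothetical stand-in for a labeled counter (not prometheus_client)."""

    def __init__(self) -> None:
        self.children: Dict[LabelValues, float] = {}

    def inc(self, label_values: LabelValues, amount: float = 1.0) -> None:
        self.children[label_values] = self.children.get(label_values, 0.0) + amount

    def remove(self, *label_values: str) -> None:
        del self.children[label_values]


def record(
    counter: FakeCounter,
    seen: "OrderedDict[LabelValues, None]",
    label_values: LabelValues,
    max_series: int,
) -> None:
    """Increment the series, then evict least recently seen series over the cap."""
    counter.inc(label_values)
    seen[label_values] = None
    seen.move_to_end(label_values)
    while len(seen) > max_series:
        oldest, _ = seen.popitem(last=False)
        counter.remove(*oldest)


counter = FakeCounter()
seen: "OrderedDict[LabelValues, None]" = OrderedDict()

# end-user-0 accumulates three increments, then 149 other users arrive.
for _ in range(3):
    record(counter, seen, ("end-user-0",), max_series=100)
for i in range(1, 150):
    record(counter, seen, (f"end-user-{i}",), max_series=100)

exported = len(counter.children)                       # capped at 100
user_0_evicted = ("end-user-0",) not in counter.children

# end-user-0 returns: a new child starts from its latest increment only.
record(counter, seen, ("end-user-0",), max_series=100)
user_0_value = counter.children[("end-user-0",)]
```

In this sketch only the 100 most recently seen end users stay exported, and a returning user's series restarts rather than resuming its earlier total. How a downstream backend interprets that restart is outside the scope of the sketch.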

No files require special attention; the new helper and its integration points are straightforward and well-tested.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py | New helper: LRU + TTL eviction of Prometheus child series; logic is sound, with cleanup_interval_seconds=None/0 semantics and KeyError-swallowing for already-absent children handled correctly |
| litellm/integrations/prometheus.py | Wires tracker into _inc_labeled_counter and four histogram observe() sites; metric-name strings are consistent with registration; no new regressions introduced |
| litellm/__init__.py | Adds three new module-level config knobs with sensible defaults (10 000 series, 1 h TTL, 60 s cleanup interval) |
| tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py | New test file covering cap, TTL expiry, zero-max-as-unlimited, label-agnostic behavior, and default disabled path; no real network calls |
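
The cleanup-interval semantics noted in the overview can be sketched as a small gate function. This mirrors the shape of `_should_run_ttl_cleanup` in the diff but is a standalone approximation: the free function `should_run_ttl_cleanup` and its `last_cleanup` dict argument are hypothetical names used for illustration.

```python
from typing import Dict, Optional


def should_run_ttl_cleanup(
    last_cleanup: Dict[str, float],
    metric_name: str,
    now: float,
    interval: Optional[float],
) -> bool:
    """Return True when a TTL sweep is due for metric_name."""
    # None or a non-positive interval means: sweep on every call.
    if interval is None or interval <= 0:
        last_cleanup[metric_name] = now
        return True
    previous = last_cleanup.get(metric_name)
    if previous is None or now - previous >= interval:
        last_cleanup[metric_name] = now
        return True
    return False


# With a 60 s interval, only calls at least 60 s after the last sweep run one.
state: Dict[str, float] = {}
throttled = [
    should_run_ttl_cleanup(state, "litellm_spend_metric", t, 60.0)
    for t in (0.0, 10.0, 59.9, 60.0, 61.0, 120.0)
]

# With no interval configured, every call sweeps.
unthrottled = [
    should_run_ttl_cleanup({}, "litellm_spend_metric", t, None)
    for t in (0.0, 1.0, 2.0)
]
```

The gate keeps the per-emission cost low because the full scan for expired series only happens once per interval, while the per-call work is a dictionary lookup and a subtraction.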

Reviews (2): Last reviewed commit: "perf: cap Prometheus end-user metric car..."

Comment thread litellm/integrations/prometheus.py Outdated
Comment thread litellm/integrations/prometheus.py Outdated
Comment thread litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py Outdated
@yassin-berriai yassin-berriai force-pushed the litellm_cap-prometheus-end-user-cardinality-b255 branch from bede81b to 66fcee1 Compare May 6, 2026 19:58
@yassin-berriai yassin-berriai force-pushed the litellm_cap-prometheus-end-user-cardinality-b255 branch from 66fcee1 to 962698e Compare May 6, 2026 20:00
@yassin-berriai
Contributor

@ishaan-berri ishaan-berri enabled auto-merge (squash) May 6, 2026 20:08
@ishaan-berri ishaan-berri merged commit 487479e into litellm_internal_staging May 6, 2026
114 of 115 checks passed
@ishaan-berri ishaan-berri deleted the litellm_cap-prometheus-end-user-cardinality-b255 branch May 6, 2026 20:35
@jimmychen-p72
Contributor

jimmychen-p72 commented May 7, 2026

Sanity check: is there any data loss or unexpected behavior (on Datadog, for example) when the metrics map is capped? For instance, if the cap results in tracking 100 users and metrics are observed for 150 users, will we only propagate metrics for those 100 users?

@ishaan-berri
Contributor Author

@oss-agent-shin Sanity check: is there any data loss or unexpected behavior (on Datadog, for example) when the metrics map is capped? For instance, if the cap results in tracking 100 users and metrics are observed for 150 users, will we only propagate metrics for those 100 users?
