Cap Prometheus end-user metric cardinality by ishaan-berri · Pull Request #27272 · BerriAI/litellm

ishaan-berri · 2026-05-06T02:21:06Z

Summary

Prometheus end-user cost tracking could create one in-memory Prometheus child series per unique end_user label tuple with no cleanup path. This adds a generic BoundedPrometheusSeriesTracker helper in its own module (litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py) that can enforce max-series and TTL cleanup for any Prometheus label tuple; the current policy applies it to metrics with resolved end_user labels while keeping end-user tracking available. TTL sweeps run on a bounded interval so emission remains near baseline speed, and the deprecated failure metric keeps its positional .labels(...) call contract for existing tests/callers.
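The throttled TTL sweep described above can be illustrated with a small stdlib-only sketch. It is not the PR's BoundedPrometheusSeriesTracker: the class name ThrottledTtlTracker, its attributes, and the injected clock are hypothetical, and the numbers are chosen only to make the behavior visible. The point it shows is that an expired label tuple can stay tracked until the next sweep is due, which is the trade-off that keeps most emissions from scanning every tracked series.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple

LabelValues = Tuple[str, ...]


class ThrottledTtlTracker:
    """Illustrative stand-in: records last-seen times and sweeps on an interval."""

    def __init__(
        self,
        ttl_seconds: float,
        sweep_interval_seconds: float,
        clock: Callable[[], float],
    ) -> None:
        self.ttl_seconds = ttl_seconds
        self.sweep_interval_seconds = sweep_interval_seconds
        self.clock = clock
        self.last_seen: "OrderedDict[LabelValues, float]" = OrderedDict()
        self.last_sweep: Optional[float] = None
        self.sweeps = 0

    def touch(self, label_values: LabelValues) -> List[LabelValues]:
        """Record an emission and return the label tuples expired by this call."""
        now = self.clock()
        self.last_seen[label_values] = now
        self.last_seen.move_to_end(label_values)

        # Only scan for expired entries when the sweep interval has elapsed.
        due = (
            self.last_sweep is None
            or now - self.last_sweep >= self.sweep_interval_seconds
        )
        if not due:
            return []

        self.last_sweep = now
        self.sweeps += 1
        expired = [
            key for key, seen in self.last_seen.items()
            if now - seen > self.ttl_seconds
        ]
        for key in expired:
            del self.last_seen[key]
        return expired


fake_now = [0.0]  # controllable clock for the demo
tracker = ThrottledTtlTracker(
    ttl_seconds=10.0, sweep_interval_seconds=60.0, clock=lambda: fake_now[0]
)
tracker.touch(("a",))            # first call always sweeps
fake_now[0] = 30.0
tracker.touch(("b",))            # "a" is past its TTL, but no sweep is due yet
stale_still_tracked = ("a",) in tracker.last_seen
fake_now[0] = 61.0
expired = tracker.touch(("c",))  # sweep is due: "a" and "b" are both expired
```

With the PR's defaults (TTL 3600 s, cleanup interval 60 s), the same reasoning suggests a stale series may outlive its TTL by up to roughly one cleanup interval, and only if another tracked emission for that metric arrives to trigger the sweep.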

Repro

On the starting ref, this minimal Prometheus-client reproduction shows the unbounded child map behavior:

python3 - <<'PY'
from prometheus_client import Counter, CollectorRegistry
metric = Counter('litellm_repro_end_user_cardinality_total', 'repro', ['end_user'], registry=CollectorRegistry())
for i in range(1000):
    metric.labels(end_user=f'end-user-{i}').inc()
print('children_in__metrics:', len(metric._metrics))
print('expected_unbounded_growth:', len(metric._metrics) == 1000)
PY

Output:

children_in__metrics: 1000
expected_unbounded_growth: True
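For contrast with the repro, here is a minimal sketch of the same loop with a least-recently-seen cap applied. It deliberately avoids prometheus_client: FakeCounter and emit_with_cap are hypothetical stand-ins, and the cap of 100 is an arbitrary demo value rather than the PR's default.

```python
from collections import OrderedDict


class FakeCounter:
    """Stand-in for a labeled counter's child map (not prometheus_client)."""

    def __init__(self):
        self.children = {}  # label tuple -> running total

    def inc(self, label_values, amount=1.0):
        self.children[label_values] = self.children.get(label_values, 0.0) + amount

    def remove(self, *label_values):
        del self.children[label_values]


def emit_with_cap(counter, recency, label_values, max_series):
    """Increment a child, then evict the least recently seen children over the cap."""
    counter.inc(label_values)
    recency[label_values] = True
    recency.move_to_end(label_values)  # most recently seen goes last
    while len(recency) > max_series:
        oldest, _ = recency.popitem(last=False)
        counter.remove(*oldest)


uncapped = FakeCounter()
capped = FakeCounter()
recency = OrderedDict()
for i in range(1000):
    uncapped.inc((f"end-user-{i}",))
    emit_with_cap(capped, recency, (f"end-user-{i}",), max_series=100)

print("uncapped children:", len(uncapped.children))
print("capped children:", len(capped.children))
```

The uncapped map grows with every unique label, while the capped one holds only the 100 most recently seen label tuples.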

Memory graph benchmark used for before/after proof:

LITELLM_REPO=<repo> SERIES_COUNT=50000 SAMPLE_STEP=2500 BENCH_OUT=<csv> PYTHONPATH=<repo> python3 /tmp/prometheus_end_user_memory_timeseries.py

Evidence

RSS growth graph:

Retained Prometheus child series graph:

Final sampled benchmark values:

Before final sample:
{'emitted': 50000.0, 'rss_delta_kb': 92484.0, 'tracemalloc_current_kb': 30267.0, 'tracemalloc_peak_kb': 30271.0, 'prometheus_children': 50000.0, 'elapsed_seconds': 4.931}

After final sample:
{'emitted': 50000.0, 'rss_delta_kb': 24552.0, 'tracemalloc_current_kb': 8073.0, 'tracemalloc_peak_kb': 8650.0, 'prometheus_children': 10000.0, 'elapsed_seconds': 5.155}
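
To make the two samples easier to compare, the relative change per field can be computed directly from the dictionaries above. The helper name pct_change is hypothetical; the input values are copied verbatim from the two final samples.

```python
before = {'emitted': 50000.0, 'rss_delta_kb': 92484.0, 'tracemalloc_current_kb': 30267.0,
          'tracemalloc_peak_kb': 30271.0, 'prometheus_children': 50000.0, 'elapsed_seconds': 4.931}
after = {'emitted': 50000.0, 'rss_delta_kb': 24552.0, 'tracemalloc_current_kb': 8073.0,
         'tracemalloc_peak_kb': 8650.0, 'prometheus_children': 10000.0, 'elapsed_seconds': 5.155}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100.0


changes = {
    key: pct_change(before[key], after[key])
    for key in ("rss_delta_kb", "tracemalloc_current_kb",
                "prometheus_children", "elapsed_seconds")
}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

On these two samples, RSS delta and traced memory both fall by roughly 73%, retained children fall by 80% (50,000 to 10,000, which matches the default per-metric cap), and elapsed time rises by about 4.5%. These are single final samples, so the timing difference in particular should be read as indicative only.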

Latest regression run including the reported enterprise failure:

PYTHONPATH=/workspace python3 -m pytest \
  tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py::test_async_log_failure_event \
  tests/enterprise/litellm_enterprise/enterprise_callbacks/test_prometheus_logging_callbacks.py::test_async_log_failure_event_litellm_side_rate_limit \
  tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py \
  tests/test_litellm/integrations/test_prometheus_labels.py -q
....................                                                     [100%]
20 passed in 0.41s

Tests

Added tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py covering per-metric cap, TTL expiry, default Prometheus end-user disabled behavior, and generic label-agnostic cleanup behavior.
Ran python3 -m black litellm/__init__.py litellm/integrations/prometheus.py litellm/integrations/prometheus_helpers/__init__.py litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py.
Ran the latest regression command above -> 20 passed.
Ran before/after sampled memory benchmarks against origin/litellm_internal_staging and this branch with 50,000 unique end users.

Relevant issues

Linear ticket

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

See the Evidence section above.

Type

🐛 Bug Fix
✅ Test

Changes

Added generic bounded cleanup for Prometheus metric children in a dedicated helper module.
Applied the cleanup policy to metrics that include a resolved end_user label.
Preserved positional labels for the deprecated failure metric path.
Added defaults for per-metric series cap, TTL, and TTL cleanup interval.
Added regression tests for cap, TTL expiry, default Prometheus end-user behavior, and generic label-agnostic cleanup.
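
As a reading aid for the three new defaults, the following sketch restates how the values appear to be interpreted, based on the track_series and _should_run_ttl_cleanup code in the first preview diff. The function describe_policy and its return shape are hypothetical and exist only for illustration. Note that prometheus.py applies the policy only to metrics whose label set includes end_user and whose resolved end_user value is not None.

```python
from typing import Dict, Optional


def describe_policy(
    max_series: Optional[int],
    ttl_seconds: Optional[float],
    cleanup_interval_seconds: Optional[float],
) -> Dict[str, str]:
    """Summarize, in words, what a combination of the three settings does."""
    # Both limits unset: the tracker returns early and nothing is tracked.
    if max_series is None and ttl_seconds is None:
        return {"tracking": "disabled"}

    if max_series is None:
        cap = "no cap"
    elif max_series > 0:
        cap = f"keep at most {max_series} most recently seen series"
    else:
        # A non-None value that is zero or negative drains every tracked series.
        cap = "evict every tracked series"

    if ttl_seconds is None:
        ttl = "no expiry"
    elif cleanup_interval_seconds is None or cleanup_interval_seconds <= 0:
        ttl = f"expire after {ttl_seconds}s idle, sweep on every emission"
    else:
        ttl = (
            f"expire after {ttl_seconds}s idle, "
            f"sweep at most every {cleanup_interval_seconds}s"
        )
    return {"tracking": "enabled", "cap": cap, "ttl": ttl}


defaults = describe_policy(10000, 3600.0, 60.0)  # values from litellm/__init__.py
disabled = describe_policy(None, None, 60.0)
print(defaults)
print(disabled)
```

In the PR, these values are read from litellm.prometheus_end_user_metrics_max_series_per_metric, litellm.prometheus_end_user_metrics_ttl_seconds, and litellm.prometheus_end_user_metrics_cleanup_interval_seconds on each tracked emission.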

CLAassistant · 2026-05-06T02:21:12Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

codecov · 2026-05-06T02:56:10Z

Codecov Report

❌ Patch coverage is 77.55102% with 11 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...theus_helpers/bounded_prometheus_series_tracker.py	77.55%	11 Missing ⚠️

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Failure path duplicates tracking logic, bypassing shared guards
- Replaced the inlined positional-tuple tracking in async_log_failure_event with a call to _inc_labeled_counter, so the failure path now uses the same label filtering and bounded-series guard as the success path.
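
The idea behind this fix, routing both paths through one helper so the guard cannot be skipped, can be sketched without litellm or prometheus_client. Every name below (FakeChild, FakeCounter, inc_labeled_counter, log_success, log_failure, guard_log) is a hypothetical stand-in; the real helper is _inc_labeled_counter in the diff that follows.

```python
from typing import Dict, List, Tuple


class FakeChild:
    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount


class FakeCounter:
    """Stand-in for a labeled counter; not prometheus_client."""

    def __init__(self) -> None:
        self.children: Dict[Tuple[Tuple[str, str], ...], FakeChild] = {}

    def labels(self, **labels: str) -> FakeChild:
        key = tuple(sorted(labels.items()))
        return self.children.setdefault(key, FakeChild())


guard_log: List[Dict[str, str]] = []


def inc_labeled_counter(counter: FakeCounter, labels: Dict[str, str]) -> None:
    """Single choke point: resolve the child, run the guard, then increment."""
    child = counter.labels(**labels)
    guard_log.append(labels)  # placeholder for the bounded-series tracking call
    child.inc()


def log_success(counter: FakeCounter, end_user: str) -> None:
    inc_labeled_counter(counter, {"end_user": end_user})


def log_failure(counter: FakeCounter, end_user: str) -> None:
    # Same helper as the success path, so the guard always runs here too.
    inc_labeled_counter(counter, {"end_user": end_user})


succeeded, failed = FakeCounter(), FakeCounter()
log_success(succeeded, "u1")
log_failure(failed, "u1")
```

Because neither caller touches counter.labels(...) directly, any future change to the guard applies to both paths at once.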

Preview (0bfa9b7d5d)

diff --git a/litellm/__init__.py b/litellm/__init__.py
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@@ -414,6 +414,9 @@
 custom_prometheus_tags: List[str] = []
 prometheus_metrics_config: Optional[List] = None
 prometheus_emit_stream_label: bool = False
+prometheus_end_user_metrics_max_series_per_metric: Optional[int] = 10000
+prometheus_end_user_metrics_ttl_seconds: Optional[float] = 3600.0
+prometheus_end_user_metrics_cleanup_interval_seconds: Optional[float] = 60.0
 disable_add_prefix_to_prompt: bool = (
     False  # used by anthropic, to disable adding prefix to prompt
 )

diff --git a/litellm/integrations/prometheus.py b/litellm/integrations/prometheus.py
--- a/litellm/integrations/prometheus.py
+++ b/litellm/integrations/prometheus.py
@@ -25,6 +25,9 @@
 import litellm
 from litellm._logging import print_verbose, verbose_logger
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
 from litellm.integrations.prometheus_helpers import (
     PrometheusLabelFactoryContext,
     _get_cached_end_user_id_for_cost_tracking,
@@ -81,6 +84,7 @@
                 if _custom_buckets is not None
                 else LATENCY_BUCKETS
             )
+            self._bounded_prometheus_series_tracker = BoundedPrometheusSeriesTracker()
 
             # Create metric factory functions
             self._counter_factory = self._create_metric_factory(Counter)
@@ -984,6 +988,54 @@
 
         return filtered_labels
 
+    def _get_labeled_metric(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> Any:
+        labeled_metric = metric.labels(**labels)
+        self._track_bounded_prometheus_metric_series(metric, metric_name, labels)
+        return labeled_metric
+
+    def _track_bounded_prometheus_metric_series(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> None:
+        labelnames = self.get_labels_for_metric(metric_name)
+        if UserAPIKeyLabelNames.END_USER.value not in labelnames:
+            return
+
+        end_user = labels.get(UserAPIKeyLabelNames.END_USER.value)
+        if end_user is None:
+            return
+
+        max_series = getattr(
+            litellm, "prometheus_end_user_metrics_max_series_per_metric", 10000
+        )
+        ttl_seconds = getattr(
+            litellm, "prometheus_end_user_metrics_ttl_seconds", 3600.0
+        )
+        ttl_cleanup_interval_seconds = getattr(
+            litellm,
+            "prometheus_end_user_metrics_cleanup_interval_seconds",
+            60.0,
+        )
+        if max_series is None and ttl_seconds is None:
+            return
+
+        label_values = tuple(labels.get(label) for label in labelnames)
+        self._bounded_prometheus_series_tracker.track_series(
+            metric=metric,
+            metric_name=metric_name,
+            label_values=label_values,
+            max_series=max_series,
+            ttl_seconds=ttl_seconds,
+            cleanup_interval_seconds=ttl_cleanup_interval_seconds,
+        )
+
     def _inc_labeled_counter(
         self,
         counter: Any,
@@ -997,7 +1049,7 @@
             enum_values=enum_values,
             label_context=label_context,
         )
-        counter.labels(**_labels).inc(amount)
+        self._get_labeled_metric(counter, metric_name, _labels).inc(amount)
 
     async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
         # Define prometheus client
@@ -1468,8 +1520,10 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_time_to_first_token_metric.labels(
-                **_ttft_labels
+            self._get_labeled_metric(
+                self.litellm_llm_api_time_to_first_token_metric,
+                "litellm_llm_api_time_to_first_token_metric",
+                _ttft_labels,
             ).observe(time_to_first_token_seconds)
         else:
             verbose_logger.debug(
@@ -1488,9 +1542,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_latency_metric.labels(**_labels).observe(
-                api_call_total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_llm_api_latency_metric,
+                "litellm_llm_api_latency_metric",
+                _labels,
+            ).observe(api_call_total_time_seconds)
 
         # total request latency
         total_time_seconds = self._safe_duration_seconds(
@@ -1505,9 +1561,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_total_latency_metric.labels(**_labels).observe(
-                total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_total_latency_metric,
+                "litellm_request_total_latency_metric",
+                _labels,
+            ).observe(total_time_seconds)
 
         # request queue time (time from arrival to processing start)
         _litellm_params = kwargs.get("litellm_params", {}) or {}
@@ -1522,9 +1580,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_queue_time_metric.labels(**_labels).observe(
-                queue_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_queue_time_metric,
+                "litellm_request_queue_time_seconds",
+                _labels,
+            ).observe(queue_time_seconds)
 
     async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
         verbose_logger.debug(
@@ -1561,18 +1621,20 @@
         )
 
         try:
-            self.litellm_llm_api_failed_requests_metric.labels(
-                _sanitize_prometheus_label_value(end_user_id),
-                _sanitize_prometheus_label_value(user_api_key),
-                _sanitize_prometheus_label_value(user_api_key_alias),
-                _sanitize_prometheus_label_value(model),
-                _sanitize_prometheus_label_value(user_api_team),
-                _sanitize_prometheus_label_value(user_api_team_alias),
-                _sanitize_prometheus_label_value(user_id),
-                _sanitize_prometheus_label_value(
-                    standard_logging_payload.get("model_id", "")
+            self._inc_labeled_counter(
+                counter=self.litellm_llm_api_failed_requests_metric,
+                metric_name="litellm_llm_api_failed_requests_metric",
+                enum_values=UserAPIKeyLabelValues(
+                    end_user=end_user_id,
+                    hashed_api_key=user_api_key,
+                    api_key_alias=user_api_key_alias,
+                    model=model,
+                    team=user_api_team,
+                    team_alias=user_api_team_alias,
+                    user=user_id,
+                    model_id=standard_logging_payload.get("model_id", ""),
                 ),
-            ).inc()
+            )
             self.set_llm_deployment_failure_metrics(kwargs)
             await self._set_org_budget_metrics_after_api_request(
                 org_id=user_api_key_org_id,

diff --git a/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
new file mode 100644
--- /dev/null
+++ b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
@@ -1,0 +1,96 @@
+from __future__ import annotations
+
+import time
+from collections import OrderedDict
+from threading import RLock
+from typing import Any, Dict, Optional
+
+
+class BoundedPrometheusSeriesTracker:
+    """
+    Tracks Prometheus child series and removes stale/excess labelsets.
+
+    The tracker is label-agnostic: callers decide which series should be tracked
+    and pass the full label tuple used by the Prometheus metric.
+    """
+
+    def __init__(self) -> None:
+        self._series: Dict[str, OrderedDict[tuple[Optional[str], ...], float]] = {}
+        self._last_ttl_cleanup: Dict[str, float] = {}
+        self._lock = RLock()
+
+    def track_series(
+        self,
+        metric: Any,
+        metric_name: str,
+        label_values: tuple[Optional[str], ...],
+        max_series: Optional[int],
+        ttl_seconds: Optional[float],
+        cleanup_interval_seconds: Optional[float],
+    ) -> None:
+        if max_series is None and ttl_seconds is None:
+            return
+
+        now = time.monotonic()
+
+        with self._lock:
+            series = self._series.setdefault(metric_name, OrderedDict())
+            series[label_values] = now
+            series.move_to_end(label_values)
+
+            if ttl_seconds is not None and self._should_run_ttl_cleanup(
+                metric_name=metric_name,
+                now=now,
+                cleanup_interval_seconds=cleanup_interval_seconds,
+            ):
+                expired_label_values = [
+                    tracked_label_values
+                    for tracked_label_values, last_seen in series.items()
+                    if now - last_seen > ttl_seconds
+                ]
+                for tracked_label_values in expired_label_values:
+                    self._remove_metric_series(metric, series, tracked_label_values)
+
+            if max_series is not None and max_series > 0:
+                while len(series) > max_series:
+                    tracked_label_values, _ = series.popitem(last=False)
+                    self._remove_metric_child(metric, tracked_label_values)
+            elif max_series is not None:
+                while series:
+                    tracked_label_values, _ = series.popitem(last=False)
+                    self._remove_metric_child(metric, tracked_label_values)
+
+    def _should_run_ttl_cleanup(
+        self,
+        metric_name: str,
+        now: float,
+        cleanup_interval_seconds: Optional[float],
+    ) -> bool:
+        if cleanup_interval_seconds is None or cleanup_interval_seconds <= 0:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+
+        last_cleanup = self._last_ttl_cleanup.get(metric_name)
+        if last_cleanup is None or now - last_cleanup >= cleanup_interval_seconds:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+        return False
+
+    def _remove_metric_series(
+        self,
+        metric: Any,
+        series: OrderedDict[tuple[Optional[str], ...], float],
+        label_values: tuple[Optional[str], ...],
+    ) -> None:
+        if label_values in series:
+            del series[label_values]
+        self._remove_metric_child(metric, label_values)
+
+    @staticmethod
+    def _remove_metric_child(
+        metric: Any, label_values: tuple[Optional[str], ...]
+    ) -> None:
+        try:
+            metric.remove(*label_values)
+        except (AttributeError, KeyError, ValueError):
+            pass

diff --git a/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
@@ -1,0 +1,156 @@
+from time import monotonic
+
+import pytest
+from prometheus_client import REGISTRY
+
+import litellm
+from litellm.integrations.prometheus import PrometheusLogger
+from litellm.integrations.prometheus_helpers import bounded_prometheus_series_tracker
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
+from litellm.types.integrations.prometheus import UserAPIKeyLabelValues
+
+
+@pytest.fixture(autouse=True)
+def cleanup_prometheus_registry():
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+    old_enable_end_user = litellm.enable_end_user_cost_tracking_prometheus_only
+    old_metrics_config = litellm.prometheus_metrics_config
+    old_max_series = litellm.prometheus_end_user_metrics_max_series_per_metric
+    old_ttl_seconds = litellm.prometheus_end_user_metrics_ttl_seconds
+    old_cleanup_interval_seconds = (
+        litellm.prometheus_end_user_metrics_cleanup_interval_seconds
+    )
+
+    yield
+
+    litellm.enable_end_user_cost_tracking_prometheus_only = old_enable_end_user
+    litellm.prometheus_metrics_config = old_metrics_config
+    litellm.prometheus_end_user_metrics_max_series_per_metric = old_max_series
+    litellm.prometheus_end_user_metrics_ttl_seconds = old_ttl_seconds
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = (
+        old_cleanup_interval_seconds
+    )
+
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+
+def test_prometheus_end_user_series_are_capped_per_metric():
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = 3
+    litellm.prometheus_end_user_metrics_ttl_seconds = None
+    logger = PrometheusLogger()
+
+    for index in range(6):
+        PrometheusLogger._inc_labeled_counter(
+            logger,
+            logger.litellm_spend_metric,
+            "litellm_spend_metric",
+            UserAPIKeyLabelValues(end_user=f"end-user-{index}"),
+            amount=0.01,
+        )
+
+    assert len(logger.litellm_spend_metric._metrics) == 3
+    assert set(logger.litellm_spend_metric._metrics) == {
+        ("end-user-3",),
+        ("end-user-4",),
+        ("end-user-5",),
+    }
+
+
+def test_bounded_prometheus_series_tracker_is_label_agnostic():
+    class FakeMetric:
+        def __init__(self):
+            self.removed_label_values = []
+
+        def remove(self, *label_values):
+            self.removed_label_values.append(label_values)
+
+    metric = FakeMetric()
+    tracker = BoundedPrometheusSeriesTracker()
+
+    for index in range(4):
+        tracker.track_series(
+            metric=metric,
+            metric_name="generic_metric",
+            label_values=(f"route-{index}", "200"),
+            max_series=2,
+            ttl_seconds=None,
+            cleanup_interval_seconds=60.0,
+        )
+
+    assert metric.removed_label_values == [
+        ("route-0", "200"),
+        ("route-1", "200"),
+    ]
+
+
+def test_prometheus_end_user_series_expire_by_ttl(monkeypatch):
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = None
+    litellm.prometheus_end_user_metrics_ttl_seconds = 10.0
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = 0.0
+    logger = PrometheusLogger()
+
+    current_time = [monotonic()]
+    monkeypatch.setattr(
+        bounded_prometheus_series_tracker.time,
+        "monotonic",
+        lambda: current_time[0],
+    )
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="stale-end-user"),
+        amount=0.01,
+    )
+
+    current_time[0] += 11.0
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="fresh-end-user"),
+        amount=0.01,
+    )
+
+    assert set(logger.litellm_spend_metric._metrics) == {("fresh-end-user",)}
+
+
+def test_prometheus_end_user_not_tracked_by_default():
+    litellm.enable_end_user_cost_tracking_prometheus_only = None
+    labels = PrometheusLogger().get_labels_for_metric("litellm_spend_metric")
+    assert "end_user" in labels
+
+    label_values = UserAPIKeyLabelValues(end_user="not-exported")
+    from litellm.integrations.prometheus import prometheus_label_factory
+
+    prometheus_labels = prometheus_label_factory(labels, label_values)
+    assert prometheus_labels["end_user"] is None


cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Race condition between child creation and tracker eviction
- Acquired the tracker's RLock around both metric.labels() and track_series() in _get_labeled_metric so a concurrent track_series cannot evict the just-created child before the caller increments/observes it.
✅ Fixed: Tracker-Prometheus divergence on silent removal failure
- Made _remove_metric_child return whether removal actually succeeded and only delete the entry from the tracker's OrderedDict (in both max_series eviction and TTL cleanup paths) when it did, preventing the tracker from forgetting children that still exist in Prometheus.
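
The second fix can be sketched as follows: peek at the oldest tracked entry, attempt the removal, and forget the entry only if the removal succeeded. This is a stdlib-only illustration under stated assumptions: StubbornMetric, remove_child, and enforce_cap are hypothetical names, and the boolean return shape is inferred from the description above because the updated helper is not fully visible in the preview. Holding the lock for the whole loop mirrors the first fix in spirit.

```python
from collections import OrderedDict
from threading import RLock


class StubbornMetric:
    """Hypothetical metric whose remove() fails for one label tuple."""

    def __init__(self, unremovable=None):
        self.unremovable = unremovable
        self.removed = []

    def remove(self, *label_values):
        if label_values == self.unremovable:
            raise KeyError(label_values)
        self.removed.append(label_values)


def remove_child(metric, label_values) -> bool:
    """Return True only when the metric actually dropped the child."""
    try:
        metric.remove(*label_values)
    except (AttributeError, KeyError, ValueError):
        return False
    return True


def enforce_cap(metric, series, max_series, lock) -> None:
    """Evict oldest entries over the cap, keeping any that could not be removed."""
    with lock:
        while len(series) > max_series:
            oldest = next(iter(series))  # peek; do not pop before removal succeeds
            if not remove_child(metric, oldest):
                break  # still tracked, so a later call can retry
            del series[oldest]


lock = RLock()
series = OrderedDict((key, 0.0) for key in [("a",), ("b",), ("c",)])
metric = StubbornMetric(unremovable=("b",))
enforce_cap(metric, series, max_series=1, lock=lock)
```

Here ("a",) is removed, the failed removal of ("b",) stops the loop, and ("b",) stays in the tracker alongside ("c",) rather than being silently forgotten. One consequence of breaking out early is that the tracked count can temporarily exceed the cap until a later removal succeeds.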

Preview (bede81b2b9)

diff --git a/litellm/__init__.py b/litellm/__init__.py
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@@ -414,6 +414,9 @@
 custom_prometheus_tags: List[str] = []
 prometheus_metrics_config: Optional[List] = None
 prometheus_emit_stream_label: bool = False
+prometheus_end_user_metrics_max_series_per_metric: Optional[int] = 10000
+prometheus_end_user_metrics_ttl_seconds: Optional[float] = 3600.0
+prometheus_end_user_metrics_cleanup_interval_seconds: Optional[float] = 60.0
 disable_add_prefix_to_prompt: bool = (
     False  # used by anthropic, to disable adding prefix to prompt
 )

diff --git a/litellm/integrations/prometheus.py b/litellm/integrations/prometheus.py
--- a/litellm/integrations/prometheus.py
+++ b/litellm/integrations/prometheus.py
@@ -25,6 +25,9 @@
 import litellm
 from litellm._logging import print_verbose, verbose_logger
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
 from litellm.integrations.prometheus_helpers import (
     PrometheusLabelFactoryContext,
     _get_cached_end_user_id_for_cost_tracking,
@@ -81,6 +84,7 @@
                 if _custom_buckets is not None
                 else LATENCY_BUCKETS
             )
+            self._bounded_prometheus_series_tracker = BoundedPrometheusSeriesTracker()
 
             # Create metric factory functions
             self._counter_factory = self._create_metric_factory(Counter)
@@ -984,6 +988,55 @@
 
         return filtered_labels
 
+    def _get_labeled_metric(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> Any:
+        with self._bounded_prometheus_series_tracker.lock:
+            labeled_metric = metric.labels(**labels)
+            self._track_bounded_prometheus_metric_series(metric, metric_name, labels)
+            return labeled_metric
+
+    def _track_bounded_prometheus_metric_series(
+        self,
+        metric: Any,
+        metric_name: DEFINED_PROMETHEUS_METRICS,
+        labels: Dict[str, Optional[str]],
+    ) -> None:
+        labelnames = self.get_labels_for_metric(metric_name)
+        if UserAPIKeyLabelNames.END_USER.value not in labelnames:
+            return
+
+        end_user = labels.get(UserAPIKeyLabelNames.END_USER.value)
+        if end_user is None:
+            return
+
+        max_series = getattr(
+            litellm, "prometheus_end_user_metrics_max_series_per_metric", 10000
+        )
+        ttl_seconds = getattr(
+            litellm, "prometheus_end_user_metrics_ttl_seconds", 3600.0
+        )
+        ttl_cleanup_interval_seconds = getattr(
+            litellm,
+            "prometheus_end_user_metrics_cleanup_interval_seconds",
+            60.0,
+        )
+        if max_series is None and ttl_seconds is None:
+            return
+
+        label_values = tuple(labels.get(label) for label in labelnames)
+        self._bounded_prometheus_series_tracker.track_series(
+            metric=metric,
+            metric_name=metric_name,
+            label_values=label_values,
+            max_series=max_series,
+            ttl_seconds=ttl_seconds,
+            cleanup_interval_seconds=ttl_cleanup_interval_seconds,
+        )
+
     def _inc_labeled_counter(
         self,
         counter: Any,
@@ -997,7 +1050,7 @@
             enum_values=enum_values,
             label_context=label_context,
         )
-        counter.labels(**_labels).inc(amount)
+        self._get_labeled_metric(counter, metric_name, _labels).inc(amount)
 
     async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
         # Define prometheus client
@@ -1468,8 +1521,10 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_time_to_first_token_metric.labels(
-                **_ttft_labels
+            self._get_labeled_metric(
+                self.litellm_llm_api_time_to_first_token_metric,
+                "litellm_llm_api_time_to_first_token_metric",
+                _ttft_labels,
             ).observe(time_to_first_token_seconds)
         else:
             verbose_logger.debug(
@@ -1488,9 +1543,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_llm_api_latency_metric.labels(**_labels).observe(
-                api_call_total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_llm_api_latency_metric,
+                "litellm_llm_api_latency_metric",
+                _labels,
+            ).observe(api_call_total_time_seconds)
 
         # total request latency
         total_time_seconds = self._safe_duration_seconds(
@@ -1505,9 +1562,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_total_latency_metric.labels(**_labels).observe(
-                total_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_total_latency_metric,
+                "litellm_request_total_latency_metric",
+                _labels,
+            ).observe(total_time_seconds)
 
         # request queue time (time from arrival to processing start)
         _litellm_params = kwargs.get("litellm_params", {}) or {}
@@ -1522,9 +1581,11 @@
                 enum_values=enum_values,
                 label_context=label_context,
             )
-            self.litellm_request_queue_time_metric.labels(**_labels).observe(
-                queue_time_seconds
-            )
+            self._get_labeled_metric(
+                self.litellm_request_queue_time_metric,
+                "litellm_request_queue_time_seconds",
+                _labels,
+            ).observe(queue_time_seconds)
 
     async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
         verbose_logger.debug(
@@ -1561,18 +1622,20 @@
         )
 
         try:
-            self.litellm_llm_api_failed_requests_metric.labels(
-                _sanitize_prometheus_label_value(end_user_id),
-                _sanitize_prometheus_label_value(user_api_key),
-                _sanitize_prometheus_label_value(user_api_key_alias),
-                _sanitize_prometheus_label_value(model),
-                _sanitize_prometheus_label_value(user_api_team),
-                _sanitize_prometheus_label_value(user_api_team_alias),
-                _sanitize_prometheus_label_value(user_id),
-                _sanitize_prometheus_label_value(
-                    standard_logging_payload.get("model_id", "")
+            self._inc_labeled_counter(
+                counter=self.litellm_llm_api_failed_requests_metric,
+                metric_name="litellm_llm_api_failed_requests_metric",
+                enum_values=UserAPIKeyLabelValues(
+                    end_user=end_user_id,
+                    hashed_api_key=user_api_key,
+                    api_key_alias=user_api_key_alias,
+                    model=model,
+                    team=user_api_team,
+                    team_alias=user_api_team_alias,
+                    user=user_id,
+                    model_id=standard_logging_payload.get("model_id", ""),
                 ),
-            ).inc()
+            )
             self.set_llm_deployment_failure_metrics(kwargs)
             await self._set_org_budget_metrics_after_api_request(
                 org_id=user_api_key_org_id,

diff --git a/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
new file mode 100644
--- /dev/null
+++ b/litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py
@@ -1,0 +1,111 @@
+from __future__ import annotations
+
+import time
+from collections import OrderedDict
+from threading import RLock
+from typing import Any, Dict, Optional
+
+
+class BoundedPrometheusSeriesTracker:
+    """
+    Tracks Prometheus child series and removes stale/excess labelsets.
+
+    The tracker is label-agnostic: callers decide which series should be tracked
+    and pass the full label tuple used by the Prometheus metric.
+    """
+
+    def __init__(self) -> None:
+        self._series: Dict[str, OrderedDict[tuple[Optional[str], ...], float]] = {}
+        self._last_ttl_cleanup: Dict[str, float] = {}
+        self.lock = RLock()
+
+    def track_series(
+        self,
+        metric: Any,
+        metric_name: str,
+        label_values: tuple[Optional[str], ...],
+        max_series: Optional[int],
+        ttl_seconds: Optional[float],
+        cleanup_interval_seconds: Optional[float],
+    ) -> None:
+        if max_series is None and ttl_seconds is None:
+            return
+
+        now = time.monotonic()
+
+        with self.lock:
+            series = self._series.setdefault(metric_name, OrderedDict())
+            series[label_values] = now
+            series.move_to_end(label_values)
+
+            if ttl_seconds is not None and self._should_run_ttl_cleanup(
+                metric_name=metric_name,
+                now=now,
+                cleanup_interval_seconds=cleanup_interval_seconds,
+            ):
+                expired_label_values = [
+                    tracked_label_values
+                    for tracked_label_values, last_seen in series.items()
+                    if now - last_seen > ttl_seconds
+                ]
+                for tracked_label_values in expired_label_values:
+                    self._remove_metric_series(metric, series, tracked_label_values)
+
+            if max_series is not None and max_series > 0:
+                while len(series) > max_series:
+                    tracked_label_values = next(iter(series))
+                    if not self._remove_metric_child(metric, tracked_label_values):
+                        break
+                    del series[tracked_label_values]
+            elif max_series is not None:
+                while series:
+                    tracked_label_values = next(iter(series))
+                    if not self._remove_metric_child(metric, tracked_label_values):
+                        break
+                    del series[tracked_label_values]
+
+    def _should_run_ttl_cleanup(
+        self,
+        metric_name: str,
+        now: float,
+        cleanup_interval_seconds: Optional[float],
+    ) -> bool:
+        if cleanup_interval_seconds is None or cleanup_interval_seconds <= 0:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+
+        last_cleanup = self._last_ttl_cleanup.get(metric_name)
+        if last_cleanup is None or now - last_cleanup >= cleanup_interval_seconds:
+            self._last_ttl_cleanup[metric_name] = now
+            return True
+        return False
+
+    def _remove_metric_series(
+        self,
+        metric: Any,
+        series: OrderedDict[tuple[Optional[str], ...], float],
+        label_values: tuple[Optional[str], ...],
+    ) -> None:
+        if self._remove_metric_child(metric, label_values):
+            series.pop(label_values, None)
+
+    @staticmethod
+    def _remove_metric_child(
+        metric: Any, label_values: tuple[Optional[str], ...]
+    ) -> bool:
+        """
+        Remove the Prometheus child for ``label_values`` and report whether the
+        tracker should commit the matching state change.
+
+        Returns ``True`` when the child is no longer present in Prometheus
+        (either it was just removed or it was already gone), and ``False`` when
+        ``metric.remove()`` raised an unexpected error and the child likely
+        still exists.
+        """
+        try:
+            metric.remove(*label_values)
+            return True
+        except KeyError:
+            return True
+        except (AttributeError, ValueError):
+            return False
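
The cap-eviction branch of `track_series` can be exercised without Prometheus. The following is a minimal sketch, not the PR's code: `RecordingMetric` and `cap_series` are hypothetical names, and the stand-in metric only records which label tuples were passed to `remove()`. It assumes a positive `max_series` and treats a `KeyError` from `remove()` as "already gone", as the helper above does.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

LabelValues = Tuple[Optional[str], ...]


class RecordingMetric:
    """Hypothetical stand-in for a Prometheus metric; records remove() calls."""

    def __init__(self) -> None:
        self.removed: List[LabelValues] = []

    def remove(self, *label_values: Optional[str]) -> None:
        self.removed.append(label_values)


def cap_series(
    metric: RecordingMetric,
    series: "OrderedDict[LabelValues, float]",
    label_values: LabelValues,
    now: float,
    max_series: int,
) -> None:
    """Mark label_values as most recently used, then evict from the oldest end."""
    series[label_values] = now
    series.move_to_end(label_values)
    while len(series) > max_series:
        oldest = next(iter(series))
        try:
            metric.remove(*oldest)
        except KeyError:
            pass  # child already absent; safe to forget it
        del series[oldest]


metric = RecordingMetric()
series: "OrderedDict[LabelValues, float]" = OrderedDict()
for i in range(4):
    cap_series(metric, series, (f"route-{i}", "200"), now=float(i), max_series=2)
# Touching route-2 again makes route-3 the least recently used entry.
cap_series(metric, series, ("route-2", "200"), now=4.0, max_series=2)
cap_series(metric, series, ("route-4", "200"), now=5.0, max_series=2)
```

Because every update moves its label tuple to the end of the `OrderedDict`, the front of the dict is always the least recently updated series, so eviction needs no scan.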

diff --git a/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
new file mode 100644
--- /dev/null
+++ b/tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py
@@ -1,0 +1,156 @@
+from time import monotonic
+
+import pytest
+from prometheus_client import REGISTRY
+
+import litellm
+from litellm.integrations.prometheus import PrometheusLogger
+from litellm.integrations.prometheus_helpers import bounded_prometheus_series_tracker
+from litellm.integrations.prometheus_helpers.bounded_prometheus_series_tracker import (
+    BoundedPrometheusSeriesTracker,
+)
+from litellm.types.integrations.prometheus import UserAPIKeyLabelValues
+
+
+@pytest.fixture(autouse=True)
+def cleanup_prometheus_registry():
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+    old_enable_end_user = litellm.enable_end_user_cost_tracking_prometheus_only
+    old_metrics_config = litellm.prometheus_metrics_config
+    old_max_series = litellm.prometheus_end_user_metrics_max_series_per_metric
+    old_ttl_seconds = litellm.prometheus_end_user_metrics_ttl_seconds
+    old_cleanup_interval_seconds = (
+        litellm.prometheus_end_user_metrics_cleanup_interval_seconds
+    )
+
+    yield
+
+    litellm.enable_end_user_cost_tracking_prometheus_only = old_enable_end_user
+    litellm.prometheus_metrics_config = old_metrics_config
+    litellm.prometheus_end_user_metrics_max_series_per_metric = old_max_series
+    litellm.prometheus_end_user_metrics_ttl_seconds = old_ttl_seconds
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = (
+        old_cleanup_interval_seconds
+    )
+
+    collectors = list(REGISTRY._collector_to_names.keys())
+    for collector in collectors:
+        try:
+            REGISTRY.unregister(collector)
+        except Exception:
+            pass
+
+
+def test_prometheus_end_user_series_are_capped_per_metric():
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = 3
+    litellm.prometheus_end_user_metrics_ttl_seconds = None
+    logger = PrometheusLogger()
+
+    for index in range(6):
+        PrometheusLogger._inc_labeled_counter(
+            logger,
+            logger.litellm_spend_metric,
+            "litellm_spend_metric",
+            UserAPIKeyLabelValues(end_user=f"end-user-{index}"),
+            amount=0.01,
+        )
+
+    assert len(logger.litellm_spend_metric._metrics) == 3
+    assert set(logger.litellm_spend_metric._metrics) == {
+        ("end-user-3",),
+        ("end-user-4",),
+        ("end-user-5",),
+    }
+
+
+def test_bounded_prometheus_series_tracker_is_label_agnostic():
+    class FakeMetric:
+        def __init__(self):
+            self.removed_label_values = []
+
+        def remove(self, *label_values):
+            self.removed_label_values.append(label_values)
+
+    metric = FakeMetric()
+    tracker = BoundedPrometheusSeriesTracker()
+
+    for index in range(4):
+        tracker.track_series(
+            metric=metric,
+            metric_name="generic_metric",
+            label_values=(f"route-{index}", "200"),
+            max_series=2,
+            ttl_seconds=None,
+            cleanup_interval_seconds=60.0,
+        )
+
+    assert metric.removed_label_values == [
+        ("route-0", "200"),
+        ("route-1", "200"),
+    ]
+
+
+def test_prometheus_end_user_series_expire_by_ttl(monkeypatch):
+    litellm.enable_end_user_cost_tracking_prometheus_only = True
+    litellm.prometheus_metrics_config = [
+        {
+            "group": "end-user-spend",
+            "metrics": ["litellm_spend_metric"],
+            "include_labels": ["end_user"],
+        }
+    ]
+    litellm.prometheus_end_user_metrics_max_series_per_metric = None
+    litellm.prometheus_end_user_metrics_ttl_seconds = 10.0
+    litellm.prometheus_end_user_metrics_cleanup_interval_seconds = 0.0
+    logger = PrometheusLogger()
+
+    current_time = [monotonic()]
+    monkeypatch.setattr(
+        bounded_prometheus_series_tracker.time,
+        "monotonic",
+        lambda: current_time[0],
+    )
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="stale-end-user"),
+        amount=0.01,
+    )
+
+    current_time[0] += 11.0
+    PrometheusLogger._inc_labeled_counter(
+        logger,
+        logger.litellm_spend_metric,
+        "litellm_spend_metric",
+        UserAPIKeyLabelValues(end_user="fresh-end-user"),
+        amount=0.01,
+    )
+
+    assert set(logger.litellm_spend_metric._metrics) == {("fresh-end-user",)}
+
+
+def test_prometheus_end_user_not_tracked_by_default():
+    litellm.enable_end_user_cost_tracking_prometheus_only = None
+    labels = PrometheusLogger().get_labels_for_metric("litellm_spend_metric")
+    assert "end_user" in labels
+
+    label_values = UserAPIKeyLabelValues(end_user="not-exported")
+    from litellm.integrations.prometheus import prometheus_label_factory
+
+    prometheus_labels = prometheus_label_factory(labels, label_values)
+    assert prometheus_labels["end_user"] is None


Reviewed by Cursor Bugbot for commit 0bfa9b7.

opencode/codex/litellm coverage: - anomalyco/opencode#25962 desktop utilityProcess split (man) - openai/codex#21287 skills watcher hoisted to app-server (man) - BerriAI/litellm#27272 bounded prometheus end-user series tracker (man) - BerriAI/litellm#27278 GCS extensionless URI MIME resolution (man)

greptile-apps · 2026-05-06T18:37:51Z

Greptile Summary

This PR introduces a BoundedPrometheusSeriesTracker helper that caps Prometheus child-series cardinality for metrics carrying an end_user label, replacing unbounded in-memory growth with LRU eviction and an optional TTL sweep.

Adds litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py with an OrderedDict-backed LRU + TTL eviction strategy; eviction uses a throttled cleanup interval to keep hot paths fast.
Wires the tracker into _inc_labeled_counter (counters) and into four histogram observe() call-sites (TTFT, LLM API latency, total request latency, queue time); three new litellm.* configuration knobs control the cap, TTL, and cleanup interval.

Confidence Score: 5/5

The change adds an opt-out cardinality cap with sensible defaults and all existing counters flow through the updated _inc_labeled_counter path; the four new histogram call-sites are consistent with their registration names and the new code has no backward-incompatible effect on callers that don't set end_user.

The LRU/TTL eviction logic is internally consistent: the current series is refreshed before TTL cleanup runs so it is never self-expired, the max_series <= 0 guard prevents silent eviction on a zero cap, AttributeError/ValueError from remove() are correctly handled without panicking, and the known trade-off of losing updates for an evicted series before the next scrape is explicitly documented. All previous-thread concerns are addressed in this revision and no new issues were found.
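
The refresh-then-sweep ordering and the throttled cleanup interval described above can be sketched in isolation. This is an illustration under stated assumptions, not the merged implementation: `TtlSweeper` and its attributes are hypothetical names, timestamps are passed in explicitly instead of read from `time.monotonic()`, and nothing is removed from a real Prometheus metric.

```python
from collections import OrderedDict
from typing import List, Optional


class TtlSweeper:
    """Hypothetical helper: expire idle keys, scanning at most once per interval."""

    def __init__(self, ttl_seconds: float, cleanup_interval_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.cleanup_interval_seconds = cleanup_interval_seconds
        self.last_seen: "OrderedDict[str, float]" = OrderedDict()
        self.last_cleanup: Optional[float] = None

    def touch(self, key: str, now: float) -> List[str]:
        """Refresh key, then sweep if the interval has elapsed; return expired keys."""
        # Refresh first, so the key being updated cannot expire in this call.
        self.last_seen[key] = now
        self.last_seen.move_to_end(key)
        if (
            self.last_cleanup is not None
            and now - self.last_cleanup < self.cleanup_interval_seconds
        ):
            return []  # throttled: skip the full scan on the hot path
        self.last_cleanup = now
        expired = [
            k for k, seen in self.last_seen.items() if now - seen > self.ttl_seconds
        ]
        for k in expired:
            del self.last_seen[k]
        return expired


sweeper = TtlSweeper(ttl_seconds=10.0, cleanup_interval_seconds=60.0)
first = sweeper.touch("stale-end-user", now=0.0)    # first call sweeps, nothing is old
second = sweeper.touch("fresh-end-user", now=11.0)  # throttled, stale entry survives
third = sweeper.touch("fresh-end-user", now=61.0)   # interval elapsed, stale entry expires
```

One consequence of throttling is that an idle series can outlive its TTL by up to one cleanup interval, which trades a slightly later eviction for a cheaper emission path.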

No files require special attention; the new helper and its integration points are straightforward and well-tested.

Important Files Changed

Filename	Overview
litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py	New helper: LRU + TTL eviction of Prometheus child series; logic is sound with `cleanup_interval_seconds=None/0` semantics and `KeyError`-swallowing for already-absent children handled correctly
litellm/integrations/prometheus.py	Wires tracker into `_inc_labeled_counter` and four histogram observe() sites; metric-name strings are consistent with registration; no new regressions introduced
litellm/__init__.py	Adds three new module-level config knobs with sensible defaults (10 000 series, 1 h TTL, 60 s cleanup interval)
tests/test_litellm/integrations/test_prometheus_end_user_cardinality.py	New test file covering cap, TTL expiry, zero-max-as-unlimited, label-agnostic behavior, and default disabled path; no real network calls

Reviews (2): Last reviewed commit: "perf: cap Prometheus end-user metric car..."

yassin-berriai · 2026-05-06T20:00:19Z

@greptileai

jimmychen-p72 · 2026-05-07T01:15:05Z

Sanity check: is there any data loss or unexpected behavior (on Datadog, for example) when the metrics map is capped? For example, if the cap results in tracking 100 users and metrics are observed for 150 users, will we only propagate metrics for those 100 users?

ishaan-berri · 2026-05-07T01:17:59Z

@oss-agent-shin Sanity check: is there any data loss or unexpected behavior (on Datadog, for example) when the metrics map is capped? For example, if the cap results in tracking 100 users and metrics are observed for 150 users, will we only propagate metrics for those 100 users?
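
The 100-versus-150 scenario in this question can be simulated with a stand-in counter. This is a rough sketch, assuming a scrape exposes only the children currently held by the metric and that a removed child is recreated at zero on its next update; `CappedCounter` is a hypothetical class, not part of litellm or prometheus_client, and the sketch says nothing about how a downstream backend such as Datadog interprets the result.

```python
from collections import OrderedDict


class CappedCounter:
    """Hypothetical stand-in for a labeled counter with an LRU cap on children."""

    def __init__(self, max_series: int) -> None:
        self.max_series = max_series
        self.children: "OrderedDict[str, float]" = OrderedDict()

    def inc(self, end_user: str, amount: float = 1.0) -> None:
        # A missing child starts from zero, like a freshly created labeled child.
        self.children[end_user] = self.children.get(end_user, 0.0) + amount
        self.children.move_to_end(end_user)
        while len(self.children) > self.max_series:
            self.children.popitem(last=False)  # evict the least recently updated


counter = CappedCounter(max_series=100)
for i in range(150):
    counter.inc(f"end-user-{i}", amount=2.0)

exposed = set(counter.children)        # what a scrape would see at this point
counter.inc("end-user-0", amount=1.0)  # an evicted user returns
restarted_value = counter.children["end-user-0"]
```

Under these assumptions only the 100 most recently updated users are exposed at any moment, and a returning user's counter restarts from zero rather than resuming its earlier total.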

cursor Bot reviewed May 6, 2026


Comment thread litellm/integrations/prometheus.py Outdated

cursor Bot reviewed May 6, 2026


Comment thread litellm/integrations/prometheus.py Outdated

Comment thread litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py Outdated

ishaan-berri marked this pull request as ready for review May 6, 2026 18:32

greptile-apps Bot reviewed May 6, 2026


Comment thread litellm/integrations/prometheus.py Outdated

Comment thread litellm/integrations/prometheus.py Outdated

Comment thread litellm/integrations/prometheus_helpers/bounded_prometheus_series_tracker.py Outdated

ishaan-berri mentioned this pull request May 6, 2026

Fix Prometheus end-user metric cardinality tracking #27316

Open

7 tasks

yassin-berriai force-pushed the litellm_cap-prometheus-end-user-cardinality-b255 branch from bede81b to 66fcee1 May 6, 2026 19:58

perf: cap Prometheus end-user metric cardinality with TTL + LRU eviction

962698e

yassin-berriai force-pushed the litellm_cap-prometheus-end-user-cardinality-b255 branch from 66fcee1 to 962698e May 6, 2026 20:00

ishaan-berri enabled auto-merge (squash) May 6, 2026 20:08

yassin-berriai approved these changes May 6, 2026


ishaan-berri merged commit 487479e into litellm_internal_staging May 6, 2026
114 of 115 checks passed

ishaan-berri deleted the litellm_cap-prometheus-end-user-cardinality-b255 branch May 6, 2026 20:35
