> For the complete documentation index, see [llms.txt](https://docs.cloudeka.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.cloudeka.ai/deka-gpu/deka-gpu-autoscaling/keda-autoscalling/example-autoscaling-vllm-with-keda-based-on-gpu-kv-cache-usage/metric-gpu-kv-cache-usage.md).

# Metric: GPU KV Cache Usage

Rule of Thumb for GPU KV Cache Utilization:

{% hint style="success" %}
According to [NVIDIA NIM Operator documentation](https://docs.nvidia.com/nim-operator/latest/service.html), a threshold of 50% is recommended as a default for scaling triggers.
{% endhint %}

| Utilization Range | Description |
| --- | --- |
| 0% - 30% | Low usage - resources are under-utilized |
| 30% - 70% | Medium usage - healthy range under normal load |
| 70% - 90%+ | High usage - risk of cache eviction or inference slowdowns |
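
To make the ranges concrete, here is a minimal Python sketch that maps a KV cache usage reading to one of the bands above and checks it against the 50% scaling threshold. The function names `classify_kv_cache_usage` and `should_scale_out` are hypothetical and are not part of KEDA, vLLM, or the NIM Operator. The sketch assumes the reading is expressed as a percentage (0 to 100) and that each band includes its lower bound and excludes its upper bound, since the table does not say which band the boundary values 30% and 70% belong to.

```python
# Illustrative helpers only: these names are assumptions, not a KEDA or vLLM API.

SCALE_THRESHOLD_PERCENT = 50.0  # default trigger threshold cited above


def classify_kv_cache_usage(usage_percent: float) -> str:
    """Map a KV cache usage percentage to the band from the table.

    Assumption: bands are half-open, so 30 counts as "medium"
    and 70 counts as "high".
    """
    if not 0.0 <= usage_percent <= 100.0:
        raise ValueError(f"usage_percent must be in [0, 100], got {usage_percent}")
    if usage_percent < 30.0:
        return "low"      # resources are under-utilized
    if usage_percent < 70.0:
        return "medium"   # healthy range under normal load
    return "high"         # risk of cache eviction or inference slowdowns


def should_scale_out(usage_percent: float,
                     threshold: float = SCALE_THRESHOLD_PERCENT) -> bool:
    """Return True when usage is strictly above the scaling threshold."""
    return usage_percent > threshold


if __name__ == "__main__":
    for reading in (12.0, 45.0, 55.0, 82.5):
        band = classify_kv_cache_usage(reading)
        print(f"{reading:5.1f}% -> {band:6s} scale_out={should_scale_out(reading)}")
```

If the metric source reports usage as a fraction between 0 and 1 rather than a percentage, multiply the reading by 100 before passing it in. Note that the 50% threshold falls inside the medium band, so under these assumptions scaling would start before usage reaches the high-risk range.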