Metric: GPU KV Cache Usage
Rule of Thumb for GPU KV Cache Utilization:
According to NVIDIA NIM Operator documentation, a threshold of 50% is recommended as a default for scaling triggers.
Utilixation Range
Description
0% - 30%
Low usage - resources are under-utilized
30% - 70%
Medium usage - healthy range under normal load
70% - 90%+
High usage - risk of cache eviction or inference slowdowns
Last updated