Rule of Thumb for GPU KV Cache Utilization:
According to the NVIDIA NIM Operator documentation, a KV cache utilization threshold of 50% is the recommended default for scaling triggers.
0% - 30%: Low usage; resources are under-utilized.
30% - 70%: Medium usage; healthy range under normal load.
70% - 90%+: High usage; risk of cache eviction or inference slowdowns.
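The following Python sketch shows one way to apply these bands and the 50% threshold. It classifies a utilization reading, then computes a replica count with the Kubernetes Horizontal Pod Autoscaler formula, desiredReplicas = ceil(currentReplicas * currentValue / targetValue). Several details are assumptions: utilization is a fraction between 0 and 1, each band includes its lower bound (so exactly 30% counts as medium), and the HPA default tolerance of 0.1 applies. The function names are hypothetical and are not part of the NIM Operator.

```python
import math
from typing import Sequence

# Band boundaries from the rule of thumb, as fractions of KV cache capacity.
# Assumption: each band includes its lower bound and excludes its upper bound.
LOW_MAX = 0.30
MEDIUM_MAX = 0.70
# Default scaling target taken from the 50% threshold mentioned above.
TARGET_UTILIZATION = 0.50


def classify_kv_cache_usage(utilization: float) -> str:
    """Map a KV cache utilization fraction (0.0-1.0) to a usage band (hypothetical helper)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError(f"utilization must be in [0, 1], got {utilization}")
    if utilization < LOW_MAX:
        return "low"
    if utilization < MEDIUM_MAX:
        return "medium"
    return "high"


def desired_replicas(
    current_replicas: int,
    per_pod_utilization: Sequence[float],
    target: float = TARGET_UTILIZATION,
    tolerance: float = 0.1,
) -> int:
    """HPA-style replica count based on average per-pod utilization (hypothetical helper)."""
    if not per_pod_utilization:
        raise ValueError("need at least one utilization sample")
    average = sum(per_pod_utilization) / len(per_pod_utilization)
    ratio = average / target
    # Like the HPA, skip scaling when the ratio is within `tolerance` of 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Round up so the new replica count brings utilization to the target or below.
    return math.ceil(current_replicas * ratio)
```

For example, two pods averaging 80% utilization give a ratio of 1.6, so `desired_replicas(2, [0.82, 0.78])` scales out to 4 replicas. Three pods at 20% scale in to 2. Two pods averaging 50% stay at 2.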
Last updated 4 months ago