Metric: GPU KV Cache Usage

Rule of Thumb for GPU KV Cache Utilization:

Utilization Range    Description
0% - 30%             Low usage - resources are under-utilized
30% - 70%            Medium usage - healthy range under normal load
70% - 90%+           High usage - risk of cache eviction or inference slowdowns
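
The ranges above can be turned into a simple check for dashboards or alerting. The following is a minimal sketch, not a definitive implementation: the function name `classify_kv_cache_usage` is hypothetical, the input is assumed to be a percentage between 0 and 100, and because the ranges share their boundary values (30% and 70%), the sketch assumes a boundary value belongs to the higher band. Everything at or above 70% is treated as high usage.

```python
def classify_kv_cache_usage(usage_percent: float) -> str:
    """Map a GPU KV cache usage percentage to a rule-of-thumb band.

    Assumptions (not stated in the rule of thumb itself):
    - usage_percent is on a 0-100 scale, not a 0-1 fraction
    - boundary values (30, 70) fall into the higher band
    - anything at or above 70 counts as "high"
    """
    if not 0.0 <= usage_percent <= 100.0:
        raise ValueError(f"usage_percent must be in [0, 100], got {usage_percent}")
    if usage_percent < 30.0:
        return "low"      # resources are under-utilized
    if usage_percent < 70.0:
        return "medium"   # healthy range under normal load
    return "high"         # risk of cache eviction or inference slowdowns


if __name__ == "__main__":
    for sample in (12.5, 45.0, 70.0, 93.0):
        print(f"{sample:5.1f}% -> {classify_kv_cache_usage(sample)}")
```

If the metric is reported as a fraction (for example 0.45 rather than 45), multiply by 100 before calling the function.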