LLMStack
Deploy Model
GPU Utilization
62%
Target 70–85% for efficiency
Latency (p95)
410ms
Autoscaler holds p95 under 600ms
Monthly GPU Cost
$18,900
↓ from $30,000 baseline
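The saving implied by the cost card can be checked with a few lines of arithmetic. This is a minimal sketch: the function name `savings` is hypothetical, and the only inputs are the two dollar figures shown above.

```python
def savings(baseline: float, current: float) -> tuple[float, float]:
    """Return (absolute saving, saving as a fraction of the baseline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    saved = baseline - current
    return saved, saved / baseline


# Figures taken from the Monthly GPU Cost card.
saved, fraction = savings(30_000, 18_900)
print(f"${saved:,.0f} saved per month ({fraction:.0%} below baseline)")
```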
Fleet Overview
Model | Runtime | GPU | Replicas | Status | Actions
Recent Alerts
  • Idle GPU detected on node g4dn-2xlarge: shutdown scheduled.
  • Latency spike on /api/mixtral: scaling replicas 2 → 3.
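The two alerts above suggest a threshold-based scaling rule. The dashboard does not show the autoscaler's actual policy, so the following is only a sketch under stated assumptions: the 600 ms p95 limit and the 70% lower utilization bound come from the cards above, while `Thresholds`, `desired_replicas`, the scale-down margin, and the replica bounds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    p95_limit_ms: float = 600.0      # from "Autoscaler holds p95 under 600ms"
    util_low: float = 0.70           # lower edge of the 70–85% utilization target
    scale_down_margin: float = 0.75  # hypothetical: scale down only if p95 < 75% of the limit


def desired_replicas(
    p95_ms: float,
    gpu_util: float,
    replicas: int,
    t: Thresholds = Thresholds(),
    min_replicas: int = 1,   # hypothetical bound
    max_replicas: int = 8,   # hypothetical bound
) -> int:
    """Return the replica count a simple one-step rule would pick."""
    # Latency over the limit: add one replica (e.g. 2 -> 3).
    if p95_ms > t.p95_limit_ms:
        return min(replicas + 1, max_replicas)
    # GPUs underused and latency comfortably under the limit: remove one replica.
    if gpu_util < t.util_low and p95_ms < t.scale_down_margin * t.p95_limit_ms:
        return max(replicas - 1, min_replicas)
    # Otherwise hold steady.
    return replicas


print(desired_replicas(p95_ms=650, gpu_util=0.90, replicas=2))  # latency spike case
print(desired_replicas(p95_ms=410, gpu_util=0.62, replicas=3))  # underused case
```

Stepping by one replica at a time keeps the rule easy to reason about; a production autoscaler would normally add cooldown periods so that it does not oscillate between scale-up and scale-down.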
GPU Utilization by Node (last 5 min)
Latency p50/p95/p99 (ms)
Requests by Endpoint (share)
Cost Breakdown, Fixed vs Variable (monthly)
Savings Attribution (percentage)
Model Details
Metrics
Actions
vLLM | GPU: 1 | Replicas: 1
Logs