Self‑Hosted LLM Control Plane
Cut GPU spend by up to 40%
Command Palette (⌘ + K)
Deploy Model
Dashboard
Models
Deploy
Monitoring
FinOps
Security
GPU Utilization: 62% (target: 70–85% for efficiency)
Latency (p95): 410 ms (autoscaler holds p95 under 600 ms)
Monthly GPU Cost: $18,900 (↓ 37% from $30,000 baseline)
Fleet Overview
Click a row for details
Model | Runtime | GPU | Replicas | Status | Actions
No models yet — go to Deploy.
Recent Alerts
Idle GPU detected on node g4dn‑2xlarge — shutdown scheduled.
Latency spike on /api/mixtral — scaling replicas 2 → 3.
Simulate Alert
GPU Utilization by Node (last 5 min)
Latency p50/p95/p99 (ms)
Requests by Endpoint (share)
Cost Breakdown: Fixed vs Variable (monthly)
Savings Attribution (%)
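The savings these charts attribute follow directly from the headline figures: $30,000 baseline minus $18,900 current is $11,100 per month, an 11,100 / 30,000 = 37% reduction, consistent with the "up to 40%" claim above.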
Models
Seed
Clear
Deploy Model
Model | Runtime | GPU | Replicas | Status | Actions
No models yet.
Deploy New Model
1 • Select Model
2 • Configure
3 • Review
4 • Deploy
Model Family
LLaMA 3.1 8B
Mistral 7B
Mixtral 8x7B
Gemma 2
Runtime
vLLM
Text Generation Inference
Triton
Ray Serve
Endpoint Name
GPUs
Replicas
Enable autoscaling
Review your configuration:
—
Docker Command
—
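A sketch of what the generated command could look like for the LLaMA 3.1 8B + vLLM selection, following vLLM's published Docker usage (image tag, model ID, cache mount, and port are illustrative, not this product's actual output):

  # Serve LLaMA 3.1 8B on one GPU via vLLM's OpenAI-compatible server
  docker run --runtime nvidia --gpus all \
    -p 8000:8000 --ipc=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct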
Helm (Kubernetes)
—
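Similarly, a hypothetical Helm invocation (the chart name and values schema here are invented for illustration; the wizard generates the real manifest):

  # values.yaml (hypothetical schema, for illustration only)
  model: meta-llama/Meta-Llama-3.1-8B-Instruct
  runtime: vllm
  gpus: 1
  replicas: 1
  autoscaling:
    enabled: true

  # Install or upgrade the endpoint release with those values
  helm upgrade --install llama31-8b ./charts/llm-endpoint -f values.yaml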
Deploy Logs
Back
Next
Finish
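Once a deploy finishes, a vLLM endpoint exposes an OpenAI-compatible HTTP API, so a smoke test needs nothing beyond curl (host, port, and model ID are illustrative):

  # Minimal completion request against the new endpoint
  curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'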
Monitoring
▶ Play
⏸ Pause
GPU Utilization
Latency (p95)
Throughput
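The Latency (p95) panel corresponds to a standard histogram-quantile query; a minimal PromQL sketch, assuming the gateway exports a Prometheus request-duration histogram under an illustrative metric name:

  # p95 request latency over the last 5 minutes
  histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))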
FinOps & Cost
Monthly GPU Cost
Idle Shutdown
Intelligent Routing
Autoscaling (see the sketch below)
Spend by Team
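A minimal sketch of the autoscaling rule described above, written as a Kubernetes autoscaling/v2 HorizontalPodAutoscaler; it assumes a custom-metrics adapter (e.g. prometheus-adapter) exposes a per-pod p95_latency_ms metric, which is an assumption for illustration, not this product's actual mechanism:

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: mixtral-endpoint          # hypothetical endpoint name
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: mixtral-endpoint
    minReplicas: 1
    maxReplicas: 4
    metrics:
      - type: Pods
        pods:
          metric:
            name: p95_latency_ms    # assumes prometheus-adapter exposes this
          target:
            type: AverageValue
            averageValue: "600"     # scale out when per-pod p95 exceeds 600 ms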
Security & Privacy
Access Controls
SSO Provider
Okta
Azure AD
Google Workspace
RBAC
Audit Logs
Data & Deployment
Runs entirely in your VPC/on‑prem; no data egress.
Air‑gapped & offline installer available.
SOC 2 readiness: mapped controls & evidence checklist.
Model Details
Close
Metrics
Actions
+1 Replica
-1 Replica
Runtime: vLLM
GPU: 1
Replicas: 1
Logs