Monitoring Your Cluster After Enablement

After successfully setting up Cluster Orchestrator, you gain access to monitoring screens and dashboards that provide real-time insights into your cluster's performance, cost, and optimization opportunities. These dashboards are designed to help you track the effectiveness of your optimization settings and make data-driven decisions about your infrastructure.

Cluster Orchestrator provides four specialized views, each focusing on different aspects of your cluster:

1. Overview

This is your central page for monitoring overall cluster health, performance, and cost metrics. The dashboard is divided into several key sections:

Cluster Spend: Track your total cluster costs over time, with breakdowns by instance type and spot vs. on-demand usage.
Cluster Details: View essential information about your cluster, including name, region, and identifier.
Nodes Breakdown: Visualize your node distribution by fulfillment method (spot vs. on-demand).
CPU Breakdown: Track CPU allocation, usage, and available capacity across your cluster.
Memory Breakdown: Monitor memory allocation, usage, and available capacity.
Pod Distribution: See how many pods are scheduled on spot nodes, scheduled on on-demand nodes, or unscheduled.

2. Workloads screen

This view focuses on the applications running in your cluster, helping you identify optimization opportunities at the workload level:

Namespace Organization: View workloads grouped by namespace for logical organization
Replica Count: Track the number of replicas for each workload

3. Nodes screen

This view provides insights into your cluster's infrastructure. The table displays the following information for each node:

Column	Description
Node Name	The full hostname of the node
Workloads	Number of workloads running on the node (e.g., 11)
Instance Type	The AWS instance type (e.g., m5.2xlarge)
Fulfillment	Whether the node is running as spot or on-demand
CPU	Current CPU usage and total capacity (e.g., 7.91/8)
Memory (GiB)	Current memory usage and total capacity (e.g., 29.92/30.89)
Age	How long the node has been running (e.g., 2h)
Status	Current node status, indicating whether the node is ready (e.g., Ready)
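
The CPU and Memory columns show usage and capacity as a single "used/total" value. If you copy those values out of the table, a small helper can turn them into utilization percentages. The following is a minimal sketch, not part of Cluster Orchestrator: the function name `utilization` is hypothetical, and the input strings are simply the example values from the table above.

```python
def utilization(cell: str) -> float:
    """Return usage as a fraction of capacity for a 'used/total' cell."""
    used_text, total_text = cell.split("/")
    used, total = float(used_text), float(total_text)
    if total <= 0:
        raise ValueError(f"capacity must be positive: {cell!r}")
    return used / total


# Example values taken from the Nodes table description above.
cpu = utilization("7.91/8")
memory = utilization("29.92/30.89")
print(f"CPU {cpu:.1%}, memory {memory:.1%}")
```

A node that sits close to full on both CPU and memory, as in this example, is tightly packed; a node with low values in both columns may be worth a closer look.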

4. Logs

The Logs screen provides a chronological record of all cluster events and optimization activities. You can filter events by time period (e.g., Last 90 days) and event type.

Column	Description
Logged On	Timestamp when the event occurred (e.g., Apr 23, 2025 5:11 PM)
Event Type	Category of the event (see types below)
Event Details	Description of what happened
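
If you work with copied log rows outside the UI, the Logged On value can be parsed into a datetime and checked against a time window, similar to the time-period filter. This is a sketch under assumptions: the format string is inferred only from the single example above (Apr 23, 2025 5:11 PM), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: inferred from the example "Apr 23, 2025 5:11 PM";
# the real format may differ by locale or settings.
LOGGED_ON_FORMAT = "%b %d, %Y %I:%M %p"


def parse_logged_on(text: str) -> datetime:
    """Parse a Logged On value into a naive datetime."""
    return datetime.strptime(text, LOGGED_ON_FORMAT)


def within_last_days(text: str, days: int, now: datetime) -> bool:
    """Return True if the event falls within the last `days` days before `now`."""
    logged = parse_logged_on(text)
    return now - timedelta(days=days) <= logged <= now


# `now` is passed in explicitly so the check is reproducible.
print(within_last_days("Apr 23, 2025 5:11 PM", 90, datetime(2025, 6, 1)))
```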

Event Types

The logs track several types of events that help you understand cluster activity:

NodeProvisioned: When new instances are launched (e.g., "Instance i-0138fc81dc9c82e31 launched")
ScaleUpPlan: When the system generates plans to scale up resources (e.g., "Generated scale-up plans for 5 launch templates")
EvictPodsFromCandidateNode: When pods are evicted as part of optimization activities
Fallback: When workloads transition from spot to on-demand instances due to unavailability
ReverseFallback: When workloads transition back from on-demand to spot instances
InstanceReplacementDueToInterruption: When instances are replaced due to spot interruptions
SpotInterruptionNotice: When AWS sends a notification that a spot instance will be interrupted
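
To get a quick picture of cluster activity from a set of log rows, you can count events per type and total the ones related to spot capacity. The sketch below assumes you have the rows as (event type, event details) pairs; that input shape, the `summarize` function, and the grouping of four types as "spot-related" are illustrative assumptions, not a Cluster Orchestrator API.

```python
from collections import Counter

# Event types listed in this section.
KNOWN_EVENT_TYPES = {
    "NodeProvisioned",
    "ScaleUpPlan",
    "EvictPodsFromCandidateNode",
    "Fallback",
    "ReverseFallback",
    "InstanceReplacementDueToInterruption",
    "SpotInterruptionNotice",
}

# Assumption: a grouping chosen for this example only.
SPOT_RELATED = {
    "Fallback",
    "ReverseFallback",
    "InstanceReplacementDueToInterruption",
    "SpotInterruptionNotice",
}


def summarize(events):
    """Return (counts per type, total spot-related events, unknown types)."""
    counts = Counter(event_type for event_type, _details in events)
    spot_total = sum(n for t, n in counts.items() if t in SPOT_RELATED)
    unknown = sorted(set(counts) - KNOWN_EVENT_TYPES)
    return counts, spot_total, unknown


# Sample rows: the first two details come from the examples above,
# the rest are placeholders.
events = [
    ("NodeProvisioned", "Instance i-0138fc81dc9c82e31 launched"),
    ("ScaleUpPlan", "Generated scale-up plans for 5 launch templates"),
    ("SpotInterruptionNotice", "(details omitted)"),
    ("Fallback", "(details omitted)"),
    ("ReverseFallback", "(details omitted)"),
]
counts, spot_total, unknown = summarize(events)
print(counts["NodeProvisioned"], spot_total, unknown)
```

A rising count of Fallback events without matching ReverseFallback events suggests workloads are staying on on-demand capacity longer than intended.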
