Creating Checks
KubeBuddy is a Kubernetes auditing and monitoring tool that helps identify misconfigurations, performance bottlenecks, and potential risks in your cluster.
Checks are defined in YAML and evaluated by the `Invoke-yamlChecks` engine. Results can be rendered in HTML, text, or JSON reports.
Check Types
You can author three kinds of checks:
1. Script-Based (PowerShell)
Use when you need full procedural logic:
- Define a `Script:` block in PowerShell.
- Receive `$KubeData`, plus the `$Namespace` and `-ExcludeNamespaces` flags.
- Return either:
  - An array of PSCustomObjects, or
  - A hashtable with `{ Items = <array>; IssueCount = <int> }`.
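The hashtable return form can be sketched as the fragment below. It is a hypothetical illustration rather than a check shipped with KubeBuddy: the metadata fields are omitted, the `Pending` filter is an arbitrary choice, and it assumes `$KubeData.Pods.items` holds the pod list, as in the script-based example in this guide.

```yaml
# Hypothetical fragment: only the Script field is shown.
# The metadata fields (ID, Name, Category, ...) are omitted for brevity.
Script: |
  param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)

  # Assumption: $KubeData.Pods.items holds the cluster's pod list.
  # @( ... ) forces an array so that .Count is always defined.
  $pending = @(
    $KubeData.Pods.items |
      Where-Object { $_.status.phase -eq "Pending" } |
      ForEach-Object {
        [PSCustomObject]@{
          Namespace = $_.metadata.namespace
          Pod       = $_.metadata.name
        }
      }
  )

  # Hashtable return form: the findings plus an explicit issue count.
  @{
    Items      = $pending
    IssueCount = $pending.Count
  }
```

Returning a plain array is shorter. The hashtable form is presumably useful when the reported issue count should differ from the number of rows, though this page does not say how the engine uses `IssueCount`.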
2. Declarative
Field-based checks for simple path/operator/value comparisons:
- Specify `Condition`, `Operator`, and `Expected`.
- No scripting required; ideal for image-tag, label, or simple field checks.
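As a minimal sketch, a declarative image-tag check might look like the following. The ID, names, and messages are hypothetical. This page does not state whether a matching condition counts as a pass or as a finding, so treat the choice of `contains` as an assumption and verify it against the engine before relying on it.

```yaml
checks:
  - ID: "POD900"   # hypothetical ID, chosen only for illustration
    Name: "Containers Using the latest Tag"
    Category: "Security"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Warning"
    Weight: 2
    Description: "Looks at container images that use the :latest tag."
    FailMessage: "Some containers run images tagged :latest."
    URL: "https://kubernetes.io/docs/concepts/containers/images/"
    SpeechBubble:
      - "Some images are tagged :latest."
    # Declarative fields: path, operator, and value to compare against.
    Condition: "spec.containers[].image"
    Operator: "contains"   # assumption: pass/fail semantics are engine-defined
    Expected: ":latest"
```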
3. Prometheus (NEW!)
Query Prometheus directly, with built-in threshold support:
- Define a `Prometheus:` block with your PromQL.
- Provide `Operator:` and `Expected:` to compare time-series averages.
- Honor your global defaults (e.g. `cpu_critical`) via `Get-KubeBuddyThresholds`.
- Control the look-back window via `Range.Duration` (supports `m`, `h`, `d`).
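The fragment below sketches the Prometheus-specific fields with a numeric literal instead of a named threshold, and a shorter look-back window. Only these fields are shown; the metadata fields are omitted. The value `0.8` is an illustrative assumption, read here as CPU cores because the query is a rate of CPU seconds.

```yaml
# Hypothetical fragment: Prometheus-specific fields only.
Prometheus:
  Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
  Range:
    Step: "5m"        # resolution of the range query
    Duration: "30m"   # look-back window; m, h, and d suffixes are supported
Operator: "greater_than"
Expected: 0.8         # numeric literal in place of a name such as cpu_critical
```

A named threshold keeps limits consistent across checks, while a literal is simpler for a one-off check.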
YAML Field Reference
| Field | Type | Required | Applies to | Description |
|---|---|---|---|---|
| `ID` | String | Yes | All | Unique identifier (e.g. `POD001`, `PROM003`) |
| `Name` | String | Yes | All | Human-readable name |
| `Category` | String | Yes | All | Broad grouping (e.g. `Security`, `Performance`) |
| `Section` | String | Yes | All | Sub-group for report navigation (e.g. `Pods`, `Nodes`) |
| `ResourceKind` | String | Yes | All | Kubernetes kind (e.g. `Pod`, `Node`) |
| `Severity` | String | Yes | All | `Low`, `Medium`, `High`, `Warning`, etc. |
| `Weight` | Integer | Yes | All | Sorting/priority weight |
| `Description` | String | Yes | All | What the check detects |
| `FailMessage` | String | Yes | All | Message to show when the check finds issues |
| `URL` | String | Yes | All | Link to related docs |
| `SpeechBubble` | List[String] | Yes | All | CLI-friendly messages |
| **Declarative only** | | | | |
| `Condition` | String | Yes † | Declarative | JSON path; supports `[]` array segments (e.g. `spec.containers[].image`) |
| `Operator` | String | Yes † | Declarative | `equals`, `contains`, `greater_than`, etc. |
| `Expected` | String/Number | Yes † | Declarative | Value to compare against |
| **Script-Based only** | | | | |
| `Script` | PowerShell | Yes ‡ | Script-Based | Inline PowerShell script block |
| **Prometheus only** | | | | |
| `Prometheus.Query` | String | Yes § | Prometheus | PromQL query (range or instant) |
| `Prometheus.Range.Step` | String | Yes § | Prometheus | Range-vector step (e.g. `5m`) |
| `Prometheus.Range.Duration` | String | Yes § | Prometheus | Look-back window (e.g. `30m`, `24h`, `2d`) |
| `Operator` | String | Yes § | Prometheus | How to compare the average (e.g. `greater_than`) |
| `Expected` | String/Number | Yes § | Prometheus | Threshold value or threshold name (e.g. `cpu_critical` or `0.8`) |

† Declarative only
‡ Script-Based only
§ Prometheus only
Prometheus Check Example
checks:
  - ID: "PROM001"
    Name: "High CPU Pods (Prometheus)"
    Category: "Performance"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Warning"
    Weight: 3
    Description: "Checks for pods with sustained high CPU usage over the last 24 hours."
    FailMessage: "Some pods show high sustained CPU usage."
    URL: "https://kubernetes.io/docs/concepts/cluster-administration/monitoring/"
    SpeechBubble:
      - "High CPU usage detected via Prometheus!"
      - "Might indicate a misbehaving app."
    Recommendation:
      text: "Investigate high-CPU pods; adjust limits or optimize workloads."
      html: |
        <div class="recommendation-content">
          <h4>Investigate High CPU Pods</h4>
          <ul>
            <li>Use <code>kubectl top pod</code> for live CPU stats.</li>
            <li>Review app code or HPA settings.</li>
            <li>Consider raising CPU requests/limits or scaling out.</li>
          </ul>
        </div>
    Prometheus:
      Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
      Range:
        Step: "5m"
        Duration: "24h"
    Operator: "greater_than"
    Expected: "cpu_critical"
Script-Based Example
checks:
  - ID: "POD005"
    Name: "CrashLoopBackOff Pods"
    Category: "Workloads"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Error"
    Weight: 4
    Description: "Identifies pods stuck in CrashLoopBackOff due to repeated crashes."
    FailMessage: "Some pods are stuck restarting in CrashLoopBackOff."
    URL: "https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy"
    SpeechBubble:
      - "Pods in CrashLoopBackOff!"
      - "Investigate container errors."
    Recommendation:
      text: "Check logs and fix misconfigurations."
      html: |
        <div class="recommendation-content">
          <ul>
            <li><code>kubectl logs &lt;pod&gt; -n &lt;ns&gt;</code></li>
            <li><code>kubectl describe pod &lt;pod&gt; -n &lt;ns&gt;</code></li>
          </ul>
        </div>
    Script: |
      param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)
      # Prefer cached cluster data; fall back to a live kubectl query.
      # Note: "$KubeData?.Pods" would be parsed as a variable named "KubeData?",
      # so an explicit null check is used instead.
      $pods = if ($KubeData -and $KubeData.Pods) { $KubeData.Pods.items } else { (kubectl get pods -A -o json | ConvertFrom-Json).items }
      if ($ExcludeNamespaces) { $pods = Exclude-Namespaces -items $pods }
      # Keep pods with at least one container waiting in CrashLoopBackOff.
      $pods |
        Where-Object {
          $_.status.containerStatuses |
            Where-Object { $_.state.waiting.reason -eq "CrashLoopBackOff" }
        } |
        ForEach-Object {
          [PSCustomObject]@{
            Namespace = $_.metadata.namespace
            Pod       = $_.metadata.name
            Restarts  = ($_.status.containerStatuses | Measure-Object -Property restartCount -Sum).Sum
          }
        }
Best Practices
- Use meaningful IDs (`POD001`, `PROM002`, etc.).
- Scope each check to one responsibility.
- For Prometheus, prefer global threshold names (e.g. `cpu_critical`) or numeric literals.
- Store your YAML in `yamlChecks/*.yaml`, with no embedded JSON in PowerShell.
Folder Layout
yamlChecks/
├── workloads.yaml
├── security.yaml
└── prometheus.yaml