Creating Checks
KubeBuddy is a Kubernetes auditing and monitoring tool that helps identify misconfigurations, performance bottlenecks, and potential risks in your cluster.
Checks are defined in YAML and evaluated by the Invoke-yamlChecks engine. Results can be rendered in HTML, text, or JSON reports.
Check Types
You can author three kinds of checks:
1. Script-Based (PowerShell)
Use when you need full procedural logic:
- Define a `Script:` block in PowerShell.
- Receive `$KubeData`, plus `$Namespace` and `-ExcludeNamespaces` flags.
- Return either:
  - An array of PSCustomObjects, or
  - A hashtable with `{ Items = <array>; IssueCount = <int> }`.
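
The hashtable return form described above can be sketched as a `Script:` fragment. This is a minimal illustration, not a shipped check: the "Pending" filter is a hypothetical example, the other check fields are omitted, and it assumes `$KubeData.Pods` is already populated.

```yaml
# Fragment only: ID, Name, Category, and the other required fields are omitted.
Script: |
  param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)
  # Hypothetical filter: collect pods whose phase is Pending.
  # @( ... ) forces an array, so .Count is reliable even for zero or one match.
  $items = @(
    $KubeData.Pods.items |
      Where-Object { $_.status.phase -eq "Pending" } |
      ForEach-Object {
        [PSCustomObject]@{
          Namespace = $_.metadata.namespace
          Pod       = $_.metadata.name
        }
      }
  )
  # Return the hashtable form: the rows plus an explicit issue count.
  @{ Items = $items; IssueCount = $items.Count }
```

Returning the hashtable form may be useful when the issue count should differ from the number of rows shown; if the two are always equal, the plain array form is simpler.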
2. Declarative
Field-based checks for simple path/operator/value comparisons:
- Specify `Condition`, `Operator`, and `Expected`.
- No scripting required; ideal for image-tag, label, or simple field checks.
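
A declarative check might look like the sketch below. The ID, names, and messages are hypothetical, and it assumes that a resource matching the condition is reported as an issue; this page does not state whether the engine flags matches or non-matches, so verify that before relying on it.

```yaml
checks:
  - ID: "POD900"                      # hypothetical ID, chosen for illustration
    Name: "Containers Using the latest Tag"
    Category: "Best Practices"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Warning"
    Weight: 2
    Description: "Flags pods whose container images use the :latest tag."
    FailMessage: "Some containers use the :latest image tag."
    URL: "https://kubernetes.io/docs/concepts/containers/images/"
    SpeechBubble:
      - "Pin image tags for reproducible deployments."
    # Assumption: a resource whose field matches the comparison is flagged.
    Condition: "spec.containers[].image"   # [] walks every container in the pod
    Operator: "contains"
    Expected: ":latest"
```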
3. Prometheus (NEW!)
Query Prometheus directly, with built-in threshold support:
- Define a `Prometheus:` block with your PromQL.
- Provide `Operator:` and `Expected:` to compare time-series averages.
- Honor your global defaults (e.g. `cpu_critical`) via `Get-KubeBuddyThresholds`.
- Control the look-back window via `Range.Duration` (supports `m`, `h`, `d`).
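
As a sketch of the Prometheus-specific keys, the fragment below compares against a numeric literal instead of a named threshold and uses a 30-minute window. The query and values are illustrative, and the other check fields are omitted.

```yaml
# Fragment only: ID, Name, Category, and the other required fields are omitted.
Prometheus:
  Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
  Range:
    Step: "5m"        # resolution of the range query
    Duration: "30m"   # look-back window; m, h, and d suffixes are supported
Operator: "greater_than"
Expected: 0.8         # numeric literal; a threshold name such as cpu_critical also works
```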
YAML Field Reference
| Field | Type | Required | Applies to | Description |
|---|---|---|---|---|
| `ID` | String | ✅ | All | Unique identifier (e.g. `POD001`, `PROM003`) |
| `Name` | String | ✅ | All | Human-readable name |
| `Category` | String | ✅ | All | Broad grouping (e.g. Security, Performance) |
| `Section` | String | ✅ | All | Sub-group for report navigation (e.g. Pods, Nodes) |
| `ResourceKind` | String | ✅ | All | Kubernetes kind (e.g. Pod, Node) |
| `Severity` | String | ✅ | All | Low, Medium, High, Warning, etc. |
| `Weight` | Integer | ✅ | All | Sorting/priority weight |
| `Description` | String | ✅ | All | What the check detects |
| `FailMessage` | String | ✅ | All | Message to show when the check finds issues |
| `URL` | String | ✅ | All | Link to related docs |
| `SpeechBubble` | List[String] | ✅ | All | CLI-friendly messages |
| **Declarative only** | | | | |
| `Condition` | String | ✅ † | Declarative | JSON path; supports `[]` for arrays (e.g. `spec.containers[].image`) |
| `Operator` | String | ✅ † | Declarative | `equals`, `contains`, `greater_than`, etc. |
| `Expected` | String/Number | ✅ † | Declarative | Value to compare against |
| **Script-Based only** | | | | |
| `Script` | PowerShell | ✅ ‡ | Script-Based | Inline PowerShell script block |
| **Prometheus only** | | | | |
| `Prometheus.Query` | String | ✅ § | Prometheus | PromQL query (range or instant) |
| `Prometheus.Range.Step` | String | ✅ § | Prometheus | Range-vector step (e.g. `5m`) |
| `Prometheus.Range.Duration` | String | ✅ § | Prometheus | Look-back window (e.g. `30m`, `24h`, `2d`) |
| `Operator` | String | ✅ § | Prometheus | How to compare the average (e.g. `greater_than`) |
| `Expected` | String/Number | ✅ § | Prometheus | Threshold value or threshold name (e.g. `cpu_critical` or `0.8`) |

† Declarative only
‡ Script-Based only
§ Prometheus only
Prometheus Check Example
```yaml
checks:
  - ID: "PROM001"
    Name: "High CPU Pods (Prometheus)"
    Category: "Performance"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Warning"
    Weight: 3
    Description: "Checks for pods with sustained high CPU usage over the last 24 hours."
    FailMessage: "Some pods show high sustained CPU usage."
    URL: "https://kubernetes.io/docs/concepts/cluster-administration/monitoring/"
    SpeechBubble:
      - "🤖 High CPU usage detected via Prometheus!"
      - "⚠️ Might indicate a misbehaving app."
    Recommendation:
      text: "Investigate high-CPU pods; adjust limits or optimize workloads."
      html: |
        <div class="recommendation-content">
          <h4>🛠️ Investigate High CPU Pods</h4>
          <ul>
            <li>Use <code>kubectl top pod</code> for live CPU stats.</li>
            <li>Review app code or HPA settings.</li>
            <li>Consider raising CPU requests/limits or scaling out.</li>
          </ul>
        </div>
    Prometheus:
      Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
      Range:
        Step: "5m"
        Duration: "24h"
    Operator: "greater_than"
    Expected: "cpu_critical"
```
Script-Based Example
```yaml
checks:
  - ID: "POD005"
    Name: "CrashLoopBackOff Pods"
    Category: "Workloads"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Error"
    Weight: 4
    Description: "Identifies pods stuck in CrashLoopBackOff due to repeated crashes."
    FailMessage: "Some pods are stuck restarting in CrashLoopBackOff."
    URL: "https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy"
    SpeechBubble:
      - "Pods in CrashLoopBackOff!"
      - "Investigate container errors."
    Recommendation:
      text: "Check logs and fix misconfigurations."
      html: |
        <div class="recommendation-content">
          <ul>
            <li><code>kubectl logs &lt;pod&gt; -n &lt;ns&gt;</code></li>
            <li><code>kubectl describe pod &lt;pod&gt; -n &lt;ns&gt;</code></li>
          </ul>
        </div>
    Script: |
      param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)
      # Use cached pod data when available; otherwise query the cluster.
      # The braces matter: in $KubeData?.Pods, PowerShell treats "?" as part of the variable name.
      $pods = if (${KubeData}?.Pods) { $KubeData.Pods.items } else { (kubectl get pods -A -o json | ConvertFrom-Json).items }
      if ($ExcludeNamespaces) { $pods = Exclude-Namespaces -items $pods }
      # Keep pods with at least one container waiting in CrashLoopBackOff.
      $pods |
        Where-Object {
          $_.status.containerStatuses |
            Where-Object { $_.state.waiting.reason -eq "CrashLoopBackOff" }
        } |
        ForEach-Object {
          [PSCustomObject]@{
            Namespace = $_.metadata.namespace
            Pod       = $_.metadata.name
            Restarts  = ($_.status.containerStatuses | Measure-Object -Property restartCount -Sum).Sum
          }
        }
```
Best Practices
- Use meaningful IDs (`POD001`, `PROM002`, etc.)
- Scope each check to one responsibility
- For Prometheus, prefer global threshold names (e.g. `cpu_critical`) or numeric literals
- Store your YAML in `yamlChecks/*.yaml`; no embedded JSON in PowerShell
Folder Layout
```
yamlChecks/
├── workloads.yaml
├── security.yaml
└── prometheus.yaml
```