Creating Checks

KubeBuddy is a Kubernetes auditing and monitoring tool that helps identify misconfigurations, performance bottlenecks, and potential risks in your cluster.

Checks are defined in YAML and evaluated by the Invoke-yamlChecks engine. Results can be rendered in HTML, text, or JSON reports.

📦 Check Types

You can author three kinds of checks:

1. Script-Based (PowerShell)

Use when you need full procedural logic:

  • Define a Script: block in PowerShell.
  • Receive $KubeData and $Namespace as parameters, plus the -ExcludeNamespaces switch.
  • Return either:
      • An array of PSCustomObjects, or
      • A hashtable of the form @{ Items = <array>; IssueCount = <int> }.
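
As a rough sketch of the hashtable return form, the fragment below reports pods whose phase is Pending. The ID POD900 and the check itself are hypothetical, the other required metadata fields are left out for brevity, and Exclude-Namespaces is assumed to be the helper the engine makes available to script checks.

```yaml
checks:
  - ID: "POD900"          # hypothetical ID, for illustration only
    Name: "Pending Pods"
    # Category, Section, Severity, and the other required fields are omitted here.
    Script: |
      param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)
      # Prefer the pre-fetched data; fall back to kubectl if it is missing.
      $pods = if ($KubeData -and $KubeData.Pods) { $KubeData.Pods.items } else { (kubectl get pods -A -o json | ConvertFrom-Json).items }
      if ($ExcludeNamespaces) { $pods = Exclude-Namespaces -items $pods }
      # @(...) forces an array so .Count is reliable for zero or one match.
      $items = @(
        $pods |
          Where-Object { $_.status.phase -eq "Pending" } |
          ForEach-Object {
            [PSCustomObject]@{
              Namespace = $_.metadata.namespace
              Pod       = $_.metadata.name
            }
          }
      )
      @{ Items = $items; IssueCount = $items.Count }
```

Returning the hashtable rather than a bare array lets the script state the issue count explicitly instead of leaving it to be inferred from the number of rows.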

2. Declarative

Field-based checks for simple path/operator/value comparisons:

  • Specify Condition, Operator, and Expected.
  • No scripting required; ideal for image-tag, label, or simple field checks.
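
A declarative check might look like the sketch below. It assumes the engine treats the Condition/Operator/Expected triple as the passing state and reports resources that do not satisfy it; that behaviour is not confirmed here, so verify it before relying on the result. The ID POD901 and the registry host are placeholders.

```yaml
checks:
  - ID: "POD901"          # hypothetical ID, for illustration only
    Name: "Images From Approved Registry"
    Category: "Security"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Medium"
    Weight: 2
    Description: "Checks that container images are pulled from the approved registry."
    FailMessage: "Some pods run images from outside the approved registry."
    URL: "https://kubernetes.io/docs/concepts/containers/images/"
    SpeechBubble:
      - "Some images come from an unexpected registry."
    # Assumption: the comparison below describes the passing state.
    Condition: "spec.containers[].image"   # [] walks every container in the pod
    Operator:  "contains"
    Expected:  "registry.example.com/"     # placeholder registry host
```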

3. Prometheus (NEW!)

Query Prometheus directly, with built-in threshold support:

  • Define a Prometheus: block with your PromQL.
  • Provide Operator: and Expected: to compare time-series averages.
  • Honor your global defaults (e.g. cpu_critical) via Get-KubeBuddyThresholds.
  • Control the look-back window via Range.Duration (supports the m, h, and d suffixes).
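
The fragment below sketches the Prometheus-specific part of a check that uses a shorter window and a numeric literal rather than a named threshold. The common metadata fields are omitted, and the specific values are illustrative choices, not recommended defaults.

```yaml
    Prometheus:
      Query: 'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
      Range:
        Step:     "5m"     # spacing between returned samples
        Duration: "30m"    # look-back window; m, h, and d suffixes are supported
    Operator: "greater_than"
    Expected: 0.8          # numeric literal; for this query the unit is CPU cores
```

A named threshold such as cpu_critical keeps the limit in one place across checks, while a literal is convenient for a one-off check whose limit should not follow the global defaults.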

🧾 YAML Field Reference

| Field | Type | Required | Applies to | Description |
|---|---|---|---|---|
| ID | String | ✅ | All | Unique identifier (e.g. POD001, PROM003) |
| Name | String | ✅ | All | Human-readable name |
| Category | String | ✅ | All | Broad grouping (e.g. Security, Performance) |
| Section | String | ✅ | All | Sub-group for report navigation (e.g. Pods, Nodes) |
| ResourceKind | String | ✅ | All | Kubernetes kind (e.g. Pod, Node) |
| Severity | String | ✅ | All | Low, Medium, High, Warning, etc. |
| Weight | Integer | ✅ | All | Sorting/priority weight |
| Description | String | ✅ | All | What the check detects |
| FailMessage | String | ✅ | All | Message to show when the check finds issues |
| URL | String | ✅ | All | Link to related docs |
| SpeechBubble | List[String] | ✅ | All | CLI-friendly messages |
| Declarative only | | | | |
| Condition | String | ✅† | Declarative | JSON path, supports [] for arrays (e.g. spec.containers[].image) |
| Operator | String | ✅† | Declarative | equals, contains, greater_than, etc. |
| Expected | String/Number | ✅† | Declarative | Value to compare against |
| Script-Based only | | | | |
| Script | PowerShell | ✅‡ | Script-Based | Inline PowerShell script block |
| Prometheus only | | | | |
| Prometheus.Query | String | ✅§ | Prometheus | PromQL query (range or instant) |
| Prometheus.Range.Step | String | ✅§ | Prometheus | Range-vector step (e.g. 5m) |
| Prometheus.Range.Duration | String | ✅§ | Prometheus | Look-back window (e.g. 30m, 24h, 2d) |
| Operator | String | ✅§ | Prometheus | How to compare the average (e.g. greater_than) |
| Expected | String/Number | ✅§ | Prometheus | Threshold value or threshold name (e.g. cpu_critical or 0.8) |

† Declarative only
‡ Script-Based only
§ Prometheus only

🔬 Prometheus Check Example

checks:
  - ID: "PROM001"
    Name: "High CPU Pods (Prometheus)"
    Category: "Performance"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Warning"
    Weight: 3
    Description: "Checks for pods with sustained high CPU usage over the last 24 hours."
    FailMessage: "Some pods show high sustained CPU usage."
    URL: "https://kubernetes.io/docs/concepts/cluster-administration/monitoring/"
    SpeechBubble:
      - "🤖 High CPU usage detected via Prometheus!"
      - "⚠️ Might indicate a misbehaving app."
    Recommendation:
      text: "Investigate high-CPU pods; adjust limits or optimize workloads."
      html: |
        <div class="recommendation-content">
          <h4>🛠️ Investigate High CPU Pods</h4>
          <ul>
            <li>Use <code>kubectl top pod</code> for live CPU stats.</li>
            <li>Review app code or HPA settings.</li>
            <li>Consider raising CPU requests/limits or scaling out.</li>
          </ul>
        </div>
    Prometheus:
      Query:  'sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (pod)'
      Range:
        Step:     "5m"
        Duration: "24h"
    Operator:   "greater_than"
    Expected:   "cpu_critical"

βš™οΈ Script-Based Example

checks:
  - ID: "POD005"
    Name: "CrashLoopBackOff Pods"
    Category: "Workloads"
    Section: "Pods"
    ResourceKind: "Pod"
    Severity: "Error"
    Weight: 4
    Description: "Identifies pods stuck in CrashLoopBackOff due to repeated crashes."
    FailMessage: "Some pods are stuck restarting in CrashLoopBackOff."
    URL: "https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy"
    SpeechBubble:
      - "💥 Pods in CrashLoopBackOff!"
      - "🔍 Investigate container errors."
    Recommendation:
      text: "Check logs and fix misconfigurations."
      html: |
        <div class="recommendation-content">
          <ul>
            <li><code>kubectl logs &lt;pod&gt; -n &lt;ns&gt;</code></li>
            <li><code>kubectl describe pod &lt;pod&gt; -n &lt;ns&gt;</code></li>
          </ul>
        </div>
    Script: |
      param([object]$KubeData, $Namespace, [switch]$ExcludeNamespaces)
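      # Use the pre-fetched cluster data when present; otherwise query via kubectl.
      # ($KubeData?.Pods would be parsed as a variable named "KubeData?", so test explicitly.)
      $pods = if ($KubeData -and $KubeData.Pods) { $KubeData.Pods.items } else { (kubectl get pods -A -o json | ConvertFrom-Json).items }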
      if ($ExcludeNamespaces) { $pods = Exclude-Namespaces -items $pods }
      $pods |
        Where-Object {
          $_.status.containerStatuses |
          Where-Object { $_.state.waiting.reason -eq "CrashLoopBackOff" }
        } |
        ForEach-Object {
          [PSCustomObject]@{
            Namespace = $_.metadata.namespace
            Pod       = $_.metadata.name
            Restarts  = ($_.status.containerStatuses | Measure-Object -Property restartCount -Sum).Sum
          }
        }

✅ Best Practices

  • Use meaningful IDs (POD001, PROM002, etc.)
  • Scope each check to one responsibility
  • For Prometheus, prefer global threshold names (e.g. cpu_critical) or numeric literals
  • Store your YAML in yamlChecks/*.yaml, with no embedded JSON in PowerShell

📂 Folder Layout

yamlChecks/
├── workloads.yaml
├── security.yaml
└── prometheus.yaml

📚 Resources