Changelog¶

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.0.23] – 2025-06-18¶

Added¶

Automatic dark mode: The HTML report now respects your browser’s prefers-color-scheme setting and will automatically switch to a dark theme when your system is in dark mode.
Expanded Storage Checks: Introduced a comprehensive set of new checks to enhance Kubernetes storage monitoring and optimization:
- PV001: Orphaned Persistent Volumes: Detects Persistent Volumes not bound to any Persistent Volume Claim, helping to reclaim unused storage.
- PVC002: PVCs Using Default StorageClass: Flags PVCs that implicitly rely on a default storageClassName, encouraging explicit configuration for better clarity and portability.
- PVC003: ReadWriteMany PVCs on Incompatible Storage: Warns about PVCs requesting ReadWriteMany access mode when the underlying storage is typically block-based and doesn't support concurrent writes from multiple nodes, preventing potential data corruption.
- PVC004: Unbound Persistent Volume Claims: Flags PVCs stuck in a Pending phase, often indicating issues with the StorageClass, available PVs, or the storage provisioner.
- SC001: Deprecated StorageClass Provisioners: Identifies StorageClasses using deprecated or legacy in-tree provisioners, recommending migration to CSI drivers for future compatibility.
- SC002: StorageClass Prevents Volume Expansion: Detects StorageClasses that do not allow volume expansion, which can limit dynamic scaling of stateful applications.
- SC003: High Cluster Storage Usage: Monitors the overall percentage of used storage across the cluster, alerting when usage exceeds predefined thresholds (80%). Uses Prometheus.
Expanded Networking Checks: Added several new checks to identify common misconfigurations and security risks in Kubernetes networking:
- NET005: Ingress Host/Path Conflicts: Detects Ingress resources with overlapping host and path combinations, which can lead to unpredictable traffic routing.
- NET006: Ingress Using Wildcard Hosts: Flags Ingress resources using wildcard hostnames (*.example.com), which may provide broader access than intended and should be reviewed.
- NET007: Service TargetPort Mismatch: Identifies Services where the targetPort does not match any containerPort in the backing pods, preventing effective traffic delivery.
- NET008: ExternalName Service to Internal IP: Highlights ExternalName type Services configured to point to private IP ranges, potentially indicating an unusual or misconfigured internal routing pattern.
- NET009: Overly Permissive Network Policy: Warns about NetworkPolicies that define policyTypes but lack specific rules (allowing all traffic for that type) or include overly broad ipBlock definitions like 0.0.0.0/0.
- NET010: Network Policy Overly Permissive IPBlock: Specifically identifies NetworkPolicies that utilize 0.0.0.0/0 in their ipBlock rules, granting unrestricted access which poses a significant security risk.
- NET011: Network Policy Missing PolicyTypes: Flags NetworkPolicies that do not explicitly define policyTypes, improving clarity and ensuring consistent behavior across different Kubernetes versions and CNI plugins.
- NET012: Pod HostNetwork Usage: Identifies pods configured with hostNetwork: true, which allows direct access to the node's network interfaces, bypassing Kubernetes network isolation and potentially increasing security risk.
Pod Density per Node check (NODE003):
- Calculates pod density as (running pods ÷ max‑pods capacity) × 100.
- Alerts when percentage crosses warning (80% default) or critical (90% default) thresholds.
Workload Label Consistency Check (WRK009):
- Ensures that Deployment selectors match the labels on their Pod templates and that Services targeting those Deployments use consistent label selectors.
- Helps catch silent routing issues or monitoring mismatches caused by label typos or misalignment.
- Applies to Deployments and their associated Pods and Services.

[0.0.22] – 2025-06-04¶

Added¶

AI Recommendations with PSAI Integration KubeBuddy now supports AI-powered recommendations, leveraging OpenAI's ChatGPT via the excellent PSAI module by @dfinke:
When checks return findings, KubeBuddy automatically prompts GPT to generate:
- A short plain-text summary of recommended actions
- A detailed HTML block with actionable advice and documentation links
- These recommendations are:
- Embedded in the HTML report as collapsible "Recommended Actions" cards (with AI Enhanced labels)
- Shown in the text report with a clear prefix: AI Generated Recommendation:
- Included in the JSON output under the Recommendation object, with .text, .html, and .source fields (source = "AI")
- Graceful fallback: if no OpenAIKey is set or the AI call fails, checks fall back to static/manual recommendations or omit the section entirely

[0.0.21] – 2025-05-29¶

Fixes¶

Fixed an issue where the code would not actually run the checks.

[0.0.20] – 2025-05-29¶

What’s New¶

Prometheus integration We’ve wired KubeBuddy up to Prometheus so you can get real-time node and API-server metrics:
CPU & Memory Usage (PROM001 & PROM002): track average usage across all nodes over the last 24 hours.
Memory Saturation (PROM003): see how much of each node’s allocatable memory is actually in use.
API Server Latency (PROM004): alert you if request latency spikes beyond healthy thresholds.
CPU Overcommitment (PROM005): flag any nodes whose pods are asking for more CPU than they can deliver.
New per-node Prometheus view: click into any node’s card to see its individual metrics and time-series charts right in your report.
Plus new KubeData settings (URL, mode, credentials, headers, etc.) to configure your Prometheus connection securely.
Top 5 Impprovements: The Overview page now surfaces the five checks whose remediation yields the greatest cluster-health score gain, showing estimated points gain per issue.
“Hero” Issue-Summary cards Right at the top of your HTML report you’ll now see a row of big, color-coded cards showing how many checks failed at each severity level (Critical, Warning, Info).
Click a card and it smoothly expands inline to list every failing check in that category.
Built entirely with our new .hero-metrics, .metric-card, .expand-content and .scrollable-content CSS, plus a tiny toggleExpand() script for the show-and-hide behaviour.

Improvements¶

HTML report polish
Hover over any check header to see a handy info-icon tooltip with the full description.
Long “Findings” and “Recommendations” sections are now tucked into collapsible panels to keep your report neat.
Each recommendation is wrapped in a stylish card with a banner and auto-linked “Docs:” reference.
All tables live inside a <div class="table-container"> and are built by hand to ensure proper HTML-escaping and XSS safety.
Under-the-hood tweaks for Prometheus checks
All Prometheus parameters (Url, Mode, Username, etc.) are now predeclared so they work correctly inside PowerShell’s parallel runspaces.
Threshold lookups in parallel blocks now use $using:thresholds.
If you haven’t set a Prometheus URL or headers, KubeBuddy will quietly skip those checks (no noisy errors).

Fixes¶

Null-value errors eliminated by:
Checking that each threshold key actually exists before casting.
Verifying your PrometheusHeaders hashtable isn’t null or empty before poking its keys or making HTTP calls.

[0.0.19] - 2025-05-02¶

Fixed¶

Check Execution Bug Fixes:
Fixed an issue where certain security checks (e.g., SEC010) would not report results even when violations were present.

[0.0.18] - 2025-05-02¶

Added¶

New AKS Best Practice Checks:
Added AKSBP013: "No B-Series VMs in Node Pools" to ensure node pools do not use burstable B-series VMs, which can lead to inconsistent performance in production workloads (Severity: High).
Added AKSBP014: "Use v5 or Newer SKU VMs for Node Pools" to enforce the use of v5 or newer VM SKUs for better performance and reliability during updates (Severity: Medium).
Total checks now at 92 across all categories.

Changed¶

Updated Recommendations for All Checks:
Added links to relevant documentation in the recommendations for all checks across all categories (Best Practices, Disaster Recovery, Identity & Access, Monitoring & Logging, Networking, Resource Management, and Security), providing actionable guidance for each check.
Replaced Cluster Health Score Donut with Passed/Failed Chip:
Removed the circular progress bar (donut) for the Cluster Health Score in the dashboard.
Replaced it with a chip-style element for "Passed / Failed Checks" (e.g., "45 / 92 Passed"), aligning with the existing chip design for consistency.
Updated Chip Color Logic in Dashboard:
Adjusted the pass rate thresholds for the "Passed / Failed Checks" chip to better reflect cluster health:
- Red (Critical): <48% pass rate (lowered from 50% to account for near-threshold states).
- Yellow (Warning): 48%–79% pass rate.
- Green (Healthy): ≥80% pass rate.
With the current pass rate of 48.91% (45/92), the chip now displays as yellow instead of red, aligning with the updated threshold.

Fixed¶

NET003 Check:
Fixed an issue with the AKSNET003 ("Web App Routing Enabled") check to ensure it correctly evaluates the configuration and reports accurate results.

Notes¶

HTML Report Update:
Improved the visual design of the HTML report for better readability and user experience, as part of ongoing enhancements to the reporting interface.

[0.0.17] - 2025-04-25¶

Added¶

Migrated to YAML-based Checks:
Replaced pure PowerShell checks with YAML-defined checks for better maintainability and scalability.
Each check now has a unique ID for easier identification and referencing in reports (e.g., AKSNET001, NS001).
New Alerts:
Added new YAML-based alerts to enhance cluster monitoring.
Custom Checks HTML Tab
Automatically gathers any YAML‑defined checks whose section names aren’t in the standard list (Nodes, Namespaces, Workloads, etc.) into a new “Custom Checks” tab. Only shows the tab if there’s at least one real <tr>…</tr> snippet.
Exclude Checks Support
You can now explicitly exclude checks by their ID using the ExcludedChecks parameter. Excluded checks are skipped during evaluation and omitted from reports.
Multi-Architecture Docker Container:
Updated the Dockerfile to support both linux/amd64 and linux/arm64 architectures using Docker Buildx.
Dynamically downloads architecture-specific kubectl and kubelogin binaries based on the target platform ($TARGETARCH).
Updated GitHub Action for Multi-Architecture Builds:
Modified the GitHub Action workflow to use Docker Buildx for building and pushing multi-architecture images (linux/amd64 and linux/arm64) to GHCR.
Added support for tagging and pushing a latest tag for multi-architecture images.

Changed¶

Updated HTML Report:
Replaced single-page layout with a tab-based interface for better structure and usability.
Improved visuals, section separation, and print/export support.
AKS Results in Text Report:
Updated Generate-K8sTextReport to properly capture and write AKS results to the text report, including detailed check results and the summary table ("Summary & Rating").
Ensured the AKS summary table is consistently included in the text report output.
Improved Check Processing:
Refactored Invoke-AKSBestPractices to return structured data for text reports, removing direct Write-ToReport calls and allowing the caller (Generate-K8sTextReport) to handle file writing.

Fixed¶

Text Report AKS Summary Table:
Fixed an issue where the AKS summary table was not appearing in the text report by ensuring the TextOutput property is correctly written to the file.
File Path Scoping in Write-ToReport:
Updated Write-ToReport to accept a file path parameter, ensuring proper scoping and avoiding reliance on a global $ReportFile variable.S

[0.0.16] - 2025-04-16¶

Fixed¶

CRD JSON Parsing Error: Fixed an issue when fetching Custom Resource Definitions (CRDs) where ConvertFrom-Json failed due to key casing conflicts (proxyUrl vs proxyURL). CRDs are now parsed using -AsHashtable to avoid this conflict and allow consistent key access.
AKS Parameter Logic: Fixed incorrect AKS metadata fetch behavior. Previously, AKS metadata was fetched even if the -AKS switch was not passed. Now the call only runs when -AKS is explicitly set.

[0.0.15] - 2025-04-14¶

Added¶

Docker Container Support for KubeBuddy:
Created a multi-stage Dockerfile to build the KubeBuddy container image:
- Build stage: Uses mcr.microsoft.com/powershell:7.5-Ubuntu-22.04 for reliable setup of kubectl, powershell-yaml, Azure CLI, and the KubeBuddy module.
- Runtime stage: Uses mcr.microsoft.com/powershell:7.5-Ubuntu-22.04 to avoid dependency issues and ensure compatibility with the Azure CLI and kubeconfig setups.
Added adduser and coreutils to the build stage for file operations and permissions setup.
Added support for passing Azure SPN details and kubeconfig via environment variables and volume mounts, allowing for a smoother integration with AKS and other Kubernetes clusters.
Support for an optional thresholds YAML file: The file can be mounted at /home/kubeuser/.kube/kubebuddy-config.yaml (equivalent to $HOME/.kube/kubebuddy-config.yaml for the container user). This file allows customizing thresholds for alerts (e.g., CPU usage, pod age).
Created the /app/Reports directory during the build process (rather than copying from the host) to ensure a clean, fresh output directory for reports.
Copied KubeBuddy module files (KubeBuddy.psm1, KubeBuddy.psd1, Private, and Public) from the Git repository to /usr/local/share/powershell/Modules/KubeBuddy/, preserving module structure.
Ensured reports are accessible by mounting /app/Reports to a local volume for clean report generation.
AKS-Specific Checks:
Added a check to see if Vertical Pod Autoscaler (VPA) is enabled, as it is now part of Azure Advisor recommendations.
Kubernetes checks
Introduced new RBAC checks:
- Check-RBACMisconfigurations: Detects missing roleRef in RoleBindings and ClusterRoleBindings.
- Check-RBACOverexposure: Flags ServiceAccounts with excessive permissions like cluster-admin or wildcard access, and identifies roles with dangerous verbs (e.g., create, update, delete).
- Check-OrphanedRoles: Flags RoleBindings/ClusterRoleBindings with no subjects and Roles/ClusterRoles with no rules.
Added Severity and Recommendation columns to RBAC check outputs to provide actionable insights and prioritize findings.

Fixed¶

AKS Results: Fixed URL to be a clickable link in the AKS results.
ServiceAccount Detection: Corrected handling of the namespace field in RoleBinding and ClusterRoleBinding subjects within Check-RBACMisconfigurations.
Azure CLI Compatibility: Fixed Azure CLI installation by switching to Ubuntu 22.04, ensuring compatibility with the Azure CLI and its dependencies.
Validation Logic in run.ps1:
Corrected AKS mode validation to ensure $ClusterName, $ResourceGroup, and $SubscriptionId are only required when AKS mode is enabled.
Fixed validation check logic by adding parentheses to group conditions properly.
Updated $Aks to default to $false unless AKS_MODE is explicitly set to "true".

[0.0.14] - 2025-04-10¶

Added¶

Added cluster health checks and scoring:
Pod health evaluation based on Running and Ready conditions.
Node health assessment using Ready condition status.
Resource utilization scoring from kubectl top nodes data.
Comprehensive health report with total score and detected issues.
Added event analysis for cluster health:
Analyzes Kubernetes events to identify critical errors and warnings.
Reports significant issues (e.g., pod failures, scheduling issues) in the health summary.
Improved cluster validation:
Introduced robust validation for kubectl availability and connectivity to the current Kubernetes context.
Added AKS connectivity checks using az aks show, ensuring the cluster exists and the user is authenticated.
Enhanced error handling:
Clearer user feedback on failed or unauthorized cluster access with user-friendly Write-Host messages instead of raw exceptions.
Fail-fast logic now halts script execution gracefully if core checks fail.
New Get-KubeData logic:
Now verifies communication with the Kubernetes API server before fetching resources.
Graceful fallback if kubectl is present but cluster access is misconfigured.
Added support for silent script termination without full exception stack traces using Write-Host and return.

Changed¶

Replaced all direct throw calls in nested modules with friendly error messages and early exit patterns to improve UX.
Reorganized cluster validation into a single pre-check block within Get-KubeData for clarity and maintainability.

Fixed¶

Fixed inconsistent behavior where failed parallel resource fetches did not always halt script execution as expected.
Corrected exit behavior from AKS metadata fetch section to avoid crashing on partial failure.
Fixed Check-IngressHealth function to reliably detect and report ingress issues:
Corrected ingress fetching logic to work consistently with or without pre-fetched KubeData.
Added checks for missing ingress class, TLS secret validation, duplicate host/path detection, and invalid path types, beyond just service existence.

[0.0.13] - 2025-04-08¶

Added¶

11 new checks added to the JSON and HTML reports:
Resource configuration:
- Check-ResourceQuotas
- Check-NamespaceLimitRanges
- Check-MissingResourceLimits
- Check-HPAStatus
- Check-PodDisruptionBudgets
- Check-MissingHealthProbes
Workload health:
- Check-DeploymentIssues
- Check-StatefulSetIssues
Networking
- Check-IngressHealth
RBAC and identity:
- Check-OrphanedRoles
- Check-OrphanedServiceAccounts
HTML report now includes collapsible recommendations for all checks
Ingress health check detects references to missing backend services
New logic in the HTML report to add pagination when needed

Changed¶

Check-OrphanedRoles filtering updated to properly exclude namespaces during binding resolution
JSON report mode now uses $KubeData cache to speed up execution by avoiding duplicate kubectl calls
HTML report section order and navigation updated to include new categories and findings

Fixed¶

Fixed logic for HTML checks showing no findings — now prints the ✅ message consistently
Corrected orphaned role detection to handle exclusion before usage analysis

[0.0.12] - 2025-03-30¶

Added¶

Major performance improvement: report generation is now significantly faster due to parallelised kubectl resource fetching in Get-KubeData. This applies to HTML, text, and new JSON reports only, not interactive checks.
Added support for -Json output across key functions and checks, enabling structured machine-readable exports.
New -Yes parameter added to bypass interactive prompts in non-interactive or CI contexts.
Improved HTML report with optional hiding of ✅ sections when no issues are found.

Fixed¶

Fixed incorrect exclusion of stuck jobs due to filtering logic.

Changed¶

Error output during resource fetch and report generation is now cleaner and more informative.

[0.0.11] - 2025-03-28¶

Fixed¶

Table output now displays correctly when pagination is enabled.

[0.0.10] - 2025-03-26¶

Added¶

Added Check-PodsRunningAsRoot to identify pods that run with UID 0 or no runAsUser set.
Added Check-PrivilegedContainers to detect containers running with privileged: true.
Added Check-HostPidAndNetwork to find pods using hostPID or hostNetwork.
Added Check-RBACOverexposure to flag direct or indirect access to cluster-admin privileges, including wildcard permissions via custom roles.
Added -ExcludeNamespaces switch to most checks and report generators.
Automatically uses custom list from kubebuddy-config.yaml if present.
Falls back to default list of common system namespaces.
Integrated all the above checks into:
The RBAC & Security interactive menu
The HTML report with collapsible sections
The floating sidebar navigation (TOC)
Added contextual tooltips to HTML report headers for better inline explanation of metrics and checks.

Fixed¶

Quitting from sub menus does not kill the PowerShell session now.

Changed¶

Updated Show-RBACMenu to include the new security checks as menu options.
Updated HTML report to include additional security findings in the Security section.

[0.0.9] - 2025-03-20¶

Added¶

Added support for specifying custom report filenames with -OutputPath, allowing users to save reports with specific names instead of the default timestamped filename.
Reports now automatically include timestamps (YYYYMMDD-HHMMSS) when saved in a directory, preventing accidental overwrites.
The documentation has been updated to reflect these changes.

Fixed¶

Improved cross-platform path handling for PowerShell scripts, ensuring compatibility with both Windows and Linux file structures.
Ensured that directories are created correctly when specifying an output path.

[0.0.8] - 2025-03-20¶

Fixed¶

Fixed an issue with where we were importing modules twice.

[0.0.7] - 2025-03-20¶

Fixed¶

Fixed an issue with folder case to allow linux to import the correct modules.

[0.0.6] - 2025-03-19¶

Fixed¶

Fixed issue where $moduleVersion was not being correctly updated in the kubebuddy.ps1 script when setting the version dynamically.
Corrected the PowerShell script logic to handle version updates reliably using $tagVersion.
Resolved an error where the replace operation in the script failed due to incorrect concatenation of the $tagVersion variable.

[0.0.5] - 2025-03-19¶

Added¶

AKS best practices check with -aks, -SubscriptionId, -ResourceGroup, and -ClusterName, performing 34 different configuration and security checks tailored for Azure Kubernetes Service.

[0.0.4] - 2025-03-12¶

Added¶

Added new logo to html report.

[0.0.3] - 2025-03-06¶

Added¶

Initial release of KubeBuddy, providing snapshot-based monitoring, resource usage insights, and health checks for Kubernetes clusters.