Changelog¶
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.0.23] – 2025-06-18¶
Added¶
- Automatic dark mode: The HTML report now respects your browser’s prefers-color-scheme setting and will automatically switch to a dark theme when your system is in dark mode.
- Expanded Storage Checks: Introduced a comprehensive set of new checks to enhance Kubernetes storage monitoring and optimization:
  - PV001: Orphaned Persistent Volumes: Detects Persistent Volumes not bound to any Persistent Volume Claim, helping to reclaim unused storage.
  - PVC002: PVCs Using Default StorageClass: Flags PVCs that implicitly rely on a default storageClassName, encouraging explicit configuration for better clarity and portability.
  - PVC003: ReadWriteMany PVCs on Incompatible Storage: Warns about PVCs requesting ReadWriteMany access mode when the underlying storage is typically block-based and doesn't support concurrent writes from multiple nodes, preventing potential data corruption.
  - PVC004: Unbound Persistent Volume Claims: Flags PVCs stuck in a Pending phase, often indicating issues with the StorageClass, available PVs, or the storage provisioner.
  - SC001: Deprecated StorageClass Provisioners: Identifies StorageClasses using deprecated or legacy in-tree provisioners, recommending migration to CSI drivers for future compatibility.
  - SC002: StorageClass Prevents Volume Expansion: Detects StorageClasses that do not allow volume expansion, which can limit dynamic scaling of stateful applications.
  - SC003: High Cluster Storage Usage: Monitors the overall percentage of used storage across the cluster, alerting when usage exceeds predefined thresholds (80%). Uses Prometheus.
- Expanded Networking Checks: Added several new checks to identify common misconfigurations and security risks in Kubernetes networking:
  - NET005: Ingress Host/Path Conflicts: Detects Ingress resources with overlapping host and path combinations, which can lead to unpredictable traffic routing.
  - NET006: Ingress Using Wildcard Hosts: Flags Ingress resources using wildcard hostnames (*.example.com), which may provide broader access than intended and should be reviewed.
  - NET007: Service TargetPort Mismatch: Identifies Services where the targetPort does not match any containerPort in the backing pods, preventing effective traffic delivery.
  - NET008: ExternalName Service to Internal IP: Highlights ExternalName type Services configured to point to private IP ranges, potentially indicating an unusual or misconfigured internal routing pattern.
  - NET009: Overly Permissive Network Policy: Warns about NetworkPolicies that define policyTypes but lack specific rules (allowing all traffic for that type) or include overly broad ipBlock definitions like 0.0.0.0/0.
  - NET010: Network Policy Overly Permissive IPBlock: Specifically identifies NetworkPolicies that utilize 0.0.0.0/0 in their ipBlock rules, granting unrestricted access which poses a significant security risk.
  - NET011: Network Policy Missing PolicyTypes: Flags NetworkPolicies that do not explicitly define policyTypes, improving clarity and ensuring consistent behavior across different Kubernetes versions and CNI plugins.
  - NET012: Pod HostNetwork Usage: Identifies pods configured with hostNetwork: true, which allows direct access to the node's network interfaces, bypassing Kubernetes network isolation and potentially increasing security risk.
- Pod Density per Node check (NODE003):
  - Calculates pod density as (running pods ÷ max‑pods capacity) × 100.
  - Alerts when percentage crosses warning (80% default) or critical (90% default) thresholds.
- Workload Label Consistency Check (WRK009):
  - Ensures that Deployment selectors match the labels on their Pod templates and that Services targeting those Deployments use consistent label selectors.
  - Helps catch silent routing issues or monitoring mismatches caused by label typos or misalignment.
  - Applies to Deployments and their associated Pods and Services.
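
The NODE003 density formula and its default thresholds can be illustrated with a short Python sketch. This is not KubeBuddy's code (the checks themselves are YAML-defined and run in PowerShell); the function names are hypothetical, and treating a value equal to a threshold as having crossed it is an assumption.

```python
def pod_density(running_pods: int, max_pods: int) -> float:
    """Return running pods as a percentage of the node's max-pods capacity."""
    if max_pods <= 0:
        raise ValueError("max_pods must be positive")
    return running_pods * 100 / max_pods


def classify_density(density: float, warning: float = 80.0, critical: float = 90.0) -> str:
    """Map a density percentage to a status using the changelog's default thresholds.

    Assumption: reaching a threshold counts as crossing it (>=).
    """
    if density >= critical:
        return "critical"
    if density >= warning:
        return "warning"
    return "ok"
```

For a node running 88 pods with a max-pods capacity of 110, `pod_density(88, 110)` gives 80.0, which this sketch classifies as a warning.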
[0.0.22] – 2025-06-04¶
Added¶
- AI Recommendations with PSAI Integration: KubeBuddy now supports AI-powered recommendations, leveraging OpenAI's ChatGPT via the excellent PSAI module by @dfinke:
  - When checks return findings, KubeBuddy automatically prompts GPT to generate:
    - A short plain-text summary of recommended actions
    - A detailed HTML block with actionable advice and documentation links
  - These recommendations are:
    - Embedded in the HTML report as collapsible "Recommended Actions" cards (with AI Enhanced labels)
    - Shown in the text report with a clear prefix: AI Generated Recommendation:
    - Included in the JSON output under the Recommendation object, with .text, .html, and .source fields (source = "AI")
  - Graceful fallback: if no OpenAIKey is set or the AI call fails, checks fall back to static/manual recommendations or omit the section entirely
[0.0.21] – 2025-05-29¶
Fixes¶
- Fixed an issue where the code would not actually run the checks.
[0.0.20] – 2025-05-29¶
What’s New¶
- Prometheus integration: We’ve wired KubeBuddy up to Prometheus so you can get real-time node and API-server metrics:
  - CPU & Memory Usage (PROM001 & PROM002): track average usage across all nodes over the last 24 hours.
  - Memory Saturation (PROM003): see how much of each node’s allocatable memory is actually in use.
  - API Server Latency (PROM004): alert you if request latency spikes beyond healthy thresholds.
  - CPU Overcommitment (PROM005): flag any nodes whose pods are asking for more CPU than they can deliver.
  - New per-node Prometheus view: click into any node’s card to see its individual metrics and time-series charts right in your report.
  - Plus new KubeData settings (URL, mode, credentials, headers, etc.) to configure your Prometheus connection securely.
- Top 5 Improvements: The Overview page now surfaces the five checks whose remediation yields the greatest cluster-health score gain, showing estimated points gain per issue.
- “Hero” Issue-Summary cards: Right at the top of your HTML report you’ll now see a row of big, color-coded cards showing how many checks failed at each severity level (Critical, Warning, Info).
  - Click a card and it smoothly expands inline to list every failing check in that category.
  - Built entirely with our new .hero-metrics, .metric-card, .expand-content and .scrollable-content CSS, plus a tiny toggleExpand() script for the show-and-hide behaviour.
Improvements¶
- HTML report polish
  - Hover over any check header to see a handy info-icon tooltip with the full description.
  - Long “Findings” and “Recommendations” sections are now tucked into collapsible panels to keep your report neat.
  - Each recommendation is wrapped in a stylish card with a banner and auto-linked “Docs:” reference.
  - All tables live inside a <div class="table-container"> and are built by hand to ensure proper HTML-escaping and XSS safety.
- Under-the-hood tweaks for Prometheus checks
  - All Prometheus parameters (Url, Mode, Username, etc.) are now predeclared so they work correctly inside PowerShell’s parallel runspaces.
  - Threshold lookups in parallel blocks now use $using:thresholds.
  - If you haven’t set a Prometheus URL or headers, KubeBuddy will quietly skip those checks (no noisy errors).
Fixes¶
- Null-value errors eliminated by:
  - Checking that each threshold key actually exists before casting.
  - Verifying your PrometheusHeaders hashtable isn’t null or empty before poking its keys or making HTTP calls.
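
The two guards described in these fixes follow a common defensive pattern, sketched here in Python rather than PowerShell. The function names and the threshold key are hypothetical, and falling back to a default value when a key is missing is an assumption: the changelog only says the key's existence is checked before casting.

```python
def get_threshold(thresholds, key, default):
    """Return thresholds[key] as a float, or `default` if the key is missing or unset.

    Assumption: a missing key falls back to a default instead of raising.
    """
    if not thresholds or key not in thresholds or thresholds[key] is None:
        return default
    return float(thresholds[key])


def has_usable_headers(headers) -> bool:
    """True only when headers is a non-empty mapping, so its keys are safe to read."""
    return bool(headers)
```

With `{"cpu_warning": "80"}`, `get_threshold(..., "cpu_warning", 75.0)` returns 80.0, while a missing table or key yields the default instead of a null-value error.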
[0.0.19] - 2025-05-02¶
Fixed¶
- Check Execution Bug Fixes:
  - Fixed an issue where certain security checks (e.g., SEC010) would not report results even when violations were present.
[0.0.18] - 2025-05-02¶
Added¶
- New AKS Best Practice Checks:
  - Added AKSBP013: "No B-Series VMs in Node Pools" to ensure node pools do not use burstable B-series VMs, which can lead to inconsistent performance in production workloads (Severity: High).
  - Added AKSBP014: "Use v5 or Newer SKU VMs for Node Pools" to enforce the use of v5 or newer VM SKUs for better performance and reliability during updates (Severity: Medium).
  - Total checks now at 92 across all categories.
Changed¶
- Updated Recommendations for All Checks:
  - Added links to relevant documentation in the recommendations for all checks across all categories (Best Practices, Disaster Recovery, Identity & Access, Monitoring & Logging, Networking, Resource Management, and Security), providing actionable guidance for each check.
- Replaced Cluster Health Score Donut with Passed/Failed Chip:
  - Removed the circular progress bar (donut) for the Cluster Health Score in the dashboard.
  - Replaced it with a chip-style element for "Passed / Failed Checks" (e.g., "45 / 92 Passed"), aligning with the existing chip design for consistency.
- Updated Chip Color Logic in Dashboard:
  - Adjusted the pass rate thresholds for the "Passed / Failed Checks" chip to better reflect cluster health:
    - Red (Critical): <48% pass rate (lowered from 50% to account for near-threshold states).
    - Yellow (Warning): 48%–79% pass rate.
    - Green (Healthy): ≥80% pass rate.
  - With the current pass rate of 48.91% (45/92), the chip now displays as yellow instead of red, aligning with the updated threshold.
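
As a rough illustration of the chip thresholds above, the following Python sketch maps a pass count to a colour. It is not the dashboard's actual code: the function name is hypothetical, and treating rates between 79% and 80% as yellow is an assumption, since the changelog does not say how that gap is handled.

```python
def chip_colour(passed: int, total: int) -> str:
    """Return the chip colour for a pass rate, using the 0.0.18 thresholds."""
    if total <= 0:
        raise ValueError("total must be positive")
    rate = passed * 100 / total
    if rate < 48:
        return "red"      # Critical
    if rate < 80:         # assumption: anything below 80% that is not red is yellow
        return "yellow"   # Warning
    return "green"        # Healthy
```

With 45 of 92 checks passing (about 48.91%), `chip_colour(45, 92)` returns "yellow", matching the example in the entry above.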
Fixed¶
- NET003 Check:
  - Fixed an issue with the AKSNET003 ("Web App Routing Enabled") check to ensure it correctly evaluates the configuration and reports accurate results.
Notes¶
- HTML Report Update:
  - Improved the visual design of the HTML report for better readability and user experience, as part of ongoing enhancements to the reporting interface.
[0.0.17] - 2025-04-25¶
Added¶
- Migrated to YAML-based Checks:
  - Replaced pure PowerShell checks with YAML-defined checks for better maintainability and scalability.
  - Each check now has a unique ID for easier identification and referencing in reports (e.g., AKSNET001, NS001).
- New Alerts:
  - Added new YAML-based alerts to enhance cluster monitoring.
- Custom Checks HTML Tab: Automatically gathers any YAML‑defined checks whose section names aren’t in the standard list (Nodes, Namespaces, Workloads, etc.) into a new “Custom Checks” tab. Only shows the tab if there’s at least one real <tr>…</tr> snippet.
- Exclude Checks Support: You can now explicitly exclude checks by their ID using the ExcludedChecks parameter. Excluded checks are skipped during evaluation and omitted from reports.
- Multi-Architecture Docker Container:
  - Updated the Dockerfile to support both linux/amd64 and linux/arm64 architectures using Docker Buildx.
  - Dynamically downloads architecture-specific kubectl and kubelogin binaries based on the target platform ($TARGETARCH).
- Updated GitHub Action for Multi-Architecture Builds:
  - Modified the GitHub Action workflow to use Docker Buildx for building and pushing multi-architecture images (linux/amd64 and linux/arm64) to GHCR.
  - Added support for tagging and pushing a latest tag for multi-architecture images.
Changed¶
- Updated HTML Report:
  - Replaced single-page layout with a tab-based interface for better structure and usability.
  - Improved visuals, section separation, and print/export support.
- AKS Results in Text Report:
  - Updated Generate-K8sTextReport to properly capture and write AKS results to the text report, including detailed check results and the summary table ("Summary & Rating").
  - Ensured the AKS summary table is consistently included in the text report output.
- Improved Check Processing:
  - Refactored Invoke-AKSBestPractices to return structured data for text reports, removing direct Write-ToReport calls and allowing the caller (Generate-K8sTextReport) to handle file writing.
Fixed¶
- Text Report AKS Summary Table:
  - Fixed an issue where the AKS summary table was not appearing in the text report by ensuring the TextOutput property is correctly written to the file.
- File Path Scoping in Write-ToReport:
  - Updated Write-ToReport to accept a file path parameter, ensuring proper scoping and avoiding reliance on a global $ReportFile variable.
[0.0.16] - 2025-04-16¶
Fixed¶
- CRD JSON Parsing Error: Fixed an issue when fetching Custom Resource Definitions (CRDs) where ConvertFrom-Json failed due to key casing conflicts (proxyUrl vs proxyURL). CRDs are now parsed using -AsHashtable to avoid this conflict and allow consistent key access.
- AKS Parameter Logic: Fixed incorrect AKS metadata fetch behavior. Previously, AKS metadata was fetched even if the -AKS switch was not passed. Now the call only runs when -AKS is explicitly set.
[0.0.15] - 2025-04-14¶
Added¶
- Docker Container Support for KubeBuddy:
  - Created a multi-stage Dockerfile to build the KubeBuddy container image:
    - Build stage: Uses mcr.microsoft.com/powershell:7.5-Ubuntu-22.04 for reliable setup of kubectl, powershell-yaml, Azure CLI, and the KubeBuddy module.
    - Runtime stage: Uses mcr.microsoft.com/powershell:7.5-Ubuntu-22.04 to avoid dependency issues and ensure compatibility with the Azure CLI and kubeconfig setups.
  - Added adduser and coreutils to the build stage for file operations and permissions setup.
  - Added support for passing Azure SPN details and kubeconfig via environment variables and volume mounts, allowing for a smoother integration with AKS and other Kubernetes clusters.
  - Support for an optional thresholds YAML file: The file can be mounted at /home/kubeuser/.kube/kubebuddy-config.yaml (equivalent to $HOME/.kube/kubebuddy-config.yaml for the container user). This file allows customizing thresholds for alerts (e.g., CPU usage, pod age).
  - Created the /app/Reports directory during the build process (rather than copying from the host) to ensure a clean, fresh output directory for reports.
  - Copied KubeBuddy module files (KubeBuddy.psm1, KubeBuddy.psd1, Private, and Public) from the Git repository to /usr/local/share/powershell/Modules/KubeBuddy/, preserving module structure.
  - Ensured reports are accessible by mounting /app/Reports to a local volume for clean report generation.
to a local volume for clean report generation. - AKS-Specific Checks:
- Added a check to see if Vertical Pod Autoscaler (VPA) is enabled, as it is now part of Azure Advisor recommendations.
- Kubernetes checks
- Introduced new RBAC checks:
- Check-RBACMisconfigurations: Detects missing
roleRef
inRoleBindings
andClusterRoleBindings
. - Check-RBACOverexposure: Flags ServiceAccounts with excessive permissions like
cluster-admin
or wildcard access, and identifies roles with dangerous verbs (e.g.,create
,update
,delete
). - Check-OrphanedRoles: Flags
RoleBindings
/ClusterRoleBindings
with no subjects andRoles
/ClusterRoles
with no rules.
- Check-RBACMisconfigurations: Detects missing
- Added Severity and Recommendation columns to RBAC check outputs to provide actionable insights and prioritize findings.
Fixed¶
- AKS Results: The URL in the AKS results is now a clickable link.
- ServiceAccount Detection: Corrected handling of the namespace field in RoleBinding and ClusterRoleBinding subjects within Check-RBACMisconfigurations.
- Azure CLI Compatibility: Fixed Azure CLI installation by switching to Ubuntu 22.04, ensuring compatibility with the Azure CLI and its dependencies.
- Validation Logic in run.ps1:
  - Corrected AKS mode validation to ensure $ClusterName, $ResourceGroup, and $SubscriptionId are only required when AKS mode is enabled.
  - Fixed validation check logic by adding parentheses to group conditions properly.
  - Updated $Aks to default to $false unless AKS_MODE is explicitly set to "true".
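
The validation rules described for run.ps1 can be sketched in Python as follows. This is an illustration of the logic only, not a port of the script: the function names and the environment variable names other than AKS_MODE are hypothetical, and the exact-match comparison against "true" is an assumption.

```python
def resolve_aks_mode(env: dict) -> bool:
    """AKS mode stays off unless AKS_MODE is set to the string "true".

    Assumption: the comparison is an exact, case-sensitive match.
    """
    return env.get("AKS_MODE") == "true"


def missing_aks_settings(env: dict) -> list:
    """Return the AKS settings that are required but absent or empty."""
    if not resolve_aks_mode(env):
        return []  # nothing AKS-specific is required outside AKS mode
    # Hypothetical variable names standing in for $ClusterName, $ResourceGroup, $SubscriptionId.
    required = ("CLUSTER_NAME", "RESOURCE_GROUP", "SUBSCRIPTION_ID")
    return [name for name in required if not env.get(name)]
```

Keeping the mode test in its own function mirrors the grouping fix: the three settings are only inspected once AKS mode has been established.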
[0.0.14] - 2025-04-10¶
Added¶
- Added cluster health checks and scoring:
  - Pod health evaluation based on Running and Ready conditions.
  - Node health assessment using Ready condition status.
  - Resource utilization scoring from kubectl top nodes data.
  - Comprehensive health report with total score and detected issues.
- Added event analysis for cluster health:
  - Analyzes Kubernetes events to identify critical errors and warnings.
  - Reports significant issues (e.g., pod failures, scheduling issues) in the health summary.
- Improved cluster validation:
  - Introduced robust validation for kubectl availability and connectivity to the current Kubernetes context.
  - Added AKS connectivity checks using az aks show, ensuring the cluster exists and the user is authenticated.
- Enhanced error handling:
  - Clearer user feedback on failed or unauthorized cluster access with user-friendly Write-Host messages instead of raw exceptions.
  - Fail-fast logic now halts script execution gracefully if core checks fail.
- New Get-KubeData logic:
  - Now verifies communication with the Kubernetes API server before fetching resources.
  - Graceful fallback if kubectl is present but cluster access is misconfigured.
- Added support for silent script termination without full exception stack traces using Write-Host and return.
Changed¶
- Replaced all direct throw calls in nested modules with friendly error messages and early exit patterns to improve UX.
- Reorganized cluster validation into a single pre-check block within Get-KubeData for clarity and maintainability.
Fixed¶
- Fixed inconsistent behavior where failed parallel resource fetches did not always halt script execution as expected.
- Corrected exit behavior from AKS metadata fetch section to avoid crashing on partial failure.
- Fixed Check-IngressHealth function to reliably detect and report ingress issues:
  - Corrected ingress fetching logic to work consistently with or without pre-fetched KubeData.
  - Added checks for missing ingress class, TLS secret validation, duplicate host/path detection, and invalid path types, beyond just service existence.
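
Duplicate host/path detection, one of the ingress checks listed above, amounts to grouping rules by their (host, path) pair. The Python sketch below shows one way to do that; the function name and the tuple-based input shape are hypothetical and do not reflect how Check-IngressHealth is implemented.

```python
from collections import defaultdict


def find_duplicate_routes(rules):
    """Group ingress rules by (host, path) and return the pairs claimed more than once.

    `rules` is an iterable of (ingress_name, host, path) tuples (hypothetical shape).
    """
    owners = defaultdict(list)
    for ingress_name, host, path in rules:
        owners[(host, path)].append(ingress_name)
    return {route: names for route, names in owners.items() if len(names) > 1}
```

Two ingresses that both declare the host x.example.com with path / would be reported together under that pair, while a third rule for /api on the same host would not.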
[0.0.13] - 2025-04-08¶
Added¶
- 11 new checks added to the JSON and HTML reports:
  - Resource configuration:
    - Check-ResourceQuotas
    - Check-NamespaceLimitRanges
    - Check-MissingResourceLimits
    - Check-HPAStatus
    - Check-PodDisruptionBudgets
    - Check-MissingHealthProbes
  - Workload health:
    - Check-DeploymentIssues
    - Check-StatefulSetIssues
  - Networking:
    - Check-IngressHealth
  - RBAC and identity:
    - Check-OrphanedRoles
    - Check-OrphanedServiceAccounts
- HTML report now includes collapsible recommendations for all checks
- Ingress health check detects references to missing backend services
- New logic in the HTML report to add pagination when needed
Changed¶
- Check-OrphanedRoles filtering updated to properly exclude namespaces during binding resolution
- JSON report mode now uses $KubeData cache to speed up execution by avoiding duplicate kubectl calls
- HTML report section order and navigation updated to include new categories and findings
Fixed¶
- Fixed logic for HTML checks showing no findings — now prints the ✅ message consistently
- Corrected orphaned role detection to handle exclusion before usage analysis
[0.0.12] - 2025-03-30¶
Added¶
- Major performance improvement: report generation is now significantly faster due to parallelised kubectl resource fetching in Get-KubeData. This applies to HTML, text, and new JSON reports only, not interactive checks.
- Added support for -Json output across key functions and checks, enabling structured machine-readable exports.
- New -Yes parameter added to bypass interactive prompts in non-interactive or CI contexts.
- Improved HTML report with optional hiding of ✅ sections when no issues are found.
Fixed¶
- Fixed incorrect exclusion of stuck jobs due to filtering logic.
Changed¶
- Error output during resource fetch and report generation is now cleaner and more informative.
[0.0.11] - 2025-03-28¶
Fixed¶
- Table output now displays correctly when pagination is enabled.
[0.0.10] - 2025-03-26¶
Added¶
- Added Check-PodsRunningAsRoot to identify pods that run with UID 0 or no runAsUser set.
- Added Check-PrivilegedContainers to detect containers running with privileged: true.
- Added Check-HostPidAndNetwork to find pods using hostPID or hostNetwork.
- Added Check-RBACOverexposure to flag direct or indirect access to cluster-admin privileges, including wildcard permissions via custom roles.
- Added -ExcludeNamespaces switch to most checks and report generators.
  - Automatically uses custom list from kubebuddy-config.yaml if present.
  - Falls back to default list of common system namespaces.
- Integrated all the above checks into:
  - The RBAC & Security interactive menu
  - The HTML report with collapsible sections
  - The floating sidebar navigation (TOC)
- Added contextual tooltips to HTML report headers for better inline explanation of metrics and checks.
Fixed¶
- Quitting from submenus no longer kills the PowerShell session.
Changed¶
- Updated Show-RBACMenu to include the new security checks as menu options.
- Updated HTML report to include additional security findings in the Security section.
[0.0.9] - 2025-03-20¶
Added¶
- Added support for specifying custom report filenames with -OutputPath, allowing users to save reports with specific names instead of the default timestamped filename.
- Reports now automatically include timestamps (YYYYMMDD-HHMMSS) when saved in a directory, preventing accidental overwrites.
- The documentation has been updated to reflect these changes.
Fixed¶
- Improved cross-platform path handling for PowerShell scripts, ensuring compatibility with both Windows and Linux file structures.
- Ensured that directories are created correctly when specifying an output path.
[0.0.8] - 2025-03-20¶
Fixed¶
- Fixed an issue where modules were being imported twice.
[0.0.7] - 2025-03-20¶
Fixed¶
- Fixed a folder-name casing issue so that Linux can import the correct modules.
[0.0.6] - 2025-03-19¶
Fixed¶
- Fixed issue where $moduleVersion was not being correctly updated in the kubebuddy.ps1 script when setting the version dynamically.
- Corrected the PowerShell script logic to handle version updates reliably using $tagVersion.
- Resolved an error where the replace operation in the script failed due to incorrect concatenation of the $tagVersion variable.
[0.0.5] - 2025-03-19¶
Added¶
- AKS best practices check with -aks, -SubscriptionId, -ResourceGroup, and -ClusterName, performing 34 different configuration and security checks tailored for Azure Kubernetes Service.
[0.0.4] - 2025-03-12¶
Added¶
- Added a new logo to the HTML report.
[0.0.3] - 2025-03-06¶
Added¶
- Initial release of KubeBuddy, providing snapshot-based monitoring, resource usage insights, and health checks for Kubernetes clusters.