WinterFlow.io - Netdata v2.8.0 release notes

Summary
Highlights
Acknowledgments
Contributions
Deprecation notice
Support options

Release Summary

Netdata v2.8.0 introduces powerful monitoring capabilities and enhanced system visibility alongside continued improvements to stability and reliability.

On the security front, we are proud to announce that Netdata has achieved SOC 2 Type 2 Compliance.

Highlight	Summary
Expanded Chart Analysis	A new UI experience to compare timeframes, correlate metrics, and drill down into dimension values directly from any chart.
Netdata AI Improvements	Various enhancements to Netdata AI including the ability to visualize logs within Insights/Investigation views.
Expanded Scheduled Reports	Schedule and export full Insights and Investigations views directly to your email, moving beyond simple Dashboards.
ServiceNow Integration	Enterprise-grade incident management with direct alerts to ServiceNow.
Generic SQL Collector (Alpha)	Monitor any SQL database with custom queries. Define your own metrics for MySQL, PostgreSQL, Oracle, and MS SQL Server.
PSS-Based Memory Estimation	More accurate memory tracking for processes using shared memory through intelligent PSS sampling with adaptive prioritization.
Stability & Reliability Improvements	Multiple fixes and enhancements to improve system robustness and eliminate potential issues in production environments.

Release Highlights

News: SOC 2 Type 2 Compliance

Security and trust are paramount at Netdata. We are excited to announce that we have successfully achieved SOC 2 Type 2 compliance.

This certification validates that our internal controls, policies, and procedures regarding security, availability, and confidentiality meet the rigorous standards set by the AICPA. It reflects our ongoing commitment to keeping your data secure.

Expanded Chart Analysis

We have reimagined how you interact with charts to troubleshoot faster with the new Expanded Chart Analysis feature. Below each chart, you will now see an “Expand - Chart Analysis” option.

Clicking this opens a dedicated analysis view offering four powerful tools to dissect your data:

Compare: Instantly compare the current data against different time periods or baselines (e.g., 24 hours prior, 7 days prior, or custom timeframes) to spot anomalies immediately.
Correlate: Leverage Netdata AI to automatically find other metrics in your system that correlate with the behavior of the chart you are viewing.
Drill Down: Explore related metrics and child contexts using Weights Analysis to understand which specific dimensions are driving the spike.
Chart Values: View raw dimension values and statistical distributions to understand the exact data composition.

Netdata AI Improvements

This release brings various fixes and improvements to Netdata AI, including the ability to visualize logs directly within Insights reports and Investigations.

You can now correlate metrics and logs seamlessly. When viewing a spike in a metric, Netdata AI can analyze and display relevant log lines from that exact timeframe, helping you pinpoint the root cause faster by connecting symptoms (metrics) with evidence (logs).

Netdata AI Scheduled Reports

We have expanded the scope of what you can share with your team. Scheduled Reports are no longer limited to standard Dashboards.

You can now schedule and export full Insights and Investigations views. This allows you to deliver comprehensive, context-rich root cause analyses directly to stakeholders’ inboxes, ensuring everyone has the full picture without needing to log in and navigate the UI.

ServiceNow Integration

For enterprise environments, we have added a native ServiceNow integration. You can now route Netdata alerts directly to ServiceNow ITSM, automating incident creation and streamlining your response workflows without custom webhooks or middleware.

Generic SQL Database Monitoring

We’re introducing a flexible, configuration-driven SQL collector that lets you monitor any SQL database with custom queries and metrics. This release supports MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.

[!IMPORTANT] This is an alpha feature. Expect rapid improvements and potential configuration format changes in upcoming releases.

What You Get

Feature	Details
Fully Customizable Metrics	Define your own SQL queries and transform results into Netdata charts
Multi-Database Support	Works with MySQL/MariaDB, PostgreSQL, Oracle Database, and MS SQL Server using native drivers
Flexible Data Mapping	Two result processing modes: Columns mode (map specific columns to dimensions) and KV mode (handle dynamic key-value style results)
Per-Row Labels	Create separate chart instances for each row (e.g., per-database, per-table metrics)
Status Metrics	Convert state values into binary 0/1 dimensions with conditional rules
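
The two result processing modes and the status conversion described in the table can be illustrated with a small, self-contained sketch. It is not the collector's implementation or its configuration format: it uses Python's built-in `sqlite3` module as a stand-in database, and the function names (`columns_mode`, `kv_mode`, `status_dims`) and the sample tables are hypothetical, chosen only to show how query rows could turn into dimensions.

```python
import sqlite3

def columns_mode(cursor, label_col, dim_cols):
    """Columns mode: fixed columns become dimensions, one chart instance per row label."""
    names = [c[0] for c in cursor.description]
    charts = {}
    for row in cursor.fetchall():
        record = dict(zip(names, row))
        charts[record[label_col]] = {d: record[d] for d in dim_cols}
    return charts

def kv_mode(cursor, key_col, value_col):
    """KV mode: each row is one key/value pair, and the keys become dimensions."""
    names = [c[0] for c in cursor.description]
    k, v = names.index(key_col), names.index(value_col)
    return {row[k]: row[v] for row in cursor.fetchall()}

def status_dims(value, states):
    """Status metrics: one binary 0/1 dimension per known state."""
    return {s: int(value == s) for s in states}

# Hypothetical sample data standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_stats (name TEXT, connections INTEGER, state TEXT);
    INSERT INTO db_stats VALUES ('orders', 12, 'online'), ('billing', 3, 'offline');
    CREATE TABLE server_status (variable TEXT, value INTEGER);
    INSERT INTO server_status VALUES ('threads_running', 4), ('slow_queries', 1);
""")

per_db = columns_mode(
    conn.execute("SELECT name, connections FROM db_stats"), "name", ["connections"]
)
kv = kv_mode(
    conn.execute("SELECT variable, value FROM server_status"), "variable", "value"
)
state = status_dims("offline", ["online", "offline"])

print(per_db)  # one chart instance per database name
print(kv)      # dimensions discovered from the rows themselves
print(state)   # exactly one state is 1, the rest are 0
```

Columns mode suits queries whose shape is known in advance, while KV mode suits status-style queries where the set of keys is only known at run time.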

Perfect For

Custom business KPIs and application-specific metrics
Database engines not yet covered by dedicated Netdata collectors
Monitoring internal tables, views, or stored procedure results
Prototyping metrics before requesting them as built-in collectors

PSS-Based Memory Estimation in apps.plugin

The apps.plugin now provides more accurate memory usage tracking for processes that heavily use shared memory (databases, cache servers, container runtimes) through intelligent PSS (Proportional Set Size) sampling.

Requirements: Linux kernel 4.14 or later. Enabled by default with a 5-minute sampling interval. No configuration is needed for typical deployments.

Traditional RSS (Resident Set Size) measurements overstate memory usage for processes sharing memory pages, because every process is charged the full size of each shared page. For example, if 10 processes share a 100MB library, summing their RSS reports 1GB of usage even though the library occupies only 100MB of physical memory.
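
The arithmetic behind this example can be written out as a short sketch. The numbers come from the example above; private memory is set to zero purely to isolate the effect of sharing, which is a simplifying assumption.

```python
SHARED_MB = 100   # one shared library mapped by every process
PRIVATE_MB = 0    # assumption: ignore private memory to isolate the effect
N = 10            # number of processes sharing the library

# RSS charges each process the full shared mapping.
rss_per_process = PRIVATE_MB + SHARED_MB
# PSS charges each process its proportional share of the mapping.
pss_per_process = PRIVATE_MB + SHARED_MB / N

total_rss = N * rss_per_process
total_pss = N * pss_per_process

print(total_rss)  # 1000 (MB): the shared library is counted ten times
print(total_pss)  # 100.0 (MB): matches the physical memory actually used
```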

What’s New

Feature	Details
PSS-Based Estimation	Uses Proportional Set Size to divide shared memory fairly among processes, showing true memory footprint in the new `app.estimated_mem_usage` chart
Adaptive Sampling Strategy	Intelligently prioritizes large memory consumers and processes with significant changes, refreshing them within seconds while ensuring all processes are eventually sampled
Performance Optimized	Ratio-based estimation applies cached PSS/RSS ratios to real-time RSS readings, avoiding expensive kernel operations every second
Dual Chart Display	Shows both estimated memory (PSS-scaled) and traditional RSS charts for comparison under System → Processes → Apps → Memory
Configurable Intervals	Control sampling frequency with `--pss 5m` (default) or disable entirely with `--pss off` for systems without shared memory workloads
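
The ratio-based estimation in the table can be sketched as follows. This is an illustration of the idea rather than the plugin's actual code: the helper names are hypothetical, and the sample text mimics the `Rss:` and `Pss:` lines of `/proc/<pid>/smaps_rollup` with made-up values.

```python
def parse_smaps_rollup(text):
    """Return (rss_kb, pss_kb) from the contents of /proc/<pid>/smaps_rollup."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only the two totals; newer kernels also expose Pss_Anon, Pss_File, etc.
        if sep and key in ("Rss", "Pss") and parts:
            fields[key] = int(parts[0])
    return fields["Rss"], fields["Pss"]

def pss_ratio(rss_kb, pss_kb):
    """Ratio cached between (expensive) PSS samples; fall back to 1.0 for empty RSS."""
    return pss_kb / rss_kb if rss_kb else 1.0

def estimate_kb(current_rss_kb, cached_ratio):
    """Cheap per-second estimate: scale the live RSS reading by the cached ratio."""
    return int(current_rss_kb * cached_ratio)

# Made-up sample resembling the kernel's output format.
SAMPLE = """\
55d0a0a00000-7ffd4a5f6000 ---p 00000000 00:00 0                          [rollup]
Rss:              204800 kB
Pss:               51200 kB
Pss_Anon:          20000 kB
Shared_Clean:     150000 kB
"""

rss_kb, pss_kb = parse_smaps_rollup(SAMPLE)
ratio = pss_ratio(rss_kb, pss_kb)
estimate = estimate_kb(220000, ratio)  # RSS has grown since the last PSS sample

print(ratio)     # 0.25
print(estimate)  # 55000
```

Reading `smaps_rollup` requires the kernel to walk the process's mappings, which is why a design like this samples it occasionally and reuses the ratio in between.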

How It Works

Periodic Sampling: Reads /proc/<pid>/smaps_rollup to calculate PSS/RSS ratios for each process
Adaptive Prioritization: Alternates between two strategies each iteration:
- Delta-based: Targets processes with largest shared memory changes (catches memory growth within seconds)
- Age-based: Ensures all processes are refreshed within 2× the configured interval (default: 10 minutes)
Real-time Estimation: Applies cached ratios to current RSS values every second for accurate, low-overhead tracking
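
The alternation between the two prioritization strategies can be sketched like this. It is a simplified model under stated assumptions, not the plugin's scheduler: the `Proc` fields, the `pick_next` function, and the per-iteration `budget` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    shared_kb: int          # current shared memory reading
    sampled_shared_kb: int  # shared memory at the time of the last PSS sample
    last_sample: float      # timestamp of the last PSS sample, in seconds

def pick_next(procs, iteration, budget):
    """Choose which processes to resample, alternating strategy each iteration."""
    if iteration % 2 == 0:
        # Delta-based: largest change in shared memory since the last sample first.
        ordered = sorted(
            procs, key=lambda p: abs(p.shared_kb - p.sampled_shared_kb), reverse=True
        )
    else:
        # Age-based: oldest sample first, so every process is eventually refreshed.
        ordered = sorted(procs, key=lambda p: p.last_sample)
    return ordered[:budget]

procs = [
    Proc(pid=1, shared_kb=1000, sampled_shared_kb=1000, last_sample=50.0),
    Proc(pid=2, shared_kb=9000, sampled_shared_kb=1000, last_sample=90.0),
    Proc(pid=3, shared_kb=2000, sampled_shared_kb=1500, last_sample=10.0),
]

delta_pick = [p.pid for p in pick_next(procs, iteration=0, budget=1)]
age_pick = [p.pid for p in pick_next(procs, iteration=1, budget=1)]

print(delta_pick)  # [2]: the process whose shared memory changed the most
print(age_pick)    # [3]: the process with the stalest sample
```

Alternating the two orderings keeps fast-growing processes accurate without starving quiet ones of fresh samples.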

[!TIP] Running databases, Redis, or containerized workloads? PSS estimation will give you a much clearer picture of actual memory consumption versus traditional RSS measurements.

Acknowledgments

@arkamar for adding InnoDB Redo Log monitoring charts to MySQL collector.
@hack3ric for fixing libbpf.a build path.
@clan for fixing build failure on musl libc.

Contributions

Collectors

Improvements

Added generic SQL collector supporting MySQL, PostgreSQL, Oracle, and MS SQL databases with custom query execution and dynamic chart generation (go.d/sql) (#21281, #21313, @ilyam8)
Changed service discovery configuration format to a simplified single-step services block that combines the legacy classify and compose mechanisms (go.d) (#21269, #21273, @ilyam8)
Added sensors monitoring using Windows Sensor API (windows.plugin) (#21266, #20988, @thiagoftsm)
Added MSSQL Replication Publisher monitoring (windows.plugin) (#21235, @thiagoftsm)
Added PSS-based memory estimation to apps.plugin for accurate memory usage tracking of processes using shared memory with ratio-based sampling (apps.plugin) (#21199, @ktsaou)
Added MSSQL Jobs chart to windows.plugin showing enabled and disabled job counts per SQL Server instance (windows.plugin) (#21182, @thiagoftsm)
Added InnoDB Redo Log monitoring charts to MySQL collector including activity, occupancy, and checkpoint age metrics (go.d/mysql) (#21153, @arkamar)
Added IBM Ecosystem Monitoring Support for Netdata (ibm.d) (#21066, #21123, #21124, #21128, #21132, #21141, #21147, #21158, #21164, #21204, @ktsaou, @ilyam8)
Added optional ICMP ping metrics to SNMP collector with configurable ping RTT and jitter charts (go.d/snmp) (#21052, #21054, #21064, @ilyam8)
Added CPU temperature monitoring (windows.plugin) (#20992, @thiagoftsm)

Other

Added autodetection_retry option to nvidia_smi collector (go.d/nvidia_smi) (#21311, @ilyam8)
Reorganized MSSQL collection by isolating perflib code from queries to improve code maintainability (#21290, #21255, #21256, @thiagoftsm)
Refactored dyncfg implementation by extracting common functionality into a dedicated package and introduced prefix-based routing for function registration (#21263, #21245, @ilyam8)
Fixed ping collector to preserve original error messages by using proper error wrapping (go.d/ping) (#21251, @ilyam8)
Refactored SNMP profile static_tags to use structured key/value format (go.d/snmp) (#21180, @ilyam8)
Improved SNMP collector to automatically fall back to simple walk when bulkwalk is not supported by the device (go.d/snmp) (#21139, @ilyam8)
Converted Go collectors to use ndexec module for external command invocation (go.d) (#21067, @Ferroin)
Cleaned up SNMP profile definition package (go.d) (#21062, @ilyam8)
Removed legacy custom OID collection from SNMP collector (go.d/snmp) (#21056, @ilyam8)

Packaging/Installation

All changes

Added openSUSE Tumbleweed to CI and package builds (#21276, #21296, @Ferroin)
Extended code signing in Windows CI to cover drivers (#21242, @Ferroin)
Added handling in kickstart script to remove existing repository configuration packages and clear cached metadata to prevent installation failures (#21226, @Ferroin)
Fixed handling of auto updater and data files during native package removal to ensure cleanup of unmanaged files (#21203, #21246, @Ferroin)
Made OpenTelemetry plugin a required dependency for native DEB and RPM packages (#21194, @Ferroin)
Updated user and group account handling to use systemd-sysusers when available and unified account creation code across static and local builds (#21162, @Ferroin)
Enabled Rust-based systemd journal handling code in Docker builds as an alternative to using libsystemd (#21161, @Ferroin)
Added failure reporting when native package updates are unsuccessful by comparing installed versions against published releases (#21144, @Ferroin)
Added Fedora 43 to CI and package builds (#21142, @Ferroin)
Changed IBM plugin library lookup to use a relative RUNPATH based on $ORIGIN instead of hardcoded absolute paths for better portability and reproducibility (#21131, @Ferroin)
Made native package dependencies consistent between DEB and RPM packages by fixing plugin recommendations and removing obsolete references (#21118, @Ferroin)
Fixed IBM libs by using correct component name and fixing file lists for DEB and RPM packages (#21117, #21120, @Ferroin)
Added a proper check for ODBC at configuration time for the IBM plugin (#21116, @Ferroin)
Added openSUSE Leap 16.0 and Ubuntu 25.10 to CI and package builds (#21100, @Ferroin)
Fixed libbpf.a build path by explicitly setting the installation directory to /usr/lib for consistency across 64-bit architectures (#21051, @hack3ric)
Consolidated compiler flag handling into CMake code for better control and maintainability, and added STATIC_BUILD option for managing static build flags (#20821, @Ferroin)

Documentation

All changes

Added ScyllaDB Prometheus integration documentation (#21308, @ilyam8)
Removed Prometheus SQL Exporter doc (#21306, @ilyam8)
Updated account deletion guide to include the new email confirmation step that prevents accidental account deletions (#21293, @kanelatechnical)
Updated installer README to include manual installation paths with platform-specific links (#21292, @kanelatechnical)
Added link to Trust Center in Security and Privacy documentation (#21271, @kanelatechnical)
Improved COLLECTORS.md generation (#21225, #21228, #21233, @ilyam8)
Updated Logs Centralization Points with systemd-journald docs (#21220, #21221, @kanelatechnical)
Removed outdated note from Monitor Anything doc (#21217, @kanelatechnical)
Removed separator lines from various docs (#21215, #21216, #21219, #21222, @kanelatechnical)
Added SNMP profile documentation (#21201, #21210, #21223, #21277, @ilyam8)
Updated Netdata API doc (#21193, @kanelatechnical)
Added host labels configuration section to Prometheus exporter documentation (#21187, @kanelatechnical)
Updated Notifications description in cloud-notifications metadata (#21159, @ilyam8)
Updated documentation to reflect achieved SOC 2 Type 2 certification status (#21157, #21265, @ktsaou)
Added ServiceNow Cloud notification integration documentation (#21154, @car12o)
Added Customizing Your Node Name section to Daemon Configuration Reference doc (#21151, @kanelatechnical)
Updated IBM plugin documentation (#21122, @Ferroin)
Added Windows install types and release channels documentation (#21119, @kanelatechnical)
Removed Challenge secret section from Webhook documentation in cloud-notifications (#21105, @car12o)
Added warning about silent installation not being supported on Windows Server versions earlier than 2019 due to TLS compatibility issues (#21096, @thiagoftsm)
Documented all Netdata agent REST APIs with v1 and v2 marked as deprecated and v3 including request and response details (#21086, #21176, @ktsaou)
Updated SNMP collector metadata to reflect profile-based collection (#21078, @ilyam8)
Added note about using the --init flag when running Docker containers without the pid: host configuration (#21075, @ilyam8)
Updated Child Node Behavior section in Node Rule-Based Room Assignment doc (#21073, @kanelatechnical)
Reorganized Netdata AI documentation (#21043, @shyamvalsan)

Other Notable Changes

Improvements

Improved dbengine tier 0 retention handling to prevent exceeding configured time or size limits (#21280, #21282, @stelfrag)
Implemented fixed time-based training windows for ML models to ensure consistent behavior across all metrics regardless of collection frequency (#21046, @ktsaou)
Added helper program (nd-run) to run external commands without additional privileges by switching to the netdata user and clearing environment variables and capabilities (#20990, #21076, #21097, @Ferroin, @ilyam8)

Other

Fixed csvjsonarray format returning invalid JSON with extra closing bracket when no data was present (#21304, @Copilot)
Added defensive checks and cleanup during RRD context initial load to prevent crashes and memory leaks from null acquisitions (#21298, @stelfrag)
Added NULL check when executing CLI commands to prevent crashes during agent initialization (#21286, @stelfrag)
Initialized reusable buffers when streaming ML data to parent agent (#21279, @stelfrag)
Added detection for netdata CLI initialization failure to prevent crashes when attempting to shut down uninitialized CLI threads (#21275, @stelfrag)
Fixed ephemerality label to always update with the current ephemeral status of the host (#21274, @stelfrag)
Improved status file timestamp computation by storing pre-formatted timestamps to avoid async-signal-unsafe calls (#21272, @stelfrag)
Added missing modulo operator to alerts evaluation (#21267, @stelfrag)
Improved websocket thread shutdown with timeout protection, non-blocking I/O configuration, and enhanced error handling to prevent indefinite hangs (#21264, @stelfrag)
Improved agent startup performance by optimizing journal file processing and simplifying thread management (#21260, @stelfrag)
Fixed race condition in memory allocator that could cause crashes or memory corruption during concurrent access (#21258, @stelfrag)
Fixed systemd-cat-native crash on realloc (#21254, @ktsaou)
Improved ML shutdown by adding checks for dimension training in progress and validating host state before release (#21250, @stelfrag)
Routed dyncfg GET requests through plugin to allow UI to retrieve migrated configurations directly from plugin responses (#21249, @ktsaou)
Fixed tier check (#21248, @stelfrag)
Increased WebSocket inactivity timeout from 5 to 30 minutes to accommodate long-running queries and operations (#21244, @ktsaou)
Adjusted page cache locking to ensure memory deallocation happens after releasing queue locks (#21240, @stelfrag)
Added validation check for metric count before journal file creation to prevent invalid operations (#21238, @stelfrag)
Optimized weights calculation by processing multiple nodes in parallel (#21184, @stelfrag)
Fixed build failure on musl libc by defining USE_NOTRACE when backtrace() is not available (#21165, @clan)
Fixed build failure when ENABLE_LIBUNWIND is enabled (#21163, @clan)
Optimized datafile handling by using block numbers instead of offsets, reducing memory usage and improving performance (#21098, @stelfrag)
Refined event loop processing to handle callbacks and queued commands more efficiently (#21091, @stelfrag)
Fixed go.d.plugin to include .exe extension on Windows (#21070, @thiagoftsm)

Deprecation notice

Changed in this release

SNMP Legacy Collection Removed

Custom OID configuration has been removed. Netdata now only supports profile-based SNMP collection.

Important Changes in Next Major Release

Deprecated Components

Component Type	Versions Being Deprecated
APIs	v1, v2

What This Means

Only the v3 API and v3 Dashboard will be supported starting with the next major release. These newer versions offer improved performance, enhanced features, and better security.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:

Premium Support: Customers who wish to have a direct channel with Netdata and prioritized support with defined SLAs can contact us.
Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
GitHub Issues: Use the Netdata repository to report bugs or open a new feature request.
GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!
