Netdata
Real-time infrastructure and application monitoring platform
Alternative to: Prometheus, Grafana, Nagios, Zabbix, Datadog, New Relic, Sensu, Dynatrace
v2.1.0
2024-12-19
Netdata Growth
- 1.5 million downloads per day
- 72.6k GitHub stars!
- 651M Docker Hub pulls!
Netdata continues to experience phenomenal growth, with over 1.5 million downloads daily through Cloudflare and Docker Hub, fueling observability for users worldwide.
Thanks to your unwavering support ❤️, Netdata is the leader in the observability category in the CNCF landscape, ahead of all other solutions, including Elasticsearch, Grafana, and Prometheus, in GitHub stars. This demonstrates the trust and admiration of our community.
This success drives rapid adoption among enterprises, reflecting the growing recognition of Netdata as the go-to observability solution for both cloud-native and on-premises environments. Our commitment remains steadfast: to deliver cutting-edge, AI-powered observability with unmatched performance and simplicity—all while being significantly more affordable.
As we evolve, our focus on empowering businesses with higher-fidelity AI insights ensures Netdata remains the easiest and fastest way to optimize infrastructure and applications at any scale. 🚀
You like Netdata? Give Netdata a ⭐ too, on GitHub!
Release Summary
This release focuses heavily on streaming functionality, enabling unprecedented scalability, reduced CPU overhead, and optimized memory utilization. Netdata has been re-architected to meet the demands of enterprise environments while maintaining its hallmark ease of use and affordability.
Release Highlights
Major Performance and Scalability Improvements
This release significantly enhances Netdata’s performance and streaming capabilities, with particular focus on multi-parent infrastructures:
- Optimized CPU Usage: Streamlined ML model distribution and improved thread management reduce CPU utilization by 30–50% in parent-child setups.
- Smarter Memory Management: New features prevent out-of-memory situations while maximizing cache usage for better query performance.
- Enhanced Multi-Parent Scalability: Improved load balancing and connection handling for more stable operation at scale.
- Optimized Query Processing: Prioritized handling of user queries ensures responsive experience even under heavy load.
Detailed Technical Improvements:
| Category | Feature | Benefit |
|---|---|---|
| CPU Optimization | ML Model Streaming | • ML models now stream between Netdata Agents alongside metric data • Options for edge or central ML training • 30–50% CPU reduction in parent-child setups • Note: Next major version will disable ML training on children by default |
| CPU Optimization | Thread Management | • Streaming threads fixed to match CPU cores • Single thread handles ingestion and re-streaming per node • Reduced context switches and cross-CPU communication |
| Memory Management | Out-of-Memory Protection | • Dynamic cache adjustment maintains 10% system memory buffer (max 5 GiB) • Container-aware (supports cgroups v1 and v2) • Configurable via `[db].dbengine out of memory protection` |
| Memory Management | Cache Optimization | • Option to utilize all available memory for caching • Reduced disk I/O on busy parent nodes • Enable with `[db].dbengine use all ram for caches` |
| Memory Management | ML Training Management | • Dynamic queue management prevents memory overload • Consistent performance during heavy ML workloads |
| Scalability | Parent Cluster Load Distribution | • Random parent selection for load distribution • Prevents single-node bottlenecks in large deployments |
| Scalability | Connection Handling | • Randomized reconnection timing • Prevents connection floods • Smoother large-scale reconnect handling |
| Query Performance | Query Prioritization | • Immediate response to user queries under any load • Connection operations get secondary priority • Background tasks (replication, ML) yield to high-priority operations • Quick new node integration through expedited backfilling |
| Query Performance | Real-Time Response | • Responsive user experience during heavy processing • Efficient concurrent query handling • Maintains performance during high-load background operations |
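
To make two of the table's rows concrete, here is a minimal Python sketch of the arithmetic they describe: a memory reserve of 10% of system memory capped at 5 GiB, and randomized parent selection with jittered reconnect timing. It is an illustration under stated assumptions, not Netdata's implementation. All function and parameter names are hypothetical, and the base and jitter values for reconnects are not specified in these notes.

```python
import random

GIB = 1024 ** 3


def memory_reserve_bytes(total_bytes: int, percent: int = 10,
                         cap_bytes: int = 5 * GIB) -> int:
    """Memory to keep free: a percentage of total memory, capped at cap_bytes."""
    return min(total_bytes * percent // 100, cap_bytes)


def cache_budget_bytes(total_bytes: int, other_usage_bytes: int) -> int:
    """Upper bound for caches after the reserve and other usage are set aside."""
    budget = total_bytes - other_usage_bytes - memory_reserve_bytes(total_bytes)
    return max(budget, 0)


def pick_parent(parents: list[str], rng: random.Random) -> str:
    """Pick a parent uniformly at random so children spread across the cluster."""
    return rng.choice(parents)


def reconnect_delay(base_seconds: float, jitter_seconds: float,
                    rng: random.Random) -> float:
    """Base delay plus a random offset, so children do not reconnect all at once."""
    return base_seconds + rng.uniform(0.0, jitter_seconds)


if __name__ == "__main__":
    rng = random.Random()
    print(memory_reserve_bytes(40 * GIB) // GIB)   # 4 (10% of 40 GiB)
    print(memory_reserve_bytes(128 * GIB) // GIB)  # 5 (capped at 5 GiB)
    print(pick_parent(["parent-a", "parent-b", "parent-c"], rng))
    print(round(reconnect_delay(5.0, 10.0, rng), 2))
```

On a container-aware system, `total_bytes` would presumably be the cgroup memory limit rather than the host total. That is an inference from the "Container-aware" note in the table.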
Cloud: Automated Room Assignment with Label-Based Rules
Netdata Cloud Dashboard introduces node rule-based room assignment—a powerful new feature that transforms how you organize your infrastructure monitoring:
- Dynamic Room Assignment: Nodes are automatically placed into relevant rooms based on their host labels, eliminating manual organization.
- Rule-Based Management: Create flexible rules using host labels to define where nodes belong, ensuring consistent organization.
- Scale-Ready Architecture: As your infrastructure grows, new nodes are automatically sorted into appropriate rooms, maintaining clean monitoring structure.
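
The idea behind label-based rules can be sketched in a few lines of Python. These notes do not show the rule syntax Netdata Cloud uses, so the `RoomRule` structure and the exact-match semantics below are assumptions chosen only to illustrate how host labels can drive room membership.

```python
from dataclasses import dataclass


@dataclass
class RoomRule:
    """Hypothetical rule: a node joins `room` when every listed label matches."""
    room: str
    required_labels: dict[str, str]


def rooms_for_node(host_labels: dict[str, str],
                   rules: list[RoomRule]) -> list[str]:
    """Return the rooms whose rules match the node's host labels, in rule order."""
    rooms: list[str] = []
    for rule in rules:
        matches = all(host_labels.get(key) == value
                      for key, value in rule.required_labels.items())
        if matches and rule.room not in rooms:
            rooms.append(rule.room)
    return rooms


if __name__ == "__main__":
    rules = [
        RoomRule("Production", {"env": "prod"}),
        RoomRule("EU Databases", {"env": "prod", "role": "db", "region": "eu"}),
    ]
    print(rooms_for_node({"env": "prod", "role": "web"}, rules))
    # ['Production']
```

Because the rules are evaluated against labels rather than node names, a newly connected node with `env: prod` lands in the right room without any manual step.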
Cloud: Configurable Alert Repeat Notifications
Netdata Cloud enhances alert management with customizable notification repeats:
- Custom Repeat Intervals: Set how often you want to be reminded about ongoing alerts for each notification channel.
- Automated Follow-ups: Receive automatic notification repeats for unresolved alerts based on your specified timeframe.
- Channel-Specific Settings: Configure different repeat frequencies for each integration to match your workflow.
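
As a rough illustration of per-channel repeat intervals, the Python sketch below decides which channels are due for a reminder about an alert that is still unresolved. The channel names, the data layout, and the "no interval means no repeats" rule are assumptions for the example. Netdata Cloud's actual scheduling logic is not described in these notes.

```python
from datetime import datetime, timedelta
from typing import Optional


def repeat_due(last_sent: datetime, now: datetime,
               repeat_every: Optional[timedelta]) -> bool:
    """True when a reminder is due; None means repeats are off for the channel."""
    if repeat_every is None:
        return False
    return now - last_sent >= repeat_every


def channels_to_notify(last_sent: dict[str, datetime],
                       repeat_every: dict[str, Optional[timedelta]],
                       now: datetime) -> list[str]:
    """Channels whose own repeat interval has elapsed since their last message."""
    return [channel for channel, sent_at in last_sent.items()
            if repeat_due(sent_at, now, repeat_every.get(channel))]


if __name__ == "__main__":
    now = datetime(2024, 12, 19, 12, 0)
    last_sent = {
        "slack": now - timedelta(minutes=45),
        "email": now - timedelta(minutes=45),
        "pagerduty": now - timedelta(minutes=45),
    }
    repeat_every = {
        "slack": timedelta(minutes=30),  # remind every 30 minutes
        "email": timedelta(hours=4),     # remind every 4 hours
        "pagerduty": None,               # repeats disabled
    }
    print(channels_to_notify(last_sent, repeat_every, now))
    # ['slack']
```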
Cloud: Pin Your Essential Charts with Dashboard Favorites
Netdata Cloud Dashboard introduces favorites pinning for faster access to your critical monitoring views:
- One-Click Pinning: Select and pin your most important charts and sections directly from the dashboard.
- Quick-Access Organization: Pinned items appear at the top of your Table of Contents for instant visibility.
Dynamic Configuration: Bulk Operations for Collectors and Alerts
Dynamic Configuration in Netdata now supports bulk operations on monitoring settings. You can perform the following operations on multiple collector jobs and health checks at once:
- Enable/Disable
- Restart
- Delete
Acknowledgments
- @orisano for removing a duplicated row in logging readme.
Contributions
Collectors
Improvements
- Add dyncfg support for Virtual Nodes (go.d.plugin) (#19205, #19207, #19214, #19238 @ilyam8)
- Add monitoring of /run/reboot-required (proc.plugin) (#19109, @ilyam8)
- Add “force_http2” option to collectors that use HTTP for metrics collection (go.d.plugin) (#19047, @ilyam8)
- Add support for checking full chain expiry time (go.d/x509check) (#19001, #19004 @ilyam8)
- Add data collection status chart and alert (go.d.plugin) (#18981, #18989, #18990 @ilyam8)
- Add cluster support for RabbitMQ collector (go.d/rabbitmq) (#18965, #18972, #18976 @ilyam8)
Bug fixes
- Prevent connection leak when Ping fails (go.d/mongodb) (#19232, @ilyam8)
- Properly release file locks during service reload (go.d.plugin) (#19153, #19154 @ilyam8)
- Handle “HPE Smart Array” line in HPSSA collector (go.d/hpssa) (#19084, @ilyam8)
- Handle missing sysName gracefully in SNMP collector (go.d/snmp) (#18970, @ilyam8)
Other
- Add MegaCli64 to ndsudo (#19223, @ilyam8)
- Code refactor for simplicity (go.d.plugin) (#19143, #19145, #19146, #19155 @ilyam8)
- Minor Hyper-V fixes (windows/hyperv) (#19130, @ilyam8)
- Reduce EBPF memory usage (#19117, @stelfrag)
- Disable python example collector (python.d/example) (#19114, @ilyam8)
- Disable monitoring of /run/reboot-required on non-Debian systems (proc/reboot_required) (#19110, @ilyam8)
- Improve error handling in callback functions in socket package (go.d.plugin) (#19103, @ilyam8)
- Correctly close idle connections in web package (go.d.plugin) (#19052, @ilyam8)
- Implement terminating on QUIT command (go.d.plugin) (#19038, @ilyam8)
- Preserve original process names in metrics labels (windows/netframework) (#19036, @ilyam8)
- Auto-adjust GOMAXPROCS based on container CPU limits (go.d.plugin) (#19023, #19026 @ilyam8)
- Code cleanup and renames (go.d.plugin) (#18987, #19081, #19087, #19090, #19180 @ilyam8)
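
For readers unfamiliar with the GOMAXPROCS item above, the following Python sketch shows the general idea of deriving a worker count from a container CPU limit. It parses the cgroup v2 `cpu.max` format (`<quota> <period>`, or `max <period>` when unlimited). The function name is hypothetical, and rounding down with a minimum of 1 is an assumption. These notes do not say how go.d.plugin computes the value.

```python
import math


def procs_from_cpu_max(cpu_max: str, host_cpus: int) -> int:
    """Derive a worker count from the contents of a cgroup v2 cpu.max file."""
    quota, period = cpu_max.split()
    if quota == "max":
        # No CPU limit set for the container: use all host CPUs.
        return host_cpus
    limit = math.floor(int(quota) / int(period))
    # Never drop below one worker, never exceed the host's CPU count.
    return max(1, min(limit, host_cpus))


if __name__ == "__main__":
    print(procs_from_cpu_max("max 100000", 8))     # 8
    print(procs_from_cpu_max("200000 100000", 8))  # 2
    print(procs_from_cpu_max("50000 100000", 8))   # 1
```

Without such an adjustment, a Go process in a container limited to two CPUs on a 64-core host would schedule work as if it had 64 cores available.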
Packaging/Installation
All changes
- Add PCRE2 development library to required packages (#19217, @ilyam8)
- Disable compilation of H2O (#19216, #19218, @ilyam8)
- Use `setuid` as a fallback for static builds when `setcap` fails for plugins (#19215, @ilyam8)
- Fix native package availability check on Debian-based systems in kickstart (#19183, @ilyam8)
- Update deb repository config fetched by kickstart to the latest version (#19181, @ilyam8)
- Update incorrect checksum for Golang (32-bit Linux) (#19127, @ilyam8)
- Add `--dev` option to installer (#19034, @ktsaou)
- Improve Windows installer (#18983, #19122, #19132, #19159 @thiagoftsm)
Documentation
All changes
- Fix deployment command for Windows Agent nightly version (#19236, @ilyam8)
- Update network requirements to use domain-based allowlisting for Cloud connectivity (#19222, @M4itee)
- Add a user guide for dynamic room configuration (#19199, @kapantzak)
- Remove a duplicated row in logging readme (#19190, @orisano)
- Reorder silent mode and add full pipeline command examples (#19176, @Ancairon)
- Fixup URLs in package repo documentation to use index files (#19174, @Ferroin)
- docs: leftover links + changes on api-tokens.md (#19162, @Ancairon)
- Improve Cloud Authentication and Authorization docs (#19160, @Ancairon)
- Improve Cloud Plans and ACLK docs (#19140, @Ancairon)
- Improve Cloud readme (#19139, @Ancairon)
- Reorganize Netdata repo readme introduction for clearer project overview (#19134, @ilyam8)
- Update window plugin metadata (#19129, #19147, #19158, #19171, #19175, #19188, #19182 @thiagoftsm @ilyam8)
- Fix formatting, typos, and some simplifications in the `docs/` directory (#19112, @ilyam8)
- Improve Cloud On Prem docs (#19104, #19105, #19107 @Ancairon)
- Improve Organize Your Infrastructure documentation (#19101, @Ancairon)
- Improve readability of Claiming documentation (#19100, @Ancairon)
- Improve Registry docs (#19095, @Ancairon)
- Fix full-text search instructions and typos in systemd-journal plugin readme (#19093, #19066 @ilyam8)
- Improve Daemon docs (#19091, @Ancairon)
- Remove stale docs, and update links and optimization documentation (#19089, @Ancairon)
- Remove Go windows integration (#19078, @Ancairon)
- Split database overview and configuration reference (#19077, @Ancairon)
- Improve database docs (#19075, @Ancairon)
- Update sizing Netdata Agent pages (#19074, @Ancairon)
- Simplify collector configuration page (#19072, @Ancairon)
- Create a terminology dictionary for Netdata (#19071, @Ancairon)
- Update terminology from “claim” to “connect” for Node connection process (#19060, @Ancairon)
- Update Windows installation docs (#19054, @Ancairon)
- Cleanup Securing Agents section docs (#19053, @Ancairon)
- Update documentation about our native package repos (#19049, @Ferroin)
- Capitalize the word “Agent” and “Cloud” (#19043, #19044, @Ancairon)
- Remove references to old MSI installer from go.d/windows metadata (#19024, @ilyam8)
- Add deprecation notice go.d/windows collector (#19009, @ilyam8)
- Update Windows installation and deployment documentation (#18765, #18928, @thiagoftsm)
Other Notable Changes
Improvements
Bug Fixes
Other
- Improve shutdown handling by preventing data file rotation and deferring alert state changes to startup (#19241, @stelfrag)
- Rename some internal charts context for better organization (#19239, @ilyam8)
- RRDHOST system-info isolation (#19235, @ktsaou)
- Allow more threads to load contexts during startup (#19234, @stelfrag)
- Release health summary memory when host health monitoring is disabled (#19233, @stelfrag)
- Fix heap use after free in health (#19228, @ktsaou)
- Fix compiler warnings on 32-bit (#19221, @ktsaou)
- Remove Judy arrays (#19194, @stelfrag)
- Allow recursive readers, even when writers are waiting (#19191, @ktsaou)
- Send QUIT to plugins (#19166, @ktsaou)
- Add units per context to /api/v3/contexts (#19165, @ktsaou)
- Fixed bug in streaming sender read (#19136, @ktsaou)
- Minor beautification of log messages (#19135, @ktsaou)
- Update macOS identification to use consistent naming in system-info.sh (#19128, @ilyam8)
- Avoid scanning charts for replication status (#19124, @stelfrag)
- Move eBPF code from libnetdata to src/collector (#19121, @thiagoftsm)
- Change default nice level to 0 (#19120, @ilyam8)
- Fix undefined behaviour in `ebpf_select_pc_prefix()` (#19116, @vkalintiris)
- Use system environment proxy settings by default for Cloud connection and add connection logging (#19098, @ktsaou)
- Reset parameter when generating an alert snapshot (#19097, @stelfrag)
- Correct reporting of metrics count, instance count, and context statistics (#19094, @ktsaou)
- Add optional mimalloc allocator support at compile time (#19080, #19118, @stelfrag)
- Update gorilla compression internal charts family (#19068, @ilyam8)
- Do not intentionally abort on non-0 exit code (#18991, @vkalintiris)
- Add /api/v3/stream_path (#18943, @ktsaou)
Deprecation notice
Important Changes in Next Major Release
This release will be the last version supporting the following legacy components:
Deprecated Components
| Component Type | Versions Being Deprecated |
|---|---|
| APIs | v1, v2 |
| Dashboards | v0, v1 |
What This Means
Starting with the next major release, only the v3 API and v3 Dashboard will be supported. These newer versions offer improved performance, enhanced features, and better security.
Important Changes in Next Release
1. Removal of go.d Windows Collector
The go.d Windows collector will be removed in the next release. Users should migrate to the native Windows Netdata Agent.
2. Kubernetes Service Discovery Changes
Removed Components
The Agent Service Discovery sidecar container will be removed from the Netdata Helm chart as this functionality is now natively integrated into the go.d.plugin.
Impact on Custom Configurations
If you have custom Kubernetes service discovery configurations, you will need to update your settings in the following sections:
| Old Section | New Section | Description |
|---|---|---|
| `discovery` | `discover` | Section for configuring the Kubernetes service discoverer |
| `build` | `compose` | Section for creating data collection job configurations |
Example Migration
Previous Configuration Format

    discovery:
      k8s:
        - tags: unknown
          role: pod
          local_mode: true
    build:
      - name: "Applications"
        selector: '!unknown applications'
        tags: file
        apply:
          - selector: apache
            template: |
              - module: apache
                name: apache-{{.TUID}}
                url: http://{{.Address}}/server-status?auto

New Configuration Format

    # Root sections renamed to "discover" and "compose"
    discover:
      - discoverer: k8s
        k8s:
          - tags: unknown
            role: pod
            pod:
              local_mode: yes
    compose: # Renamed from "build"
      - name: "Applications"
        selector: "app"
        config: # Renamed from "apply"
          - selector: "apache"
            template: |
              - module: apache
                name: apache-{{.TUID}}
                url: http://{{.Address}}/server-status?auto
Required Actions
- Migrate to the new syntax before upgrading
- Refer to the current Netdata Helm chart service discovery configuration for the updated syntax.
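
If you maintain many custom configurations, the structural part of the migration can be scripted. The Python sketch below works on an already-parsed configuration (plain dicts and lists) and applies only the changes visible in the example above: `discovery` becomes a `discover` list with a `discoverer` key, `local_mode` moves under `pod` for pod-role entries, `build` becomes `compose`, and `apply` becomes `config`. The function name is hypothetical. Selectors and tags are left untouched and still need manual review against the current Helm chart documentation, so treat the output as a starting point rather than a finished configuration.

```python
import copy


def migrate_sd_config(old_config: dict) -> dict:
    """Apply the section and key renames to a parsed service discovery config."""
    cfg = copy.deepcopy(old_config)  # never modify the caller's data

    discover = []
    for discoverer, entries in cfg.get("discovery", {}).items():
        for entry in entries:
            # In the example, local_mode moves under "pod" for pod-role entries.
            if entry.get("role") == "pod" and "local_mode" in entry:
                entry["pod"] = {"local_mode": entry.pop("local_mode")}
        discover.append({"discoverer": discoverer, discoverer: entries})

    compose = []
    for rule in cfg.get("build", []):
        if "apply" in rule:
            rule["config"] = rule.pop("apply")  # "apply" renamed to "config"
        compose.append(rule)

    return {"discover": discover, "compose": compose}


if __name__ == "__main__":
    old = {
        "discovery": {
            "k8s": [{"tags": "unknown", "role": "pod", "local_mode": True}],
        },
        "build": [{
            "name": "Applications",
            "selector": "!unknown applications",
            "tags": "file",
            "apply": [{"selector": "apache", "template": "- module: apache\n"}],
        }],
    }
    print(migrate_sd_config(old))
```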
Support options
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Premium Support: Customers who wish to have a direct channel with Netdata and prioritized support with defined SLAs can contact us.
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!