Netdata
Real-time infrastructure and application monitoring platform
Alternative to: Prometheus, Grafana, Nagios, Zabbix, Datadog, New Relic, Sensu, Dynatrace
v2.0.0
2024-11-07
Netdata Growth
- 1.5 million downloads per day
- 72k GitHub stars!
- 648.3M Docker Hub pulls!
Netdata is being downloaded more than 1.5 million times per day, as reported by Cloudflare, which distributes our binary packages, and Docker Hub, which distributes our docker containers!
Thanks to your love ❤️, Netdata is leading the observability category in the CNCF landscape, having significantly more stars than Elasticsearch, Grafana, Prometheus, and all other observability solutions.
We are committed to providing the most advanced and innovative AI-powered, cloud-native, and on-premises observability solution, one that delivers higher-fidelity AI insights while being easier, faster, simpler, and significantly more affordable!
Do you like Netdata? Give Netdata a ⭐ too, on GitHub!
Release Summary
Netdata 2.0 has arrived!
This release marks a major milestone in our roadmap, expanding Netdata’s reach and refining its core components.
We’re thrilled to announce native Windows support! Netdata now runs seamlessly on Windows, in addition to Linux, macOS, and FreeBSD. This advancement required an extensive rework across the Netdata codebase. While we continue using MSYS2 for certain POSIX dependencies, we’re making strides toward full abstraction of this layer. The Windows codebase is fully open-source, offering users complete functionality on Windows with no additional dependencies. Note that access to Windows systems via the Netdata UI requires a Paid Netdata Cloud subscription.
In response to feedback from the community, Netdata UI v2 has been removed from the open-source repository, addressing distribution challenges for Linux platforms. The new Netdata UI v3 is now a standalone package with a separate license, installable independently.
The introduction of Netdata API v3 consolidates all API calls into a single, robust API. This step clears the path for retiring the old v0, v1, and v2 APIs in future releases. With the upcoming release, dashboards built on these versions will no longer be supported, making way for streamlined, future-proofed Netdata integrations.
Release Highlights
Native Windows Support
Netdata now delivers comprehensive monitoring for Windows systems, including metrics, logs, process monitoring, machine learning, alerts, streaming, and more. Key features include:
- Installation via Windows Installer
- New windows.plugin: Collects extensive system and application metrics, covering:
  - CPU, memory, and network performance
  - Physical and logical disk monitoring
  - Network stack insights and interface metrics
  - IIS, MSSQL, .NET, and Hyper-V monitoring
- Enhanced apps.plugin: Monitors Windows processes
- New windows-events.plugin: Offers in-depth visibility into Windows Events. Read more about the Windows Events Plugin here.
Process Monitoring: Simplified and Enhanced
apps.plugin has been significantly reworked to introduce dynamic process grouping:
- Automates grouping: automatically categorizes processes, significantly reducing manual configuration efforts.
- Eliminates “other” dimension: unmatched processes are now grouped dynamically.
Read more about it in this blog post.
Network Monitoring
Enhanced SNMP collector
The SNMP collector has been significantly improved, making it easier to configure and providing better visualizations for your SNMP devices. Further improvements related to network monitoring are coming soon!
Improved Performance
The redesigned network-viewer.plugin and local-listeners now deliver breakthrough performance in high-traffic environments, processing thousands of socket connections with minimal overhead. Experience real-time network insights without compromising system performance!
Netdata SSO
- Secured Access to Indirectly Claimed Agents: secure access to Children Agents, even when not directly connected to Netdata Cloud. SSO information is propagated from Parent Agents, ensuring consistent and controlled access.
- Protected API Access: the new [web].bearer token protection setting in netdata.conf enforces SSO protection for the entire API, restricting Agent dashboard access to authorized Netdata Cloud users.
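As a minimal sketch of what this setting looks like in practice, the Python snippet below parses a hypothetical netdata.conf fragment and reports whether the protection is switched on. The section and option names come from the text above. The value yes and the helper name bearer_protection_enabled are assumptions made for illustration, so check the Netdata documentation for the accepted values.

```python
import configparser

# Hypothetical netdata.conf fragment. The option name comes from the release
# notes; the value "yes" is an assumption about how the setting is enabled.
SAMPLE_CONF = """
[web]
bearer token protection = yes
"""


def bearer_protection_enabled(conf_text: str) -> bool:
    """Return True if [web].bearer token protection holds a true-like value."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # fallback=False covers a missing [web] section or a missing option.
    return parser.getboolean("web", "bearer token protection", fallback=False)


print(bearer_protection_enabled(SAMPLE_CONF))
```

A check like this could run in a deployment script to confirm that an Agent's API is not left unprotected. It reads plain text only and does not contact the Agent.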
Enterprise SSO - Enhanced SP Initiated SSO Flow
You can now initiate a login flow directly from the Netdata Sign-In page without initiating the SSO flow from the IDP.
Netdata now supports configuring a DNS TXT record on the IDP and will allow the user to sign in by providing the email address.
Read the Enterprise SSO documentation for more details.
Netdata Referral Program
If you appreciate Netdata and would like to help spread the word, the newly launched Netdata referral program lets you earn money while doing so. Referring someone takes a couple of clicks directly from the UI; look for the gift icon. Every referred user gets a 10% discount when they subscribe to Netdata Business or Homelab, and you receive 10% of their subscription value (up to a maximum of $1,000 per space). You can refer an unlimited number of users, so there’s no real limit to how much you can earn with the referral program.
Alerts Silencing Recurrence
Netdata Cloud UI now supports scheduling recurring silence rules for Alerts at a Space, Room, Node, and Alert level.
Configurable Timeouts on Reachability Notifications
Netdata introduces configurable timeouts for reachability notifications at Space and Room levels.
Comprehensive Unicode Support
Netdata now provides complete UTF-8 compatibility, enabling support of international characters in all metadata—from chart names and dimensions to labels and logs. This enhancement ensures the proper display of non-Latin characters and symbols.
Configuration Updates: Intuitive Unit Specifications
Netdata introduces human-friendly unit notation in configuration files. Use natural expressions like 1d (one day), 1MiB (one mebibyte), or 500ms (500 milliseconds) in netdata.conf and stream.conf.
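To make the notation concrete, here is a small Python sketch that converts values such as 1d, 1MiB, and 500ms into milliseconds and bytes. It is not Netdata's parser. The unit tables and the function names parse_duration_ms and parse_size_bytes are assumptions for illustration, and the real implementation may accept more units or different spellings.

```python
import re

# Unit tables are illustrative assumptions, not Netdata's actual list.
DURATION_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
SIZE_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# A whole number followed by a unit suffix, e.g. "500ms" or "1MiB".
_TOKEN = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def _parse(text: str, units: dict) -> int:
    match = _TOKEN.match(text)
    if not match or match.group(2) not in units:
        raise ValueError(f"unrecognized value: {text!r}")
    return int(match.group(1)) * units[match.group(2)]


def parse_duration_ms(text: str) -> int:
    """Convert a duration such as '1d' or '500ms' to milliseconds."""
    return _parse(text, DURATION_MS)


def parse_size_bytes(text: str) -> int:
    """Convert a size such as '1MiB' to bytes."""
    return _parse(text, SIZE_BYTES)


print(parse_duration_ms("1d"))     # 86400000
print(parse_duration_ms("500ms"))  # 500
print(parse_size_bytes("1MiB"))    # 1048576
```

Note that 1MiB is a binary unit (1024 × 1024 bytes), which is why the result is 1048576 rather than 1000000.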
Acknowledgments
- @DaTiMy for adding ilert Agent notification method.
- @daniel-sampliner for fixing the container name resolution issue for containers without environment variables.
- @eatnumber1 for adding support for controller ROC temperature to go.d/storcli.
- @eya46 for fixing the issue in kickstart.sh that created invalid claim.conf file.
- @teqwve for adding the exiting on SIGPIPE functionality to the slabinfo.plugin.
Contributions
Collectors
New/Rewritten
- Add Windows Events Logs Explorer (windows-events.plugin) (#18483, #18528, #18563, #18564, @ktsaou)
- Add collector for MaxScale (go.d/maxscale) (#18859, @ilyam8)
- Add collector for NGINX Unit (go.d/nginxunit) (#18554, @ilyam8)
- Add collector for Typesense (go.d/typesense) (#18538, @ilyam8)
- Rewrite collector for SpigotMC (go.d/spigotmc) (#18890, @ilyam8)
- Rewrite collector for OracleDB (go.d/oracledb) (#18654, @ilyam8)
- Rewrite collector for OpenLDAP (go.d/openldap) (#18625, @Ancairon)
- Rewrite collector for Ceph (go.d/ceph) (#18582, @Ancairon)
- Rewrite collector for Varnish Cache (go.d/varnish) (#18491, @Ancairon)
- Rewrite collector for APC UPSes (go.d/apcupsd) (#18489, @ilyam8)
- Rewrite collector for 1-Wire sensors (go.d/w1sensor) (#18464, @Ancairon)
- Rewrite collector for Samba (go.d/samba) (#18418, @ilyam8)
- Rewrite collector for BOINC (go.d/boinc) (#18398, @ilyam8)
Improvements
- Add zone label to mem fragmentation chart (debugfs.plugin) (#18910, @ilyam8)
- Add Pod status reason chart (go.d/k8s_state) (#18887, @ilyam8)
- Improve container warning/terminated reason charts (go.d/k8s_state) (#18885, @ilyam8)
- Add tini to Linux managers (apps.plugin) (#18856, @ilyam8)
- Add NUMA node memory activity chart (proc.plugin) (#18855, @ilyam8)
- Make timeout and interval configurable for network listeners discovery (go.d.plugin) (#18847, @ilyam8)
- Add “Queued PUBLISH Messages” chart (go.d/vernemq) (#18838, @ilyam8)
- Add NUMA nodes memory usage chart (proc.plugin) (#18822, @ktsaou)
- Refactor vernemq: support prometheus namespace added in v2.0 (go.d/vernemq) (#18815, @ilyam8)
- Add storage (disk) metrics (windows.plugin) (#18810, #18824, @ktsaou)
- Add VerneMQ to apps_groups.conf (apps.plugin) (#18802, @ilyam8)
- Add support for querying archived files (systemd-journal.plugin) (#18792, @ktsaou)
- Add network interfaces charts and alerts (windows.plugin) (#18785, @ktsaou)
- Add NetFramework charts (windows.plugin) (#18762, @thiagoftsm)
- Add model_number label to charts (go.d/nvme) (#18741, @ilyam8)
- Add support for controller ROC temperature (go.d/storcli) (#18732, @eatnumber1)
- Add a config option to add/update sensor label value (go.d/sensors) (#18707, @ilyam8)
- Add NTP packets chart (go.d/chrony) (#18685, @ilyam8)
- Improve interpreters and managers pattern matching; support win services international names (apps.plugin) (#18673, @ktsaou)
- Add OracleDB to apps_groups.conf (apps.plugin) (#18666, @ilyam8)
- Various improvements (apps.plugin) (#18652, @ktsaou)
- Implement Windows support (apps.plugin) (#18594, @ktsaou)
- Add MSSQL charts (windows.plugin) (#18591, #18689 @thiagoftsm)
- Refactor sensors: use sysfs interface only and collect more metrics (go.d/sensors) (#18581, @ilyam8)
- Implement UDP port check (go.d/portcheck) (#18569, @ilyam8)
- Add IIS charts (windows.plugin) (#18566, @thiagoftsm)
- Add “label_prefix” config option (go.d/prometheus) (#18559, @ilyam8)
- Add NGINX Unit to apps_groups.conf (apps.plugin) (#18557, @ilyam8)
- Add Typesense to apps_groups.conf (apps.plugin) (#18537, @ilyam8)
- Add TCPv4/TCPv6/ICMP errors charts (windows.plugin) (#18526, @stelfrag)
- Add sys info labels (go.d/snmp) (#18523, #18527, #18529, #18530 @ilyam8)
- Add an option to automatically create vnode (go.d/snmp) (#18520, @ilyam8)
- Add docker support (go.d/varnish) (#18512, @ilyam8)
- Add function to execute commands inside Docker containers (go.d.plugin) (#18509, @ilyam8)
- Add Thermal Zone and swap charts (windows.plugin) (#18494, @thiagoftsm)
Bug fixes
- Fix parsing power average_max value (go.d/sensors) (#18806, @ilyam8)
- Fix wrong UPS load value (go.d/apcupsd) (#18780, @ilyam8)
- Fix debug msg spam on macOS and FreeBSD (apps.plugin) (#18743, @ilyam8)
- Fix parsing power accuracy value (go.d/sensors) (#18735, @ilyam8)
- Fix container name resolution for containers without env variables (cgroups.plugin) (#18691, @daniel-sampliner)
Other
- Log as info if directory doesn’t exist (proc.plugin) (#18909, @ilyam8)
- Add build tags to modules (go.d.plugin) (#18900, @ilyam8)
- Remove python.d/zscores (#18897, @ilyam8)
- Remove python.d/spigotmc (#18889, @ilyam8)
- Fix storage charts gaps on Windows (windows.plugin) (#18880, @ktsaou)
- Fix plugin exit if no python interpreter found (python.d.plugin) (#18747, @ilyam8)
- Allow parents to identify the children (apps.plugin) (#18734, @ktsaou)
- Disable plugin if all events disabled during init (perf.plugin) (#18728, @ilyam8)
- Print the original comm in debug mode (apps.plugin) (#18727, @ktsaou)
- Stop checking UDP ports on ICMP listen error (go.d/portcheck) (#18721, @ilyam8)
- Use lib function to check if stderr connected to journal (go.d.plugin) (#18718, @ilyam8)
- Fix uptime on Windows (apps.plugin) (#18662, @ktsaou)
- Fix sprig funcmap (go.d.plugin) (#18658, @ilyam8)
- Remove python.d/oracledb (#18651, @Ancairon)
- Remove duplicate chart check in tests (go.d.plugin) (#18650, @ilyam8)
- Fix FreeBSD cpu calculation (apps.plugin) (#18648, @ktsaou)
- Cleanup pkg/socket (go.d.plugin) (#18633, @ilyam8)
- Remove python.d/openldap (#18626, @Ancairon)
- Remove python.d/ceph (#18584, @Ancairon)
- Restructure packages (go.d.plugin) (#18580, @ilyam8)
- Improve status duration calculation (go.d/portcheck) (#18577, @ilyam8)
- Add tabs to config schema (go.d/portcheck) (#18575, @ilyam8)
- Rename example module to testrandom (go.d.plugin) (#18561, @ilyam8)
- Fix Goland code inspection warnings (go.d.plugin) (#18552, @ilyam8)
- Simplify HTTP request code (go.d.plugin) (#18546, @ilyam8)
- Cleanup web pkg (go.d.plugin) (#18545, #18544 @ilyam8)
- Add vnode guid validation (go.d/snmp) (#18531, @ilyam8)
- Add varnishstat and varnishadm to ndsudo (#18503, @ilyam8)
- Remove python.d/varnish (#18499, @ilyam8)
- Remove Warnings (ebpf.plugin) (#18484, @thiagoftsm)
- Remove charts.d/apcupsd (#18481, @ilyam8)
- Remove python.d/w1sensor (#18471, @Ancairon)
- Exit slabinfo.plugin on EPIPE (#18448, @teqwve)
- Improve lmsensors performance (go.d/sensors) (#18429, @ilyam8)
- Vendor https://github.com/mdlayher/lmsensors (#18427, @ilyam8)
- Remove charts.d/sensors (#18426, @ilyam8)
- Add smbstatus -P to ndsudo (#18414, @ilyam8)
- Remove python.d/samba (#18413, @ilyam8)
- Remove python.d/anomalies (#18402, @ilyam8)
- Remove python.d/boinc (#18397, @ilyam8)
Packaging/Installation
All changes
- Correct go.d.plugin permission for source builds (#18876, @ilyam8)
- Fix installing libcurl_dev on FreeBSD (#18845, @ilyam8)
- Fix broken claiming via kickstart on some systems (#18789, @Ferroin)
- Add a basis for MSI installer (#18787, @vkalintiris)
- Ensure --non-interactive flag is passed during self-update (#18786, @ilyam8)
- Add Ubuntu 24.10 and Fedora 41 to CI (#18753, @Ferroin)
- Update go toolchain to v1.22.8 (#18659, @ilyam8)
- Improve windows installer (#18649, @thiagoftsm)
- Publish Windows installers on nightly builds (#18603, @Ferroin)
- Fix creation of claim.conf in kickstart.sh (#18587, @eya46)
- Fetch metadata by hash for DEB repos (#18536, @Ferroin)
- Assorted build cleanup for external data collection plugins (#18501, @Ferroin)
- Un-vendor proprietary dashboard code (#18437, @Ferroin)
- Fix creation of claim.conf when running kickstart.sh as a regular user (#18406, @ilyam8)
Documentation
All changes
- Fix broken links in collectors metadata (#18915, @ilyam8)
- Update uninstallation docs and remove reinstallation page (#18907, @Ancairon)
- Update maintenance docs (#18898, #18895, #18894 @Ancairon)
- Clarify required permissions in go.d/ping readme (#18868, @ilyam8)
- Make integration links absolute (#18851, @Ancairon)
- Remove legacy dashboard description (#18841, @ilyam8)
- Update enterprise SSO docs (#18836, @car12o)
- Add the Windows event logs integration to the meta (#18829, @Ancairon)
- Fix grammar in readme (#18799, @ilyam8)
- Add ref to dyncfg to configuration readme (#18793, @Ancairon)
- Document ML enabled auto (#18784, @stelfrag)
- Fix a typo in readme (#18781, @BobConanDev)
- Fix pattern example in apps.plugin readme (#18742, @ilyam8)
- Document ilert Cloud notification integration (#18736, @car12o)
- Add instructions to configure SCIM integration in Okta (#18710, @juacker)
- Improve apps.plugin readme (#18705, @ilyam8)
- Update windows documentation (#18703, @Ancairon)
- Various grammar and format fixes (#18670, @Ancairon)
- Remove “How to write a new module” from python plugin readme (#18669, @Ancairon)
- Various grammar and format fixes for packaging/docs (#18665, @Ancairon)
- Add FAQ to SCIM integration doc (#18664, @juacker)
- Various grammar and format fixes for docs docs (#18660, @Ancairon)
- Update wording about the edit-config script (#18639, @Ancairon)
- Add hardware requirements for on-prem installation (#18608, @M4itee)
- Fix some documentation issues identified by Goland code inspection (#18553, @ilyam8)
- Improve netdatacli docs (#18518, @ilyam8)
Other Notable Changes
Improvements
- Optimize local-listeners for servers with a large number of sockets (#18798, #18807, #18820 @ktsaou)
- Implement logging to Windows Event Log (#18688, @ktsaou)
- Add UTF8 support for chart ids, names and other metadata (#18684, @ktsaou)
- Add ilert Agent notification method (#18447, @DaTiMy)
- Implement stream paths propagation to children and parents (#18430, @ktsaou)
- Add configuration parsers for duration and size (#17238, @ktsaou)
Bug Fixes
Other
- Fix a potential invalid double-free memory (#18905, @stelfrag)
- Implement versioning for functions (#18902, @ktsaou)
- Fix coverity issues (#18896, #18892 @stelfrag)
- Fix config parsing memory leaks in log2journal (#18893, @ktsaou)
- Include windows.h globally in libnetdata (#18878, @ktsaou)
- Correct health schema typo preventing Action alert rendering (#18871, @ilyam8)
- Adjust text_sanitizer to accept the default value (#18870, @stelfrag)
- Do not build H2O by default (#18861, @vkalintiris)
- Silence up-to-date installation targets (#18842, @vkalintiris)
- Remove RRDSET_FLAG_DETAIL (#18837, @vkalintiris)
- Fix “invalid magic” issue in spawn-server-nofork (#18831, @ktsaou)
- Remove old obsolete check for excess data in request (#18830, @ktsaou)
- Add common O/S Caching Layer for users and groups (#18825, @ktsaou)
- Fix compilation on Windows (#18823, @ktsaou)
- Claiming should wait for node id and status ONLINE only (#18816, @ktsaou)
- Comment out dictionary with hashtable code for now (#18814, @stelfrag)
- Fix variable scope to prevent invalid memory access (#18813, @stelfrag)
- Aesthetic changes in the code (#18808, @ktsaou)
- Calculate currently collected metrics (#18803, @stelfrag)
- Windows fixes (chart labels and warnings) (#18796, @ktsaou)
- Do not load/save context data in RAM mode (#18790, @stelfrag)
- Unify claiming response json (#18777, @ktsaou)
- Sqlite upgrade to version 3.46.1 (#18772, @stelfrag)
- Close all open fds on callback (#18764, @ktsaou)
- Expand ml enabled option (#18761, @stelfrag)
- Add an option to run without libmnl to local-listeners (#18759, @ktsaou)
- Fix crash on agent initialization (#18746, @stelfrag)
- Sanitizers should not remove trailing underscores (#18738, @ktsaou)
- Reset the log sources to apply user selection (#18725, @ktsaou)
- Fix logs POST query payload parsing (#18722, @ktsaou)
- Delay child disconnect update (#18712, @stelfrag)
- Load chart labels on demand (#18699, @stelfrag)
- Fixes a permission issue with the cgroup-network-helper.sh script (#18694, @ilyam8)
- Fix sanitization issues (#18687, @ktsaou)
- Send node info update after ACLK connection timeout (#18683, @stelfrag)
- Add spawn server to cgroup-network (#18674, @ktsaou)
- Properly set start/shutdown times to parent/child (#18668, @stelfrag)
- Fix node info (api/v1/info) on Windows (#18656, @thiagoftsm)
- Handle MQTT ping timeouts (#18653, @stelfrag)
- Reorganize top-level headers in libnetdata (#18643, @vkalintiris)
- Update collectors/common-contexts file names (#18638, @vkalintiris)
- Move plugins.d directory outside of collectors (#18637, @vkalintiris)
- Log Agent start/stop timing events (#18632, @stelfrag)
- Change default pages per extent (#18623, @stelfrag)
- Cleanup MQTT related code (#18622, @stelfrag)
- Fixes to POST Functions (#18611, @ktsaou)
- Retry sending data when errno is EAGAIN (#18607, @ktsaou)
- Add DLLs to CmakeLists.txt (#18590, @thiagoftsm)
- Use node_id when available, otherwise host_id in weights query (#18579, @ktsaou)
- Add CPU model host label to Prometheus export (#18562, @ilyam8)
- Misc code cleanup (#18540, @stelfrag)
- Windows Events: recalculate the length of returned Unicode strings every time (#18525, @ktsaou)
- Remove save-database from netdatacli usage (#18519, @ilyam8)
- Serve dashboard v3 static files when available (#18507, @ktsaou)
- Fix installed ram calculation in node info (api/v1/info) on Windows (#18482, @ilyam8)
- Add x-netdata-auth and x-transaction-id to Access-Control-Allow-Headers (#18477, #18478, #18479 @ktsaou)
- Prevent sigsegv in config-parsers (#18476, @ktsaou)
- Add version to systemd-journal info response (#18474, @ktsaou)
- Fix node index in alerts (#18469, @stelfrag)
- Fix parsing url arg in netdata-claim.sh (#18460, @ilyam8)
- Fix invalid precedence when calculating time_to_evict (#18444, @stelfrag)
- Do not free the sender when the sender thread exits (#18441, @ktsaou)
- Fix deadlock in streaming (#18438, #18440, @ktsaou)
- Improve agent shutdown time (#18434, @stelfrag)
- Remove checks.plugin dir (#18424, @ilyam8)
- Cleanup ACLK code (#18417, @stelfrag)
- Restore /api/v1/badge.svg (#18416, @ktsaou)
- Re-evaluate signals every 500 ms in spawn-server (#18411, @ktsaou)
- Remove pyyaml2 (#18404, @ilyam8)
- Use existing ACLK event loop for Cloud queries (#18218, @stelfrag)
Deprecation notice
v1 and v2 APIs, v0, v1, v2 Dashboards
This release of Netdata is the last to support the old v1 and v2 APIs and the old v0, v1, and v2 dashboards. Starting with the next major release of Netdata, only the v3 API and the v3 Dashboard will be available.
Changed in this release: Rewritten/Changed collectors
Important Note: While most users won’t experience any disruption, there’s one key point to consider if you’re heavily relying on Netdata’s metrics, such as exporting data to an external Time-Series Database (TSDB) or creating custom alerts. Due to the rewrite/refactor, the names of some metrics in some collectors have been changed. You may need to update your configurations to reflect the new metric names to avoid disruptions in your workflows.
To view a complete list of all metrics collected by a specific collector, refer to its documentation.
Rewritten Collectors: we have rewritten a number of collectors from Python (python.d.plugin) to Go (go.d.plugin).
| Deprecated Collector (Python) | Replacement Collector (Go) |
|---|---|
| python.d/apcupsd | go.d/apcupsd |
| python.d/boinc | go.d/boinc |
| python.d/ceph | go.d/ceph |
| python.d/openldap | go.d/openldap |
| python.d/oracledb | go.d/oracledb |
| python.d/samba | go.d/samba |
| python.d/spigotmc | go.d/spigotmc |
| python.d/varnish | go.d/varnish |
| python.d/w1sensor | go.d/w1sensor |
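If you keep exporter configurations or custom alerts in text files, a quick scan for the deprecated collector names can show where updates may be needed. The Python sketch below uses the mapping from the table above. The function name find_deprecated and the assumption that your files mention collectors in the python.d/name form are illustrative only, and the scan does not detect renamed metrics.

```python
import re

# Collector names copied from the table above; each maps to go.d/<name>.
REWRITTEN = (
    "apcupsd", "boinc", "ceph", "openldap", "oracledb",
    "samba", "spigotmc", "varnish", "w1sensor",
)
REPLACEMENTS = {name: f"go.d/{name}" for name in REWRITTEN}

# Assumes references are written as "python.d/<name>" (hypothetical convention).
_PATTERN = re.compile(r"python\.d/(\w+)")


def find_deprecated(text: str) -> list:
    """Return sorted (old, new) pairs for rewritten collectors mentioned in text."""
    found = {match.group(1) for match in _PATTERN.finditer(text)}
    return sorted(
        (f"python.d/{name}", REPLACEMENTS[name])
        for name in found
        if name in REPLACEMENTS
    )


sample = "alerts use python.d/ceph and python.d/samba; also python.d/zscores"
for old, new in find_deprecated(sample):
    print(f"{old} -> {new}")
```

Collectors that were removed without a replacement, such as python.d/zscores, are deliberately not reported by this sketch.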
Refactored Collectors
| Collector | Change |
|---|---|
| go.d/sensors | Now uses sysfs interface only. Breaking change to metric names. |
| go.d/vernemq | Now supports prometheus namespace added in v2.0. Breaking change to metric names. |
Removed Collectors
| Removed Collector (Python) | Reason |
|---|---|
| python.d/zscores | Unnecessary due to built-in anomaly detection. Unmaintained and uses unmaintained third-party library. |
Support options
As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata Agent, feel free to contact us through one of the following channels:
- Premium Support: Customers who wish to have a direct channel with Netdata and prioritized support with defined SLAs can contact us.
- Netdata Learn: Find documentation, guides, and reference material for monitoring and troubleshooting your systems with Netdata.
- GitHub Issues: Make use of the Netdata repository to report bugs or open a new feature request.
- GitHub Discussions: Join the conversation around the Netdata development process and be a part of it.
- Community Forums: Visit the Community Forums and contribute to the collaborative knowledge base.
- Discord Server: Jump into the Netdata Discord and hang out with like-minded sysadmins, DevOps, SREs, and other troubleshooters. More than 2000 engineers are already using it!