
Netdata

Real-time infrastructure and application monitoring platform

Alternative to: Prometheus, Grafana, Nagios, Zabbix, Datadog, New Relic, Sensu, Dynatrace



v1.32.0

2021-11-30

Release v1.32.0

The newest version of Netdata, v1.32.0, propels us toward the end of the year, and the Netdata community is positioned to grow stronger than ever in 2022. Before we get into the specifics of the new release, it’s worth reflecting on that growth.

Netdata open-source Agent growth

The open-source Netdata Agent, the best OSS node monitoring and troubleshooting tool ever, currently has:

  • 1,000,000 unique Netdata nodes live!
  • 330,000 engineers using the agent per month!
  • An open-source community growing at an amazing rate, with 3,000 new nodes and 8,000 users per day!
  • 250,000 Docker pulls per day, with 360 million in total, according to Docker Hub!

Netdata Cloud growth

The Netdata Cloud, our infrastructure-level, distributed, real-time monitoring and troubleshooting orchestrator, is also showing similar growth, with:

  • 35,000 live Netdata nodes!
  • 90,000 engineers signed up with 200 new sign-ups every day!
  • 180 new spaces created every day!

We are not just pleased with this amazing adoption rate; we are inspired by it. It is you, our users, who give us the energy and confidence to move forward into a new era of high-fidelity, real-time monitoring and troubleshooting, made accessible to everyone!

Thank you for the inspiration! You rock!

Community News

As many of you know, even though we are not endorsed by the CNCF, Netdata is the fourth most starred project in the CNCF landscape. Thank you for this expression of your appreciation! If you love Netdata and haven’t yet, consider giving us a GitHub star.

We also invite you to join our new Discord server, both to help our community keep growing and to take part in fun, informative live conversations with our wonderful community.

v1.32.0 at a glance

The following is a high-level overview of some of the key changes in this release, with more detailed descriptions in subsequent sections.

New Cloud backend and Agent communication protocol

This Agent release supports our new Cloud backend. From here, we will be offering much faster and simpler communication, reliable alerts and exchange of metadata, and first-time support for the parent-child relationship of Netdata agents. This is the first Agent release that allows Netdata Cloud to use the Netdata Agent as a distributed time-series database that supports replication and query routing, for every metric!

eBPF latency monitoring, container monitoring, and more

We use eBPF to monitor all running processes, without the cooperation of the processes and without sniffing data traffic. This new release includes 13 new eBPF monitoring features, including I/O latency; BTRFS, Ext4, NFS, XFS, and ZFS latencies; IRQ latencies; extended swap monitoring; and more.

Machine learning (ML) powered anomaly detection

This release links the Netdata Agent with dlib, the popular C++ machine learning algorithms library, which we use to automatically detect anomalies out-of-the-box, at the edge! Once enabled, Netdata trains an ML model for every metric, which is then used to detect outliers in real time. The resulting “anomaly bit” (where 0=normal, 1=anomalous) associated with each database entry is stored alongside the raw metric value with zero additional storage overhead! This feature is still in development, so it is disabled by default. If you would like to test it and provide feedback, you can enable it using the instructions in the Detailed release highlights section.
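The “zero additional storage overhead” claim holds if the anomaly bit occupies a bit that the stored value already reserves. The following Python sketch shows the general idea of packing a flag into a spare bit of a 32-bit word. The layout (bit 31 for the flag, bits 0-30 for the encoded value) is a hypothetical assumption for illustration, not Netdata's actual storage format.

```python
from typing import Tuple

# Hypothetical layout, NOT Netdata's real storage format:
# bit 31 holds the anomaly bit, bits 0-30 hold the encoded metric value.
ANOMALY_BIT = 1 << 31
VALUE_MASK = ANOMALY_BIT - 1


def pack(encoded_value: int, anomalous: bool) -> int:
    """Store a 31-bit encoded value and its anomaly bit in one 32-bit word."""
    if not 0 <= encoded_value <= VALUE_MASK:
        raise ValueError("encoded value must fit in 31 bits")
    return encoded_value | (ANOMALY_BIT if anomalous else 0)


def unpack(word: int) -> Tuple[int, bool]:
    """Return the encoded value and the anomaly bit (True = anomalous)."""
    return word & VALUE_MASK, bool(word & ANOMALY_BIT)


word = pack(12345, anomalous=True)
print(word < 2**32, unpack(word))  # True (12345, True)
```

The word stays 32 bits wide whether or not the flag is set, which is why a spare bit costs nothing extra per stored point.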

New timezone selector and time controls in the user interface

We implemented a new timezone picker and time controls to enhance administrative abilities in the dashboard.

Docker image POWER8+ support

Netdata Docker images now support recent IBM Power Systems, Raptor Talos II, and more.

And more…

Four new collectors, 112 total improvements, 95 bug fixes, 49 documentation updates, and 57 packaging and installation changes!

Detailed release highlights

New Cloud backend and Agent communication protocol

It’s no secret that the best of Netdata Cloud is yet to come. After several months of developing, testing, and benchmarking a new architectural system, we have steadied ourselves for that growth. These changes should offer notable and immediate improvements in reliability and stability, but more importantly, they allow us to quickly and efficiently develop new features and enhanced functionality. Here’s what you can look for on the short-term horizon, thanks to our new architecture:

  • Greater capacity: The new architecture will change the communication protocol between the Agent and the Cloud to be incremental, improving our agent-handling capacity by ensuring that the Cloud uses measurably less bandwidth.
  • Parent/child relationships: The new architecture will allow, for the first time, the recognition of parent-child relationships in the Cloud. These changes will enable you to change the storage configuration on parents, limit the metrics sent, and reduce data frequency to achieve longer data retention for your nodes. On top of this, we will continue to develop support for complex setups that scale your monitoring by using parents as proxies. Ultimately, this will enable Netdata to operate as a headless connector with the lowest possible footprint on your production nodes.
  • Alerts: The new architecture will host a multitude of improvements on our alerts presentation over the coming months, allowing for enhanced reliability, alert management, alert logs to be collected in the Cloud, and more.
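To illustrate what an incremental protocol and a lower data frequency mean in practice, here is a minimal Python sketch. diff_state is a hypothetical helper that sends only the entries that changed since the last sync instead of the full state, and retention_seconds shows the arithmetic behind trading data frequency for retention, assuming storage keeps a fixed number of points per metric. Neither function is Netdata's actual protocol or storage code.

```python
from typing import Dict, Optional


def diff_state(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Hypothetical incremental update: only entries that changed since the last sync.

    Added or modified entries carry their new value; removed entries map to None.
    """
    delta: Dict[str, Optional[str]] = {}
    for key, value in current.items():
        if previous.get(key) != value:
            delta[key] = value
    for key in previous:
        if key not in current:
            delta[key] = None
    return delta


def retention_seconds(points_per_metric: int, update_every: int) -> int:
    """Time span covered by a fixed number of points collected every update_every seconds."""
    return points_per_metric * update_every


prev = {"node.os": "linux", "chart.cpu": "v1", "chart.disk": "v1"}
curr = {"node.os": "linux", "chart.cpu": "v2", "chart.mem": "v1"}
print(diff_state(prev, curr))
# {'chart.cpu': 'v2', 'chart.mem': 'v1', 'chart.disk': None}

# The same 3,600 points cover 1 hour at 1s resolution and 10 hours at 10s.
print(retention_seconds(3600, 1), retention_seconds(3600, 10))  # 3600 36000
```

Unchanged entries (node.os here) never leave the Agent, which is where the bandwidth saving of an incremental protocol comes from.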

If you would like to be among the first to test this new architecture and provide feedback, first make sure that you have installed the latest Netdata version following our guide. Then, follow our instructions for enabling the new architecture.

eBPF container monitoring

We did a lot of work to enhance our eBPF container monitoring in this release, starting with full eBPF support for cgroups. As a refresher on why this update matters: cgroups, together with namespaces, are the building blocks of containers, which are the dominant way of distributing applications. cgroups control how much of a given key resource (CPU, memory, network, and disk I/O) a process or set of processes can access or use. Our eBPF collector now creates charts for each cgroup, so you can see how a specific cgroup interacts with the Linux kernel! 🤓

This enhances our already extensive monitoring by including cgroups for memory, processes, network, file access, and more.
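Per-cgroup charts require attributing each process to its cgroup. On Linux, /proc/<pid>/cgroup lists that membership as hierarchy-ID:controller-list:cgroup-path lines (a single 0::/path line on cgroup v2). The Python sketch below parses that format and groups PIDs by cgroup path. It only illustrates the mapping in userspace, whereas Netdata's eBPF collector attributes kernel events itself, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list in /proc/<pid>/cgroup to its cgroup path.

    Lines look like "hierarchy-ID:controller-list:cgroup-path". On cgroup v2
    the single line is "0::/path", so its controller-list key is "".
    """
    memberships: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        memberships[controllers] = path
    return memberships


def group_pids_by_cgroup(samples: Iterable[Tuple[int, str]],
                         controller: str = "") -> Dict[str, List[int]]:
    """Group PIDs by cgroup path for one controller ("" = cgroup v2 hierarchy)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pid, text in samples:
        path = parse_proc_cgroup(text).get(controller)
        if path is not None:
            groups[path].append(pid)
    return dict(groups)


# On a live system each text would come from open(f"/proc/{pid}/cgroup").read().
samples = [
    (101, "0::/system.slice/nginx.service\n"),
    (102, "0::/system.slice/nginx.service\n"),
    (203, "0::/system.slice/docker-4f2a.scope\n"),
]
print(group_pids_by_cgroup(samples))
# {'/system.slice/nginx.service': [101, 102], '/system.slice/docker-4f2a.scope': [203]}
```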

eBPF latency monitoring

By enabling eBPF monitoring on all systems that support it, Netdata has already established itself as a world-leading distributor of eBPF! We use eBPF to monitor all running processes, without requiring their cooperation, by tracking every way an application interfaces with the system. In this release, we continue our commitment to improving eBPF monitoring by tracking latencies for disks, IRQs, and more.

Our new eBPF latency features include:

  • A new set of Disk I/O latency charts, which monitor the time that it takes for an I/O request to complete. As many of you may know, this is the most important metric for storage performance!
  • IRQ latency monitoring, to help anyone track the time spent servicing interrupts (hard or soft).
  • A new Filesystem submenu that adds latency monitoring for different filesystems: BTRFS, Ext4, NFS, XFS, and ZFS. Latency monitoring covers the most common functions, such as the latency of each open request and of each sync request.
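Latency tools built on eBPF typically record a timestamp when a request starts (an I/O request is issued, an IRQ handler is entered) and compute the delta when it finishes, aggregating the results into power-of-two buckets. The following Python simulation sketches that technique on a hypothetical event stream with microsecond timestamps; it is not the code Netdata's eBPF programs run in the kernel.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def log2_bucket(latency_us: int) -> int:
    """Smallest power of two (in microseconds) that is >= the latency."""
    bucket = 1
    while bucket < latency_us:
        bucket *= 2
    return bucket


def latency_histogram(events: Iterable[Tuple[str, int, int]]) -> Dict[int, int]:
    """Build a histogram from ("start" | "end", request_id, timestamp_us) events.

    "start" stands for an I/O request being issued or an IRQ handler being
    entered; "end" for its completion. Unmatched "end" events are ignored.
    """
    started: Dict[int, int] = {}
    histogram: Counter = Counter()
    for kind, request_id, ts_us in events:
        if kind == "start":
            started[request_id] = ts_us
        elif kind == "end" and request_id in started:
            latency_us = ts_us - started.pop(request_id)
            histogram[log2_bucket(latency_us)] += 1
    return dict(sorted(histogram.items()))


events = [
    ("start", 1, 0), ("start", 2, 10),
    ("end", 1, 250), ("end", 2, 1010),
    ("start", 3, 2000), ("end", 3, 5000),
]
print(latency_histogram(events))  # {256: 1, 1024: 1, 4096: 1}
```

Bucketing by powers of two keeps the per-request state small and the histogram compact, which is why kernel-side latency tools commonly favor it over storing every individual latency.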

eBPF is a very strong addition to our monitoring tools, and we are committed to providing the best experience for monitoring with eBPF from a distance, without disrupting the data flow!

Other eBPF enhancements

But we didn’t stop there with eBPF in v1.32.0. We also provided the following updates:

  • We moved VFS to a Filesystem menu to simplify the visualization of events performed by filesystems. This allows you to monitor filesystem actions and their latency.
  • Until now, Netdata had metrics that showed the amount of swap usage. ebpf.plugin now extends swap monitoring to show how a specific application group or cgroup performs actions on swap.
  • We have improved process management monitoring by adding monitoring to shared memory and using tracepoints to monitor process creation and exit with more accuracy.
  • Netdata also brings monitoring of OOM kill events for each apps group defined on the host.
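Per-apps-group metrics such as OOM kills depend on assigning each process to a group defined in apps_groups.conf, where a line maps a group name to process-name patterns (for example, sql: mysqld* postgres*). The Python sketch below counts OOM-killed processes per group. It assumes shell-style wildcard matching via fnmatch and a first-match-wins rule, which approximate but may not exactly reproduce Netdata's matching; the function names and the configuration excerpt are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Tuple

Groups = List[Tuple[str, List[str]]]


def parse_apps_groups(text: str) -> Groups:
    """Parse "group: pattern pattern ..." lines, skipping blanks and # comments."""
    groups: Groups = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        name, patterns = line.split(":", 1)
        groups.append((name.strip(), patterns.split()))
    return groups


def oom_kills_per_group(victims: Iterable[str], groups: Groups) -> Dict[str, int]:
    """Count OOM-killed process names per group; the first matching group wins."""
    counts: Counter = Counter()
    for comm in victims:
        for name, patterns in groups:
            if any(fnmatchcase(comm, pattern) for pattern in patterns):
                counts[name] += 1
                break
        else:  # no group matched
            counts["other"] += 1
    return dict(counts)


conf = """
# group: process name patterns (hypothetical excerpt)
sql: mysqld* postgres*
web: nginx* httpd*
"""
print(oom_kills_per_group(["mysqld", "nginx", "postgres", "java"], parse_apps_groups(conf)))
# {'sql': 2, 'web': 1, 'other': 1}
```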

If you share our interest in eBPF monitoring, or have questions or requests, feel free to drop by our Community forum to start a discussion with us.

Machine learning (ML) powered anomaly detection

Machine learning (ML) is undeniably a wave of the future in monitoring and troubleshooting. The Netdata community is riding that wave forward together, ahead of everyone else. Netdata v1.32.0 introduces some foundational capabilities for ML-driven anomaly detection in the Agent. We have integrated dlib, the popular C++ ML library, to power unsupervised anomaly detection out-of-the-box.

While this functionality is still under development and subject to change, we want to develop this with you, as a team. The functionality is disabled by default while we dogfood the feature internally and build additional ML-leveraging features into Netdata Cloud. But you can go to the new [ml] section in netdata.conf and set enabled=yes to turn on anomaly detection. After restarting Netdata, you should see the Anomaly Detection menu with charts highlighting the overall number and percent of anomalous metrics on your node. This can be a very useful single number summary of the state of your node.
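The sketch below illustrates the idea behind these charts: a model trained per metric, an anomaly bit for each new value, and the percentage of anomalous metrics on the node. For simplicity it swaps in a z-score threshold against a training window instead of the dlib-based models Netdata actually uses, and the MetricModel class, threshold value, and metric names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


class MetricModel:
    """Toy per-metric model (z-score), standing in for Netdata's dlib-based models."""

    def __init__(self, training: Sequence[float], threshold: float = 3.0) -> None:
        self.mu = mean(training)
        self.sigma = pstdev(training) or 1e-9  # avoid dividing by zero on flat data
        self.threshold = threshold

    def anomaly_bit(self, value: float) -> int:
        """1 if the value lies more than `threshold` std devs from the training mean."""
        return int(abs(value - self.mu) / self.sigma > self.threshold)


def anomaly_rate(bits: Dict[str, int]) -> float:
    """Percentage of metrics whose latest anomaly bit is 1."""
    return 100.0 * sum(bits.values()) / len(bits) if bits else 0.0


models = {
    "cpu.user": MetricModel([10, 11, 9, 10, 12, 10]),
    "mem.used": MetricModel([500, 502, 498, 501, 499, 500]),
}
latest = {"cpu.user": 55.0, "mem.used": 501.0}
bits = {name: models[name].anomaly_bit(value) for name, value in latest.items()}
print(bits, anomaly_rate(bits))  # {'cpu.user': 1, 'mem.used': 0} 50.0
```

Training one small model per metric keeps each decision local to that metric, and the node-level anomaly rate then condenses many per-metric bits into a single summary number.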

Share your feedback by emailing us at analytics-ml-team@netdata.cloud, or come hang out in the 🤖-ml-powered-monitoring channel of our Discord, where we discuss all things ML and more!

And then, be on the lookout for some bigger announcements and launches relating to ML over the next couple of months.

New timezone selector and time controls in the user interface

Collaborating in a remote world across regions can be difficult, so we wanted to make it easier for you to stay in sync with your administrative teams and your system information. Our new timezone selector lets you choose a timezone that fits the collaboration needs of your teams and infrastructure. We have also added the following time controls, which show whether the content you are looking at is live or historical and let you keep refreshing the page while its tab is in the background:

  • Play: When this option is selected, the content of the page is automatically refreshed while the page is in the foreground.
  • Pause: When this option is selected, the content of the page does not refresh. Pause is set when you request it manually or, for example, when you are investigating data on a chart (the cursor is on top of a chart).
  • Force Play: When this option is selected, the content of the page is automatically refreshed even when the page is in the background.
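The three controls above reduce to a small decision rule. This Python sketch encodes it under two assumptions that the text does not spell out: hovering over a chart pauses updates in both play modes, and should_refresh is a hypothetical helper, not the dashboard's actual code.

```python
from enum import Enum


class Mode(Enum):
    PLAY = "play"
    PAUSE = "pause"
    FORCE_PLAY = "force_play"


def should_refresh(mode: Mode, tab_in_foreground: bool, cursor_on_chart: bool) -> bool:
    """Hypothetical refresh rule for the Play / Pause / Force Play controls."""
    if mode is Mode.PAUSE or cursor_on_chart:
        return False  # paused manually, or the user is inspecting a chart
    if mode is Mode.FORCE_PLAY:
        return True   # refresh even when the tab is in the background
    return tab_in_foreground  # Play: refresh only in the foreground


print(should_refresh(Mode.PLAY, tab_in_foreground=False, cursor_on_chart=False))        # False
print(should_refresh(Mode.FORCE_PLAY, tab_in_foreground=False, cursor_on_chart=False))  # True
```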

Docker image POWER8+ support

On top of all of that, we have added 64-bit little-endian POWER8+ support to our official Docker images, allowing Netdata Docker images to run on recent IBM Power Systems, Raptor Talos II, and similar POWER-based hardware. This extends the list of platforms already supported by our Docker images, which includes:

  • 32-bit and 64-bit x86
  • ARMv7
  • AArch64
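To check which of these image platforms matches a given host, the machine name reported by uname -m can be mapped to a Docker platform string. The mapping below is an assumption based on common naming (for example, ppc64le for little-endian POWER), not a list taken from Netdata's build configuration, and docker_platform is a hypothetical helper.

```python
import platform
from typing import Optional

# Assumed mapping from `uname -m` machine names to Docker platform strings.
DOCKER_PLATFORMS = {
    "i386": "linux/386",
    "i686": "linux/386",
    "x86_64": "linux/amd64",
    "armv7l": "linux/arm/v7",
    "aarch64": "linux/arm64",
    "ppc64le": "linux/ppc64le",  # POWER8+ little-endian
}


def docker_platform(machine: Optional[str] = None) -> Optional[str]:
    """Docker platform for a machine name (default: this host), or None if not covered."""
    return DOCKER_PLATFORMS.get(machine or platform.machine())


print(docker_platform("ppc64le"))  # linux/ppc64le
print(docker_platform("ppc64"))    # None: big-endian POWER is not in the list
```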

Acknowledgments

  • @nabijaczleweli for fixing writing updater log under root.
  • @MikaelUrankar for fixing calculation of sysctl mib size in freebsd plugin.
  • @filip-plata for adding additional metrics to python.d/postgres collector.
  • @eltociear for fixing typos.
  • @gotjoshua for adding a link to python.d/httpcheck.conf.
  • @wangpei-nice for fixing ebpf.plugin segfault when ebpf_load_program returns null pointer.
  • @zanechua for adding Microsoft Teams to supported notification endpoints.
  • @diizzyy for adding support for Intel 2.5G and Synopsys DesignWare nic driver in freebsd plugin.
  • @Saruspete for fixing handling of adding slabs after discovery in slabinfo plugin.
  • @mjtice for adding autovacuum and tx wraparound charts to python.d/postgres.
  • @charoleizer for adding PostgreSQL version to requirements section.
  • @danmichaelo for fixing a typo in exporting docs.
  • @oldgiova for adding capsh check before issuing setcap cap_perfmon.
  • @oldgiova for adding Travis ctrl file for checking if changes happened.
  • @0x3333 for fixing an inconsistent status check in charts.d/apcupsd.
  • @etienne-napoleone for adding terra related binaries to blockchains apps plugin group.
  • @anayrat for fixing postgres replication_slot chart on standby.
  • @vpiserchia for fixing handling of null values returned by _cat/indices API in python.d/elasticsearch.
  • @elelayan for fixing zpool state parsing in proc/zfs.
  • @steffenweber for adding missing privilege to fix MySQL slave reporting.
  • @unhandled-exception for adding sorting of the list of databases in alphabetical order in python.d/postgres.
  • @78Star for updating Netdata and its dependencies versions for pfSense.
  • @unhandled-exception for fixing crashing of the wal query if wal-file was removed concurrently in python.d/postgres.
  • @rupokify for updating jQuery dependency.
  • @caleno for fixing a typo in streaming docs.
  • @rex4539 for fixing typos.

Dashboard


Collectors

New

Improvements

  • Add AWS to apps_groups.conf (#11826, @ilyam8)
  • Show stats for systemd protected mount points (diskspace plugin) (#11767, @vlvkobal)
  • Add support for v1.7.0+ (go.d/coredns) (#619, @georgeok)
  • Add “/basic_status” job to nginx.conf (go.d/nginx) (#612, @ilyam8)
  • Add sharding metrics (go.d/mongodb) (#609, @georgeok)
  • Add thread operations metrics (go.d/mysql) (#607, @ilyam8)
  • Add replica sets metrics (go.d/mongodb) (#604, @georgeok)
  • Add databases metrics (go.d/mongodb) (#602, @georgeok)
  • Add more OS (operating system) charts (go.d/wmi) (#593, @ilyam8)
  • Add caddy job to prometheus.conf (go.d/prometheus) (#581, @odyslam)
  • Add AOF file size metrics (go.d/redis) (#578, @ilyam8)
  • Add openethereum/geth jobs to prometheus.conf (go.d/prometheus) (#578, @odyslam)
  • Update whois/whois-parser packages and add timeout configuration option (go.d/whoisquery) (#576, @ilyam8)
  • Disable reporting min/avg/max group uptime by default (apps plugin) (#11609, @ilyam8)
  • Add sorting of the list of databases in alphabetical order (python.d/postgres) (#11580, @unhandled-exception)
  • Add terra related binaries to blockchains group (apps plugin) (#11437, @etienne-napoleone)
  • Add instruction per cycle charts (perf plugin) (#11392, @thiagoftsm)
  • Add autovacuum and tx wraparound charts (python.d/postgres) (#11267, @mjtice)
  • Add support for Intel 2.5G and Synopsys DesignWare nic driver (freebsd plugin) (#11251, @diizzyy)
  • Add web3 and blockchains groups (apps plugin) (#11220, @odyslam)
  • Implement merging user/stock configuration files (python.d plugin) (#11217, @ilyam8)
  • Rename default job from ‘local’ to ‘anomalies’ (python.d/anomalies) (#11178, @andrewm4894)
  • Add standby lag and blocking transactions charts (python.d/postgres) (#11169, @filip-plata)

Bug fixes

  • Fix renaming for cgroups with dots in the path (cgroups plugin) (#11775, @vlvkobal)
  • Fix exiting on SIGPIPE (go.d plugin) (#630, @ilyam8)
  • Fix domain syntax validation (go.d/whoisquery) (#629, @ilyam8)
  • Fix missing NONE in valid request methods (go.d/squidlog) (#621, @ilyam8)
  • Remove wrong “queue_messages_in_queues” chart (go.d/vernemq) (#601, @ilyam8)
  • Fix HTTP/socket client initialization order (go.d/phpfpm) (#591, @ilyam8)
  • Fix scraping metrics when resources are not discovered (go.d/vsphere) (#589, @ilyam8)
  • Fix LTSV log format parsing (go.d/weblog) (#584, @ilyam8)
  • Fix expiration date parsing (go.d/whoisquery) (#575, @ilyam8)
  • Fix containers name resolution for crio/containerd runtime (cgroups plugin) (#11756, @ilyam8)
  • Add sensors to charts.d.conf and add a note on how to enable it (charts.d plugin) (#11715, @ilyam8)
  • Fix crashing of the wal query if wal-file was removed concurrently (python.d/postgres) (#11697, @unhandled-exception)
  • Fix “lsns: unknown column” logging (cgroups plugin) (#11687, @ilyam8)
  • Fix nfsd RPC metrics and remove unused nfsd charts and metrics (proc/nfsd) (#11632, @vlvkobal)
  • Fix “proc4ops” chart family (proc/nfsd) (#11623, @ilyam8)
  • Fix swap size calculation (cgroups plugin) (#11617, @vlvkobal)
  • Fix RSS memory counter for systemd services (cgroups plugin) (#11616, @vlvkobal)
  • Fix VBE parsing (python.d/varnish) (#11596, @ilyam8)
  • Remove unused synproxy chart (proc/synproxy) (#11582, @vlvkobal)
  • Fix zpool state parsing (proc/zfs) (#11545, @elelayan)
  • Fix null values returned by ‘_cat/indices’ API (python.d/elasticsearch) (#11501, @vpiserchia)
  • Fix replication_slot chart on standby (python.d/postgres) (#11455, @anayrat)
  • Fix an inconsistent status check (charts.d/apcupsd) (#11435, @0x3333)
  • Fix plugin name (stats.d plugin) (#11400, @vlvkobal)
  • Fix plugin names (freebsd and macos plugins) (#11398, @vlvkobal)
  • Fix lack of “module” in chart definition (all charts.d modules) (#11390, @ilyam8)
  • Fix various python modules charts contexts (python.d/smartd_log, mysql, zscores) (#11310, @ilyam8)
  • Fix current operation charts title and context (proc/mdstat) (#11289, @ilyam8)
  • Fix handling of adding slabs after discovery (slabinfo plugin) (#11257, @Saruspete)
  • Fix calculation of sysctl mib size (freebsd plugin) (#11159, @MikaelUrankar)

eBPF

New

Improvements

Bug fixes


Health

Improvements

Bug fixes


Documentation

Packaging / Installation

Other Notable Changes

Improvements

Bug fixes

Deprecation notice

An upcoming stable release of the Netdata Agent will include a maintainability update to our base Docker image. A small percentage of users will find that all self-compiled packages must be manually rebuilt after the update, even if relocation/SONAME errors are not encountered. As a workaround, --security-opt=seccomp=unconfined can be passed with no default.json, but this introduces security vulnerabilities between the host and malicious code in the container.

Alternatively, users can prepare for the update by upgrading to one of the following:

  • runc v1.0.0-rc93
  • Docker 19.03.9 or greater AND libseccomp 2.4.2 or greater
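The requirements above can be checked with a small version comparison. This Python sketch assumes that each listed version means “this version or newer” and that a release candidate such as 1.0.0-rc93 sorts before the final 1.0.0 release. host_is_ready and parse_version are hypothetical helpers, and you would supply the version strings reported by your own runc, Docker, and libseccomp installations.

```python
import re
from typing import Optional, Tuple

_VERSION = re.compile(r"v?(\d+(?:\.\d+)*)(?:-rc(\d+))?")


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse "19.03.9", "v1.0.0-rc93", or "20.10.5+dfsg1" into a comparable tuple.

    Assumes at most three numeric components; trailing suffixes are ignored.
    """
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    parts = [int(p) for p in match.group(1).split(".")]
    parts += [0] * (3 - len(parts))  # "19.03" -> 19.3.0
    rc = match.group(2)
    # A final release sorts after every release candidate of the same version.
    parts.append(int(rc) if rc else 10**9)
    return tuple(parts)


def host_is_ready(runc: Optional[str] = None, docker: Optional[str] = None,
                  libseccomp: Optional[str] = None) -> bool:
    """True if runc >= 1.0.0-rc93, or Docker >= 19.03.9 AND libseccomp >= 2.4.2."""
    if runc and parse_version(runc) >= parse_version("1.0.0-rc93"):
        return True
    return bool(docker and libseccomp
                and parse_version(docker) >= parse_version("19.03.9")
                and parse_version(libseccomp) >= parse_version("2.4.2"))


print(host_is_ready(runc="1.0.0-rc92"))                      # False
print(host_is_ready(docker="20.10.7", libseccomp="2.5.1"))   # True
print(host_is_ready(docker="19.03.9", libseccomp="2.4.1"))   # False
```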

While Netdata previously avoided making this update to minimize inconvenience to our users, we are now facing a third-party end-of-life date, and we believe the minimal number of affected users substantiates the need for the change.

Additionally, in a future stable release, we will be removing our legacy agent-to-cloud connection. Most users should see no change in this upgrade, but we will lose SOCKS 5 proxy support for the Netdata Cloud functionality, which will affect a small number of users.

Support options

As we grow, we stay committed to providing the best support ever seen from an open-source solution. Should you encounter an issue with any of the changes made in this release or any feature in the Netdata agent, feel free to contact us by one of the following channels:

  • GitHub: You can use our GitHub repo to report bugs and submit feature requests.
  • Community forum: You can visit our community forum for questions and training.
  • NEW: Discord: You can jump into our Discord for interactive, synchronous help and discussion. More than 700 engineers are already using it! Join us!