Netdata
Real-time infrastructure and application monitoring platform
Alternative to: Prometheus, Grafana, Nagios, Zabbix, Datadog, New Relic, Sensu, Dynatrace
v1.19.0
2019-11-28 Netdata v1.19.0
Release v1.19.0 contains 2 new collectors, 19 bug fixes, 17 improvements, and 19 documentation updates.
At a glance
We completed a major rewrite of our web log collector to dramatically improve its flexibility and performance. The new collector, written entirely in Go, can parse and chart logs from Nginx and Apache servers, and combines numerous improvements. Netdata now supports the LTSV log format, creates charts for TLS and cipher usage, and is amazingly fast. In a test using SSD storage, the collector parsed the logs for 200,000 requests in about 200ms, using 30% of a single core.
This Go-based collector also has powerful custom log parsing capabilities, which means we’re one step closer to a generic application log parser for Netdata. We’re continuing to work on this parser to support more application log formatting in the future.
We have a new tutorial on enabling the Go web log collector and using it with Nginx and/or Apache access logs with minimal configuration. Thanks to Wing924 for starting the Go rewrite!
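To make the LTSV support mentioned above more concrete: LTSV (Labeled Tab-Separated Values) stores each log line as tab-separated fields of the form label:value. The following is a minimal sketch of parsing one such line in Go. It is not the collector's actual parser; the function name parseLTSV and the sample labels are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLTSV splits one LTSV line into label/value pairs.
// Fields are separated by tabs; each field is "label:value" and is split
// on the first colon only, so values may themselves contain colons.
// (Hypothetical helper, not part of go.d.plugin.)
func parseLTSV(line string) (map[string]string, error) {
	fields := make(map[string]string)
	for _, field := range strings.Split(strings.TrimRight(line, "\r\n"), "\t") {
		if field == "" {
			continue // tolerate empty fields, e.g. a trailing tab
		}
		i := strings.IndexByte(field, ':')
		if i <= 0 {
			return nil, fmt.Errorf("malformed LTSV field %q", field)
		}
		fields[field[:i]] = field[i+1:]
	}
	return fields, nil
}

func main() {
	// Illustrative access-log line; the labels are assumptions.
	line := "host:192.0.2.1\ttime:28/Nov/2019:10:00:00 +0000\tstatus:200\tsize:512"
	fields, err := parseLTSV(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["host"], fields["status"], fields["time"])
}
```

Splitting on the first colon only is what keeps timestamps such as 28/Nov/2019:10:00:00 intact as a single value.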
We introduced more cmocka unit testing to Netdata. In this release, we’re testing how Netdata’s internal web server processes HTTP requests, a first step toward improving code quality throughout, reducing bugs, and making refactoring easier. We wanted to validate the web server’s behavior but needed to build a layer of parametric testing on top of the cmocka test runner. Read all about our process of testing and selecting cmocka in our blog post: Building an agile team’s ‘safety harness’ with cmocka and FOSS.
Netdata’s Unbound collector was also completely rewritten in Go to improve how it collects and displays metrics. This new version collects dozens of metrics, including details on queries, cache, and uptime, and can even show per-thread metrics. See our tutorial on enabling the new collector via Netdata’s amazing auto-detection feature.
We fixed an error where invalid spikes appeared on certain charts by improving the incremental counter reset/wraparound detection algorithm.
Netdata can now send health alarm notifications to IRC channels thanks to Strykar!
And, Netdata can now monitor AM2320 sensors, thanks to hard work from Tom Buck.
Acknowledgements
Our thanks go to:
- andyundso for fixing the packagecloud binary installation in Debian 8.
- Strykar for adding support for IRC health notifications.
- tommybuck for the new AM2320 sensors collector.
- Saruspete for the new ability to provide metrics on fragmentation of free memory pages.
- OdysLam for improving the documentation for new collector plugins.
- k0ste, xginn8 and nodiscc for improving the configuration of the apps plugin.
- amichelic for improving the web_log collector.
- cherouvim, arkamar, half-duplex and CtrlAltDel64 for improving the documentation.
- mniestroj for the fix to the dbengine compilation with the musl standard C library.
- arkamar for an improvement to the xenstat collector.
- vakartel for improving the cgroup network interfaces detection in Proxmox 6.
Improvements
New Collectors
- AM2320 sensor collector plugin #7024 (tommybuck)
- Added parsing of /proc/pagetypeinfo to provide metrics on fragmentation of free memory pages. #6843 (Saruspete)
- The unbound collector module was completely rewritten in Go. go.d.plugin/#287 (ilyam8)
Collector improvements
- We rewrote our web log parser in Go, drastically improving its flexibility and performance. go.d.plugin/#141 (ilyam8)
- The Kubernetes kubelet collector now reads the service account token and uses it for authorization. We also added a new default job to collect metrics from https://localhost:10250/metrics. go.d.plugin/#285
- Added a new default job to the Kubernetes coredns collector to collect metrics from http://kube-dns.kube-system.svc.cluster.local:9153/metrics. go.d.plugin/#285
- apps.plugin: Synced FRRouting daemons configuration with the frr 7.2 release. #7333 (k0ste)
- apps.plugin: Added process group for git-related processes. #7289 (nodiscc)
- apps.plugin: Added balena to the container-engines application group. #7287 (xginn8)
- web_log: Treat 401 Unauthorized requests as successful. #7256 (amichelic)
- xenstat.plugin: Prepare for xen 4.13 by checking for xenstat_vbd_error presence. #7103 (arkamar)
- mysql: Added galera cluster_status alarm. #6989 (ilyam8)
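As a rough illustration of the kubelet change listed above (reading the service account token and using it for authorization), the sketch below builds an authenticated request in Go. It is not the collector's code: the helper name newMetricsRequest is hypothetical, and the token path is the conventional in-pod mount location, assumed here rather than taken from these release notes. The request is only constructed, not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// Conventional location of the service account token inside a pod
// (an assumption for this sketch).
const tokenPath = "/var/run/secrets/kubernetes.io/serviceaccount/token"

// newMetricsRequest builds a GET request for url and, if token is
// non-empty, attaches it as a bearer token. (Hypothetical helper.)
func newMetricsRequest(url, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if token = strings.TrimSpace(token); token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// Outside a pod the file does not exist; the request is then built
	// without an Authorization header.
	raw, err := os.ReadFile(tokenPath)
	if err != nil {
		fmt.Println("no service account token:", err)
	}
	req, err := newMetricsRequest("https://localhost:10250/metrics", string(raw))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL, "authorized:", req.Header.Get("Authorization") != "")
}
```

Trimming whitespace matters because token files often end with a newline, which would otherwise end up inside the header value.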
Metrics Database
Health
- Fine tune various default alarm configurations. #7322 (Ferroin)
- Update SYN cookie alarm to be less aggressive. #7250 (Ferroin)
- Added support for IRC alarm notifications #7148 (Strykar)
Installation/Packages
- Corrected the Makefile.am files indentation, to prevent unexpected errors. #7252 (knatsakis)
- Rationalized ownership and permissions of /etc/netdata. #7244 (knatsakis)
- Made various improvements to the installer script netdata-installer.sh. #7200 (knatsakis)
- Include go.d.plugin version v0.11.0 #7365 (ilyam8)
Documentation
- Correct versions of FreeNAS that Netdata is available on. #7355 (knatsakis)
- Update plugins.d/README.md. #7335 (OdysLam)
- Note regarding stable vs nightly was accidentally being shown as a code fragment in the installation documentation. #7330 (cakrit)
- Properly link to translated documents from netdata-security.md. #7343 (cakrit)
- Update documentation of the netdata-updater, to properly cover kickstart-static64.sh and kickstart.sh installations. #7262 (knatsakis)
- Converted the swagger documentation to OpenAPI3.0. #7257 (amoss)
- Minor corrections to the netdata installer documentation. #7246 (paulkatsoulakis)
- Fix typo in collectors README. #7242 (cherouvim)
- Clarified database engine/RAM in getting started guide. #7225 (joelhans)
- Suggest using /var/run/netdata for the unix socket, in running behind nginx documentation. #7206 (CtrlAltDel64)
- Added GA links to new documents. #7194 (joelhans)
- Added a page for metrics archiving to TimescaleDB. #7180 (joelhans)
- Fixed typo in the contrib/debian descriptions for cupsd. #7154 (arkamar)
- Added user information to MySQL Python module documentation. #7128 (prhomhyse)
- Document the results of the spike investigation into CMake. #7114 (amoss)
- Fix to docker-compose+Caddy installation. #7088 (joelhans)
- Fixed broken links and added setup instructions for Telegram health notifications. #7033 (half-duplex)
- Minor grammar change in /web/gui documentation #7363 (eviemsrs)
Other
- Improve Travis build warnings (issue #7189). #7312 (amoss)
- cmocka testing for http requests #7308, #7264, #7210 (amoss and vlvkobal)
- CI/CD: Prevented nightly jobs from timing out #7238, #7214 (knatsakis)
Bug fixes
- Fixed packagecloud binary installation in Debian 8. #7342 (andyundso)
- Fixed missing libraries in certain compilations, by adding missing trailing backslash to Makefile.am. #7326 (oxplot)
- Prevented freezes due to isolated CPUs. #7318 (stelfrag)
- Fixed missing streaming when slave has SSL activated. #7306 (thiagoftsm)
- Fixed error 421 in IRC notifications, by removing a line break from the message. #7243 (thiagoftsm)
- proc/pagetypeinfo collection could under particular circumstances cause high CPU load. As a workaround, we disabled pagetypeinfo by default. #7230 (vlvkobal)
- Fixed incorrect memory allocation in proc plugin’s pagetypeinfo collector. #7187 (thiagoftsm)
- Eliminated cached responses from the postgres collector. #7228 (ilyam8)
- rabbitmq: Fixed "disk_free": "disk_free_monitoring_disabled" error. #7226 (ilyam8)
- Fixed build with musl standard C library by including limits.h before using LONG_MAX. #7224 (mniestroj)
- Fixed Apache module not working with letsencrypt certificate by allowing the python UrlService to skip tls_verify for http scheme. #7223 (ilyam8)
- Fixed invalid spikes appearing in certain charts, by improving the incremental counter reset/wraparound detection algorithm. #7220 (mfundul)
- Fixed DNS-lookup performance issue on FreeBSD. #7132 (amoss)
- Fixed handling of the stable option, so that the installers and automatic updater respect it. #7083 (knatsakis), #7051 (oxplot)
- Fixed the static binary installer’s handling of the --auto-update option. #7076 (knatsakis)
- Fixed cgroup network interfaces classification on Proxmox 6. #7037 (vakartel)
- Added missing dbengine flags to the installer. #7027 (paulkatsoulakis)
- Fixed issue with unknown variables in alarm configuration expressions always being evaluated to zero. #6984 (thiagoftsm)
- Fixed issue of automatically picking up Pi-hole stats from a Pi-hole instance installed on another device by disabling the default job that collects metrics from http://pi.hole. go.d.plugin/#289 (ilyam8)