It’s officially been seven years since I started hacking on a monitoring project that would become Sensu. My goal was to replace a fragile Nagios installation that couldn’t keep up with the demands of dynamic public cloud infrastructure — a challenge that still rings true for companies everywhere.
A key part of the Sensu story, as you can see in the very first commit, is our support for Nagios service checks. I think the Nagios service check specification is awesome and underappreciated, so I wanted to share a little more on how and why we support it in Sensu.
Service checks: a universal abstraction for monitoring
For a Nagios service check to be valid, it only needs to fulfill the following two requirements:
- It communicates the status of the service check through one of a set of known return codes.
- It emits at least a single line of output to STDOUT.
That’s it! That’s the whole spec. The exit codes are as follows:
- 0: OK
- 1: WARNING
- 2: CRITICAL
- 3: UNKNOWN
This is the specification upon which thousands of plugins were born.
By relying on these simple, low-level, universal features of operating systems, these plugins can work in any environment and be implemented in any language of your choosing. If you can call it from the command line, have it emit some text, and exit with a relevant exit code, it can work as a service check and will work with Sensu. Because Sensu supports this spec, any Nagios service check just works with Sensu.
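To make the two requirements concrete, here is a minimal sketch of a check written in Python. The disk-usage scenario, the 80%/90% thresholds, and the function names are my own illustrative assumptions, not part of the spec; the only parts that come from the spec are the single line on STDOUT and the exit code.

```python
#!/usr/bin/env python3
"""Minimal service check: one line on STDOUT, status in the exit code."""
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to (exit_code, output_line).

    The thresholds are illustrative defaults, not recommended values.
    """
    if used_percent >= crit:
        return CRITICAL, f"CRITICAL: disk usage at {used_percent:.1f}%"
    if used_percent >= warn:
        return WARNING, f"WARNING: disk usage at {used_percent:.1f}%"
    return OK, f"OK: disk usage at {used_percent:.1f}%"


def main(path="/"):
    """Print the status line and return the exit code for `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so the status is unknown.
        print(f"UNKNOWN: cannot read {path}: {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total)
    print(line)
    return code


# When saved as an executable script, finish the file with:
#     raise SystemExit(main())
```

Anything that can run this file and read its exit status can treat it as a service check, which is the whole point of the spec.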
Sensu-flavored service checks
Over the years we’ve helped many companies upgrade their monitoring solution for multi-generational datacenters while still running the exact same service checks they’re used to. By learning along with our customers, we’ve seen a few ways we could improve our service checks to make them even better. In addition to the standard Nagios plugin specification, Sensu checks may also:
- Accept input via STDIN
- Emit text to STDERR (in addition to STDOUT)
- Accept command-line arguments that modify plugin behaviour
With these additions, plugins can manage more complex logic, using command-line arguments for configuration. And because they accept input via STDIN, Sensu service checks can be chained in sequence, with the output of one plugin becoming the input of another for dependent multi-service checks.
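As a rough sketch of what those three additions look like together, the following check reads lines from STDIN, takes its thresholds from command-line arguments, and writes diagnostics to STDERR while keeping the status line on STDOUT. The flag names (`--match`, `--warn`, `--crit`) and the line-counting logic are hypothetical choices for illustration; how plugins are wired together in a real deployment is outside this sketch.

```python
#!/usr/bin/env python3
"""Sketch of a check that uses arguments, STDIN, and STDERR."""
import argparse
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run(argv, stdin, stdout, stderr):
    """Count input lines containing a substring and return an exit code."""
    parser = argparse.ArgumentParser(description="Count matching input lines")
    parser.add_argument("--match", default="error", help="substring to count")
    parser.add_argument("--warn", type=int, default=1, help="warning threshold")
    parser.add_argument("--crit", type=int, default=5, help="critical threshold")
    args = parser.parse_args(argv)

    lines = stdin.read().splitlines()
    if not lines:
        # Diagnostics go to STDERR; the status line still goes to STDOUT.
        print("no input received on STDIN", file=stderr)
        print("UNKNOWN: nothing to evaluate", file=stdout)
        return UNKNOWN

    hits = sum(1 for line in lines if args.match in line)
    print(f"scanned {len(lines)} lines for {args.match!r}", file=stderr)

    if hits >= args.crit:
        code, label = CRITICAL, "CRITICAL"
    elif hits >= args.warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    print(f"{label}: {hits} lines matching {args.match!r}", file=stdout)
    return code


# When saved as an executable script, finish the file with:
#     raise SystemExit(run(sys.argv[1:], sys.stdin, sys.stdout, sys.stderr))
```

Saved as, say, `check_lines.py`, it could sit at the end of a pipeline such as `some_upstream_plugin | ./check_lines.py --match timeout --crit 3`, where both names are placeholders.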
Service checks in the wild
The Sensu plugin spec is simple, but that doesn’t mean the service checks are. On any host, with any programming language, you can use or write service checks to validate the behaviour of all sorts of systems.
During an incident, you might want to verify that your database is functioning. Perhaps you SSH into a jump box, load a set of secrets into your session, log into your database using those credentials, and run a simple SELECT and UPDATE query to verify you can read and write data. Whatever you’re doing by hand, you can codify into a service check and have Sensu run it. When those queries fail, you’ll get notified.
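Here is one way that manual routine might be codified, sketched in Python. To keep it self-contained I use SQLite from the standard library as a stand-in for whatever database you actually run, and I assume a hypothetical `healthcheck` table with a single row (`id = 1`) reserved for this purpose; credentials handling and the jump box are left out.

```python
#!/usr/bin/env python3
"""Sketch of a read/write database check, using SQLite as a stand-in."""
import sqlite3
import uuid

OK, CRITICAL = 0, 2


def check_read_write(conn):
    """Run one UPDATE and one SELECT; return (exit_code, output_line)."""
    token = uuid.uuid4().hex  # unique value so a stale read is detectable
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE healthcheck SET token = ? WHERE id = 1", (token,)
            )
        row = conn.execute(
            "SELECT token FROM healthcheck WHERE id = 1"
        ).fetchone()
    except sqlite3.Error as exc:
        return CRITICAL, f"CRITICAL: query failed: {exc}"
    if row is None or row[0] != token:
        return CRITICAL, "CRITICAL: write was not readable"
    return OK, "OK: database accepted a write and returned it"


def main(db_path):
    """Open the database at `db_path`, run the check, print the result."""
    conn = sqlite3.connect(db_path)
    try:
        code, line = check_read_write(conn)
    finally:
        conn.close()
    print(line)
    return code


# When saved as an executable script, finish the file with:
#     import sys; raise SystemExit(main(sys.argv[1]))
```

For a networked database, the same shape should carry over with that database’s client library in place of `sqlite3`.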
Now imagine you wanted to do something like that, but for every server in your datacenter. Maybe you want to know if any process running on any machine in your AWS account is listening on port 23 (telnet), or if any account ran the su command on any machine in your datacenter. With Sensu service checks, you can use projects like Facebook’s osquery and it just works. Write a script in any language of your choosing, run a custom query with osquery, and exit with the appropriate return code. BAM! We have complex and performant endpoint visibility with alerting, in just a few lines of code. For more on Sensu + osquery, check out this blog post, which includes real-world examples and a powerful story on how to turn your ops team’s knowledge into repeatable monitoring.
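A sketch of the port 23 example might look like the following. It assumes `osqueryi` is installed and on the PATH of the machine running the check, and it relies on osquery’s `listening_ports` and `processes` tables; the function names and the 30-second timeout are my own choices. The script only inspects the host it runs on, so fleet-wide coverage comes from scheduling it on every machine.

```python
#!/usr/bin/env python3
"""Sketch of a check that asks osquery for listeners on the telnet port."""
import json
import subprocess

OK, CRITICAL, UNKNOWN = 0, 2, 3

# Join listeners to their owning processes so the alert names the culprit.
QUERY = (
    "SELECT DISTINCT p.name, p.pid, l.port "
    "FROM listening_ports AS l JOIN processes AS p USING (pid) "
    "WHERE l.port = 23;"
)


def evaluate(rows):
    """Turn osquery result rows into (exit_code, output_line)."""
    if not rows:
        return OK, "OK: nothing is listening on port 23"
    names = ", ".join(sorted({f"{r['name']} (pid {r['pid']})" for r in rows}))
    return CRITICAL, f"CRITICAL: listening on port 23: {names}"


def main():
    """Run the query through osqueryi and print the resulting status line."""
    try:
        result = subprocess.run(
            ["osqueryi", "--json", QUERY],
            capture_output=True, text=True, timeout=30, check=True,
        )
        rows = json.loads(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError) as exc:
        # Missing binary, non-zero exit, timeout, or unparseable output.
        print(f"UNKNOWN: could not run osquery: {exc}")
        return UNKNOWN
    code, line = evaluate(rows)
    print(line)
    return code


# When saved as an executable script, finish the file with:
#     raise SystemExit(main())
```

Keeping `evaluate` separate from the subprocess call makes the alerting logic easy to test without osquery installed.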
Tell me your Nagios stories!
I hope you enjoyed learning a little more about why Sensu was designed to support a migration from Nagios. I would love to hear more of your stories: migration tales, how folks are (re)using Nagios checks with Sensu, and more. If you have one you’re willing to share, let us know, and we’d be happy to feature it so that more of our community can understand when Sensu is the right move. In the meantime, thanks for reading, and happy monitoring!