Box Case Study

Box (NYSE:BOX) offers a hosted storage service and is headquartered in Redwood City, California. Box has been using Sensu successfully for several years across multiple datacenters to monitor over 30k hosts. In this case study, Trent Baker, Senior SRE at Box, describes the team's initial journey of replacing their legacy Nagios deployment with Sensu across the enterprise.

IT Environment

The IT environment included authentication, multiple DNS solutions (BIND, Infoblox), configuration management (Puppet and Chef), and storage supporting over 80,000 paid enterprise users and 11 million individual users.

The Problem

The IT team was struggling to scale their existing Nagios deployment. They had deployed multiple Nagios slaves and were using Puppet as the service discovery tool, which took hours and multiple Puppet runs to propagate changes through their infrastructure. Any change, such as decommissioning a server, took hours to propagate through the system, and any error caused alert storms.

Such a fragile monitoring platform was preventing the team from progressing toward a modern cloud environment.

The Solution

Box replaced Nagios with two Sensu HA clusters in their datacenters. Sensu was designed to integrate seamlessly with configuration management tools such as Puppet, Chef, and Ansible. All the native Nagios checks could be used as-is in the new Sensu deployment, and all the Nagios classes mapped to Sensu subscriptions, saving Box precious time and resources.
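
To make the class-to-subscription mapping concrete, here is a minimal sketch of how a Nagios-style check might be expressed as a Sensu check definition. It assumes the JSON check layout used by Sensu Core 1.x; the function name, check name, plugin path, and class names are hypothetical illustrations and are not taken from Box's configuration.

```python
import json


def nagios_to_sensu_check(name, command, nagios_classes, interval=60):
    """Build a Sensu-style check definition from a Nagios-style check.

    Hypothetical helper: each Nagios class becomes a Sensu subscription,
    and the Nagios plugin command line is reused unchanged.
    """
    return {
        "checks": {
            name: {
                # The plugin command is kept exactly as Nagios ran it.
                "command": command,
                # Classes are de-duplicated and sorted for stable output.
                "subscribers": sorted(set(nagios_classes)),
                # Seconds between check executions (assumed default).
                "interval": interval,
            }
        }
    }


# Example values below are assumptions for illustration only.
definition = nagios_to_sensu_check(
    name="check_disk",
    command="/usr/lib/nagios/plugins/check_disk -w 20% -c 10%",
    nagios_classes=["webserver", "storage", "webserver"],
)
print(json.dumps(definition, indent=2))
```

Because Sensu interprets the same plugin exit codes as Nagios (0 OK, 1 warning, 2 critical, 3 unknown), a mapping like this changes only where the check is declared, not the check itself.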

Sensu was easy to deploy, and the team rolled it out to five datacenters. It scaled horizontally across their entire infrastructure, and they used Sensu's Wavefront integration to store metrics, with Sensu acting as the pipeline. Additionally, the Sensu APIs allowed them to use Sensu as a trigger for auto-remediation of outages.

In the end, having a modern and flexible monitoring platform such as Sensu allowed Box to look ahead to modernizing their IT infrastructure with containers and hybrid cloud environments.

The Nagios Success Story with Trent Baker

Trent Baker talks about the process his team took while migrating 350K objects off of Nagios using Sensu at