It’s 10 PM, Do You Know How Reliable Your Systems Are?

By Nick Washburn and Andy Fligel

Our Investment in Verica, the Continuous Verification Company

In the fall of 2019, we were reading O'Reilly's Chaos Engineering, the then-current book about the principles of chaos engineering first developed by Casey Rosenthal and his team at Netflix. As we digested this introductory book, it became abundantly clear to us that we needed to meet Casey. He previously ran the Traffic & Chaos Team at Netflix, helping to develop the Chaos Automation Platform (ChAP). Beyond that, he and his team were ultimately responsible for keeping Netflix “up”, effectively moving more bytes across the internet than any other team on the planet (not to mention across a massive number of AWS instances).

Chaos Engineering Can’t Be Limited to Netflix

We were full of questions, but one came front and center: Why was Netflix the only company practicing chaos engineering at scale? It seemed counterintuitive to us, considering that the practice addresses an obvious pain point that every company endures, regardless of size.

Every company values reliability: software and services should be available and dependable. When systems go down, for whatever reason, business is impacted, and the result is often a fire drill of epic proportions: customer complaints, SRE beepers buzzing, post-mortem incident reports and analyses, and band-aid fixes until the next incident comes around. And all of this plays out against the backdrop of those very systems becoming ever more complex. The reality is that incidents are costly and pervasive.

Yet it seemed from afar that only Netflix was taking a truly proactive approach: experimenting with its software and systems, in production, to test hypotheses about how those systems would react to real-world failures. Why were most companies sticking to reactive tactics (i.e., detection, remediation, disaster recovery), when a proactive approach was needed to identify vulnerabilities in increasingly complex systems before they cause incidents?

A Bit of Schooling from the Experts

After we were introduced to Casey, he brought along his good friend and co-founder at Verica, Aaron Rinehart. Aaron was previously the Chief Security Architect at UnitedHealth Group, where he also led the company's DevOps and open-source transformation. Drawing on this multidisciplinary expertise, Aaron pioneered the development of security chaos engineering. Not only does every company value reliable services that are always available, but it needs those services to be secure as well!

As we met with Casey and Aaron, it was clear they were at the forefront of pushing the industry forward in these disciplines. Rather than simply speaking about chaos engineering, they quickly focused our minds on the concept of Continuous Verification: using an experimentation platform that allows teams to proactively and safely discover system weaknesses (whether in availability or security) before they disrupt business outcomes. Because we had spent a lot of time in DevOps at Intel Capital, they summarized this nascent but highly compelling discipline for us by tying it to the natural evolution of the CI/CD process:

Source: Verica

It’s Not Just Breaking Things in Production

When asked why more companies are not doing this, Casey and Aaron provided a compelling narrative of the challenges enterprises face in building the tooling and putting the principles of chaos engineering and continuous verification into practice in a safe way. It’s hard! But what struck us was not merely the technical difficulty of a DIY approach, but the organizational headwinds that prevent a lot of companies from undertaking this effort. Some are driven by fear or misunderstanding (“I don’t want to break things in production”). Others come down to time (“I don’t have time to break things in production”).

Most importantly, however, Casey and Aaron made it very clear that it’s not just about breaking things in production. You don’t go kick your TV as hard as you can over and over to see if it will break. Beyond the stupidity of that, the reality is you would learn nothing useful (well, aside from how hard you can kick a TV). Casey and Aaron explained that the entire point of chaos engineering and continuous verification is not to break things; it is to understand things. By using empirical experimentation to proactively discover vulnerabilities in software and systems, teams can better understand how their systems react to real-world conditions. SRE, DevOps and DevSecOps teams can start with a hypothesis of how their system should behave, test that hypothesis, and modify their system based upon the results.

Beyond the philosophy of what continuous verification is intended to do, Casey and Aaron also deeply understand the features that enterprises need to safely deploy and operationalize such a platform. The Verica Continuous Verification Platform is an enterprise-first platform with the right abstractions and usability to help companies apply these principles in a safe and controlled way.

Our Continued Investment in Continuous Verification

Since first investing in Verica’s seed round and subsequently leading the Series A round, we’ve been thrilled to watch the progress the Verica team has made toward its vision. Large enterprise customers are relying on Verica’s platform to empower their teams to gain confidence in their software and systems. The platform has an enterprise-style deployment, with Kubernetes and Kafka integrations. The feature roadmap is compelling, with exciting announcements coming in the near term.

Beyond the commercial progress, the team has continued to push the industry forward: Casey is working with Nora Jones on an updated O'Reilly Chaos Engineering book; Aaron is working with Kelly Shortridge on the first O'Reilly Security Chaos Engineering book; and Courtney Nash, James Wickett and the broader Verica team are launching the Verica Open Incident Database, a community-contributed collection of software-related incident reports to facilitate discussion of, and learning from, past failures. And lastly, of course, there is the Chaos Community Broadcast!

We’re thrilled to take the covers off our work with the Verica team and excited for things to come!