Database Optimization on Autopilot: Our Investment in OtterTune
By Assaf Araki and Nick Washburn
The database is a complex beast. To install and configure a database optimally, one needs to understand how to choose the right CPU, memory, storage system, and network. Every hardware (HW) setting calls for a different software (SW) configuration, including the storage files, table definitions, and indexing. The goal of the SW and HW configurations is to meet the needs of the queries that will run on the database management system (DBMS). Once the DBMS goes live, it changes as application workloads change and evolve at runtime – and the configurations need to track the workloads and usage to stay optimized.
DBMS Trends During the Last Decade
The last decade has witnessed a number of clear trends with respect to DBMS adoption:
- Overall, DBMS usage and market size have grown rapidly and continue to do so. For example, Gartner estimated in its annual Market Share reports that the DBMS market was $23.3 billion in 2010 and reached $65.8 billion in 2020.
- The share of open-source DBMSs has rapidly increased during that period. In its State of the Open-Source DBMS Market, 2019 report, Gartner estimated that by 2022 more than 70% of new in-house applications would be developed on an open-source DBMS or an open-source database-as-a-service (DBaaS) offering. It further estimated that 50% of existing proprietary relational DBMS instances would have been converted, or be in the process of converting, by that time. DB-Engines ranks DBMSs using a range of data sources, and as of May 2022, open-source DBMSs had surpassed commercially licensed DBMSs, 51.4% to 48.6%.
- Relational DBMSs continue to represent the lion’s share of the most popular databases. Gartner estimated in its Market Share report that relational DBMSs captured 83% of the DBMS market in 2020. Further illustrating the point, as of May 2022 relational DBMSs remain the most popular category, at 72% in the DB-Engines rankings.
- Cloud relational DBMSs have grown the fastest of them all. In October 2009, Amazon Web Services (AWS) released Relational Database Service (RDS), supporting MySQL and later expanding to a variety of other relational databases, including PostgreSQL, SQL Server, and Oracle. A study from Stack Overflow revealed that traffic to discussion pages about RDS increased 40% year-over-year from 2017 to 2018. Further, Valuates Reports estimates that the cloud DBMS market segment will reach $68.7 billion by 2026, at a CAGR of 38.2%.
Stepping back, the past decade clearly saw incredible growth in usage and adoption of relational DBMSs – particularly cloud DBMSs based on open-source projects. On one hand, this growth largely made DBMSs more accessible to non-database professionals. Cloud DBaaS offerings from the major cloud-service providers (CSPs) carried high levels of abstraction. As a result, SW developers and data analysts were able to deploy relational databases with ease.
On the other hand, under the hood something else was also increasing: complexity. During this period, the number of system knobs available for configuration and tuning increased dramatically. By some estimates, the number of configuration knobs on relational DBMSs has grown 5-7X in the last twenty years.
So we were left with this interesting juxtaposition – more users were adopting and deploying relational databases with ease, while the underlying operational complexity skyrocketed. Worse, the ranks of the “experts” at running DBMSs at scale – database administrators (DBAs) – were not growing at the same pace. According to the 2020 Data Management Staffing Ratios report, IT staff grew by only 1% in the last two years. A typical enterprise organization has hundreds to thousands of DBMS instances. Yet DBAs were few and could only monitor and operationally support a fraction of them.
Lastly, during the past decade the “Ops” businesses have grown rapidly, from ITOps solutions that monitor SW and HW, to FinOps solutions that focus on cost monitoring, to MLOps solutions that manage ML models. Nevertheless, most database optimization and tuning was still handled by consulting companies that occasionally reviewed the state of the DBMS by hand and then tuned it manually. If a company had enough optimization or resiliency needs, it might then have hired a full-time DBA. However, even the best DBA cannot work through the endless configuration options to optimize a DBMS, and no DBA is available for the job 24x7.
A New Approach Is Needed & Born Out of CMU Research
OtterTune was founded in 2020 by Andy Pavlo, Dana Van Aken and Bohan Zhang, but the idea originated many years earlier, stemming from Andy’s academic interests. During his Ph.D., he worked on automated methods to speed up transaction processing databases. When Andy then joined Carnegie Mellon University (CMU) as a professor in 2013, he continued to think about further automated methods for improving databases.
Andy’s “aha” moment came right after he joined CMU. During a visit to a large financial institution to explore industry problems around database operations, Andy’s eyes were opened to the amount of money the company was spending on what were – effectively – very basic maintenance tasks. This meeting spurred Andy to go deeper into his research on autonomous databases.
The timing was critical: in parallel, machine learning (ML) research was enjoying a resurgence on account of readily available datasets, effectively “infinite” compute in the cloud, increased HW acceleration in that compute for ML workflows, and new open-source ML frameworks. Andy was thus spurred both by the recognition of a clear industry problem and by a tipping point in available ML technologies. He wanted to dive deep into applying ML to autonomous database optimization.
Together with Dana and Bohan, who were then his students at CMU, Andy started the OtterTune research project in 2014. Their original thesis was to use ML to autonomously optimize DBMS knob configurations. As mentioned above, this was during the period when knob complexity within DBMSs was exploding – making knobs very difficult for DBAs to configure in practice. Ultimately, Dana’s Ph.D. dissertation was the result of this research at CMU, with Dana leading the design and development of the original prototype.
Andy, Dana and Bohan published their work at SIGMOD in 2017, and it was well received by the academic community. Fortunately, the original SIGMOD paper had been funded by a research grant from Amazon, and the Amazon team invited Andy and Dana to write a blog article about the project. The blog resulted in Dana being inundated with emails from people asking her to run OtterTune to optimize their databases.
Sensing this very strong market signal – coupled with all of the notable trends of the past decade outlined above – Andy, Dana and Bohan (who had since graduated from CMU) decided to incorporate OtterTune. From our perspective, Andy, Dana and Bohan are the perfect team to tackle this problem, as it is highly technical and multi-disciplinary, sitting at the intersection of the database field and applied ML.
OtterTune Brings a Safety Net to Let Users Focus on Data and Insight – Rather than Worrying About the Database
Since incorporating and launching OtterTune in 2020, the team has released a highly accessible optimization platform that has expanded well beyond knob tuning, which is only one of many aspects of the database lifecycle. Supporting databases on Amazon RDS and Aurora, OtterTune’s platform automatically ensures that these databases run with the proper configuration settings, query plans, indexes, and replication schemes. OtterTune focuses on the OLTP DBMSs that are the backbone of any company’s operations and critical for the business. Beyond just performance improvements and cost reductions, it provides a safety net for running cloud databases – freeing up developer and DBA time to focus on more impactful tasks.
Similarly, since launching, the OtterTune team has further expanded the product to cover health checks that identify specific database problems – automatically finding the proverbial “needle in the haystack.” Many developers understand that something in a PostgreSQL or MySQL database is wrong, but they have no idea how to find the cause given the complexity (and the time limitations imposed by the myriad of other things they have to focus on). The health checks can proactively identify problems such as missing indexes or cache misses, among others. Table-level health checks – like data bloat – provide even more fine-grained, automatic checks to help users avoid resiliency and performance issues. Identifying these problems through health checks is just the first step – OtterTune is working on additional features that will enable one-click remediation of surfaced issues.
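To make the idea of a health check concrete, here is a minimal sketch in Python of what rule-based checks for cache misses, table bloat, and possibly missing indexes could look like. This is our own illustration, not OtterTune's implementation: the thresholds, the `TableStats` type, and the `run_health_checks` function are all hypothetical, and the inputs are plain counters of the kind PostgreSQL exposes in its statistics views (block hits and reads, live and dead tuples, sequential and index scan counts).

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
MIN_CACHE_HIT_RATIO = 0.95          # flag databases that miss the cache often
MAX_DEAD_TUPLE_RATIO = 0.20         # flag tables with many dead rows (bloat)
MIN_SEQ_SCANS_FOR_INDEX_HINT = 1000 # flag heavily scanned tables with no index use


@dataclass
class TableStats:
    """Per-table counters (hypothetical container for this sketch)."""
    name: str
    live_tuples: int
    dead_tuples: int
    seq_scans: int
    index_scans: int


def cache_hit_ratio(blocks_hit: int, blocks_read: int) -> float:
    """Fraction of block requests served from cache; 1.0 if there was no traffic."""
    total = blocks_hit + blocks_read
    return 1.0 if total == 0 else blocks_hit / total


def run_health_checks(blocks_hit: int, blocks_read: int,
                      tables: List[TableStats]) -> List[str]:
    """Return human-readable findings for one database."""
    findings = []

    # Database-level check: too many reads going to disk instead of cache.
    ratio = cache_hit_ratio(blocks_hit, blocks_read)
    if ratio < MIN_CACHE_HIT_RATIO:
        findings.append(f"database: cache hit ratio {ratio:.1%} is low")

    for t in tables:
        # Table-level check: a high share of dead rows suggests bloat.
        total = t.live_tuples + t.dead_tuples
        if total and t.dead_tuples / total > MAX_DEAD_TUPLE_RATIO:
            findings.append(f"{t.name}: possible bloat (many dead rows)")

        # Crude heuristic: many sequential scans and no index scans at all
        # may point to a missing index.
        if t.seq_scans >= MIN_SEQ_SCANS_FOR_INDEX_HINT and t.index_scans == 0:
            findings.append(f"{t.name}: possible missing index")

    return findings


if __name__ == "__main__":
    sample = [
        TableStats("orders", live_tuples=800, dead_tuples=400,
                   seq_scans=5000, index_scans=0),
        TableStats("users", live_tuples=1000, dead_tuples=10,
                   seq_scans=5, index_scans=900),
    ]
    for finding in run_health_checks(900, 100, sample):
        print(finding)
```

Fixed thresholds like these are the simplest possible starting point. A real system would need to account for workload and context, which is exactly where an automated, ML-driven approach is meant to help.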
As cloud database adoption continues to skyrocket, we believe OtterTune will be the key optimization platform upon which databases run efficiently, cost-effectively, and with high degrees of performance and resilience. Automatically delivering this as a platform requires a deep understanding of database internals and of how applications use databases – which Andy, Dana, and Bohan have in spades. We’re thrilled to co-lead OtterTune’s Series A alongside our friends at Race Capital, with Accel participating.