Cloud-Led Disruption in Software Systems

Paradigm shifts such as mainframe, client-server, and cloud have forced disruptive changes on software systems so that they fit the prevailing architectural pattern. In most cases, software systems are first retrofitted into the new architecture and only later redesigned to take advantage of it.

In this post, I discuss how relational databases were first retrofitted into the cloud architecture (many customers still run their databases in this mode) but now face serious competition from databases designed for the cloud. I leave open the question of which other software systems are being redesigned for the cloud, especially when cloud-native innovations such as serverless are arriving faster than software systems can keep up with.

Amazon Aurora is a database designed specifically for the cloud architecture. Amazon Aurora Serverless was later added as an on-demand, auto-scaling configuration. Microsoft recently announced Azure SQL Database Hyperscale, which is also built for the cloud.

So what’s different about the cloud that warrants re-architecting software systems such as relational databases?

In a cloud world, compute is decoupled from storage, which allows cloud services to be resilient and scalable. Hosts can fail and be replaced independently of storage. In this environment, the network becomes the I/O bottleneck for a database workload. This bottleneck shows up in many ways: it hurts database performance and complicates commit protocols. For instance, multi-phase synchronous protocols such as two-phase commit are intolerant of failures, which are common in cloud systems, and so require complex recovery mechanisms.

There are examples of highly reliable, scalable, and available systems (such as the Google File System, GFS) built on top of components where failure is the norm. Amazon Aurora’s key design idea is to minimize network I/O (the new bottleneck in a cloud system where compute is decoupled from storage) between the compute and storage systems by offloading the storage-related functions of a relational database to the cloud storage layer. In Aurora, query processing, transactions, concurrency, buffer cache, and access methods are decoupled from logging, storage, and recovery, which are implemented as a scale-out storage service.
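
To make the split concrete, here is a minimal Python sketch, not Aurora's actual implementation. It assumes a compute node that ships only redo log records and a storage node that keeps the log and rebuilds pages by replaying it. The names `RedoRecord`, `StorageNode`, and `ComputeNode` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RedoRecord:
    lsn: int        # log sequence number, assumed to increase with every write
    page_id: int    # page the change applies to
    key: str
    value: str


@dataclass
class StorageNode:
    """Hypothetical storage-tier node: keeps the log and builds pages on demand."""
    log: list = field(default_factory=list)

    def append(self, record: RedoRecord) -> None:
        # In this sketch, a redo record is the only thing compute sends to storage.
        self.log.append(record)

    def read_page(self, page_id: int) -> dict:
        # Materialize the page by replaying its redo records in LSN order.
        page = {}
        for rec in sorted(self.log, key=lambda r: r.lsn):
            if rec.page_id == page_id:
                page[rec.key] = rec.value
        return page


class ComputeNode:
    """Hypothetical compute-tier node: runs transactions and ships redo records."""

    def __init__(self, storage: StorageNode):
        self.storage = storage
        self.next_lsn = 1

    def write(self, page_id: int, key: str, value: str) -> None:
        # No page is written out; only the log record describing the change.
        self.storage.append(RedoRecord(self.next_lsn, page_id, key, value))
        self.next_lsn += 1


storage = StorageNode()
db = ComputeNode(storage)
db.write(page_id=1, key="a", value="1")
db.write(page_id=1, key="a", value="2")
db.write(page_id=2, key="b", value="3")
print(storage.read_page(1))
```

A real storage service would apply log records to pages in the background rather than replay the whole log on every read; the sketch only shows which tier owns which responsibility.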

The SIGMOD paper on Amazon Aurora covers the design in detail, but a few things are worth noting:

The log is the database. There is no checkpointing, and no pages are ever written from the database to storage. The only things that cross the network are redo log records.
Durability is achieved at the storage tier by a 4/6 quorum model across AWS Availability Zones (AZs). Log writes are issued in parallel, and the first 4 of 6 write completions satisfy the quorum, so a slow outlier replica does not delay the write.
Background tasks are negatively correlated with foreground tasks (unlike in traditional databases). Aurora trades CPU for disk to prioritize foreground tasks.
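
The 4/6 quorum write described above can be sketched roughly as follows. This is a simulation under stated assumptions, not Aurora's code: `write_to_replica` is a hypothetical stand-in whose random sleep plays the role of network and disk latency, and failure handling is left out.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

REPLICAS = 6       # copies of each log write (the "6" in 4/6)
WRITE_QUORUM = 4   # acknowledgements needed before the write counts as durable


def write_to_replica(replica_id: int, record: bytes) -> int:
    """Simulated replica write; the random sleep stands in for real latency."""
    time.sleep(random.uniform(0.001, 0.02))
    return replica_id


def quorum_write(record: bytes) -> list:
    """Send the record to all replicas in parallel; return the first WRITE_QUORUM acks."""
    acked = []
    pool = ThreadPoolExecutor(max_workers=REPLICAS)
    futures = [pool.submit(write_to_replica, i, record) for i in range(REPLICAS)]
    for future in as_completed(futures):
        acked.append(future.result())
        if len(acked) == WRITE_QUORUM:
            break
    # Do not wait for the slowest replicas; they finish in the background.
    pool.shutdown(wait=False)
    return acked


acks = quorum_write(b"redo-record")
print(f"write is durable after acks from replicas {acks}")
```

Because the caller returns as soon as four acknowledgements arrive, the latency of a write is set by the fourth-fastest replica rather than the slowest one.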

The net result is a cloud-scale RDBMS that AWS says is up to five times faster than standard MySQL at one-tenth the cost of commercial databases. Amazon.com has moved completely away from Oracle to Amazon Aurora and other purpose-built databases in the cloud.

This raises the question: like relational databases, what other major software systems (e.g., SAP), if any, are being redesigned for the cloud to deliver business benefits such as lower cost and improved performance? Or are they waiting to be challenged by new systems, from startups or cloud vendors, designed specifically for the cloud?

We live in a world where the cloud is the new normal and the pace of innovation is at an all-time high.
