Google’s F1 Brings NoSQL Scale To Relational Databases

MySQL familiarity or NoSQL scalability seems like a binary choice. But Google’s F1 – the new relational database management system (RDBMS) underpinning several of Google’s customer-facing, business-critical advertising services – lays claim to combining the best of both worlds.

The F1 system is detailed in a paper/presentation entitled “F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business,” co-authored by several Googlers and published earlier this month.

“F1 implements rich relational database features, including a strictly enforced schema, a powerful parallel SQL query engine, general transactions, change tracking and notification, and indexing, and is built on top of a highly distributed storage system that scales on standard hardware in Google data centers,” as the abstract puts it.

This comes at the cost of higher write latencies compared to Google’s legacy MySQL deployments. But thanks to F1’s distributed nature, it was apparently relatively simple to deploy it underneath those aforementioned ad services with no downtime. Both the simplicity and the lack of downtime are critical, given that Google’s ad business handles tens of terabytes replicated across thousands of machines over any given 24-hour period, as per the presentation.

The presentation describes the underlying architecture of F1 better than I could, but the general idea is that it was developed alongside Spanner, Google’s new low-level storage system and the descendant of BigTable. In addition to a stateless server and a pool of workers for query execution, F1 consists of sharded Spanner servers, with data stored in Google File System (GFS) and in memory.

F1 uses a relational schema and can run both SQL queries and MapReduce jobs in parallel. The system is replicated across five data centers to ensure availability, with those replicas at least 100ms apart in case of regional disaster.
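
One way to see why that geographic spread shows up as write latency: Spanner replicates with Paxos, so a write generally has to be acknowledged by a majority of replicas before it commits. The rough Python sketch below illustrates the arithmetic only. The function name and the round-trip times are hypothetical and do not come from the presentation, and the model ignores everything except network round trips.

```python
def quorum_commit_latency_ms(round_trips_ms):
    """Return the time until a majority of replicas has acknowledged a write.

    round_trips_ms: round-trip time in milliseconds from the write
    coordinator to each replica, including its own local replica.
    This is a simplified model, not F1's or Spanner's actual commit path.
    """
    if not round_trips_ms:
        raise ValueError("need at least one replica")
    majority = len(round_trips_ms) // 2 + 1
    # The write can commit once the majority-th fastest acknowledgement arrives.
    return sorted(round_trips_ms)[majority - 1]


# Hypothetical round-trip times for five replicas (assumed values).
five_replicas_ms = [1, 100, 120, 150, 200]
# A single-site database only waits on its local copy.
single_site_ms = [1]

print(quorum_commit_latency_ms(five_replicas_ms))  # 120
print(quorum_commit_latency_ms(single_site_ms))    # 1
```

Under these assumed numbers, a five-way replicated write waits for the third-fastest replica, so it cannot finish faster than a long-distance round trip, while a single local database is bound only by its own machine.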

The bottom line is that Google found its own compromise that made internal developers happy even as it enabled greater operational scale. Developers get their SQL queries, while the system maintains a level of availability and fault tolerance that MySQL can’t match.

It seems like the best of both worlds. But reading this presentation over, I suspect that if it were this easy, everybody would be doing it. Given the massive growth of the NoSQL ecosystem, my guess is that Google may have hit on an innovative solution for its own use case – but the use cases beyond that may be limited.

Of course, Google isn’t the first to hit on this kind of hybridization: Drawn To Scale has a similar, SQL-friendly big data offering, but built on a Hadoop core rather than F1’s GFS base. Rainstor has taken a similar tack. An Oracle/Cloudera partnership also offers a more roundabout route to the same end by way of a connector between Oracle databases and Cloudera’s distribution of Hadoop.