Realtime Hadoop usage at Facebook: The Complete Story
I had earlier blogged about why Facebook is
starting to use Apache Hadoop technologies to serve realtime workloads. We
presented the paper at the SIGMOD 2011 conference and it was very
well received.
Here is a link to the complete paper for those who are interested in understanding the details of why we decided to use Hadoop technologies, the workloads that we have on realtime Hadoop, the enhancements that we made to Hadoop to support our workloads, and the processes and methodologies we have adopted to deploy these workloads successfully. A shortened version of the first two sections of the paper is also available in the slides that you can find here.
Realtime Hadoop usage at Facebook -- Part 2 - Workload Types
This is the second part of our SIGMOD 2011 paper, which describes our use cases for Apache Hadoop and Apache HBase in realtime workloads. We describe why Hadoop and HBase fit the requirements of each of these applications.
OUR WORKLOADS
Before deciding on a particular software stack and whether
or not to move away from our MySQL-based architecture, we looked at a few
specific applications where existing solutions may be problematic. These use
cases would have workloads that are challenging to scale because of very high
write throughput, massive datasets, unpredictable growth, or other patterns
that may be difficult or suboptimal in a sharded RDBMS environment.
1. Facebook Messaging
The latest generation of Facebook Messaging combines existing Facebook messages with e-mail, chat, and SMS. In addition to persisting all of these messages, a new threading model also requires messages to be stored for each participating user. As part of the application server requirements, each user will be sticky to a single data center.
1.1 High Write Throughput
With an existing rate of millions of messages and billions of instant messages every day, the volume of ingested data would be very large from day one and would only continue to grow. The denormalization requirement would further increase the number of writes to the system, as each message could be written several times, once for each participating user.
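To make the write amplification concrete, here is a minimal Python sketch, assuming a hypothetical layout in which every participant (sender included) keeps a separate copy of a message in their own mailbox. The `Message` type and `fan_out_writes` function are illustrative names, not part of the actual Messaging system; the point is only that the number of writes grows with messages multiplied by participants per message.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; field names are illustrative only."""
    message_id: int
    sender: int
    recipients: Tuple[int, ...]
    body: str


def fan_out_writes(msg: Message) -> List[Tuple[int, int]]:
    """Return one (user_id, message_id) write per distinct participant.

    Under the assumed denormalized layout, each participant's mailbox
    stores its own copy, so a single message costs several writes.
    """
    participants = sorted({msg.sender, *msg.recipients})
    return [(user_id, msg.message_id) for user_id in participants]


if __name__ == "__main__":
    msg = Message(message_id=1, sender=10, recipients=(20, 30), body="hi")
    print(fan_out_writes(msg))  # [(10, 1), (20, 1), (30, 1)]
```

A three-person conversation therefore turns one logical message into three physical writes, before counting any index updates.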
1.2 Large Tables
As part of the product requirements, messages would not be deleted unless the user explicitly deleted them, so each mailbox would grow indefinitely. As is typical with most messaging applications, messages are read only a handful of times when they are recent and are then rarely looked at again. As such, the vast majority would not be read from the database, yet they must remain available at all times and with low latency, so archiving would be difficult. Storing all of a user’s thousands of messages meant that we’d have a database schema indexed by user, with an ever-growing list of threads and messages. With this type of random write workload, write performance will typically degrade in a system like MySQL as the number of rows in the table increases. The sheer number of new messages would also mean a heavy write workload, which could translate into a high number of random IO operations in this type of system.
1.3 Data Migration
One of the most challenging aspects of the new Messaging product was the new data model. This meant that all existing users’ messages needed to be manipulated and joined for the new threading paradigm and then migrated to the new system. The ability to perform large scans, random access, and fast bulk imports would help reduce the time spent migrating users to the new system.
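The "manipulate and join" step of the migration can be pictured as a scan over old, flat messages that groups them into threads. The following is a minimal Python sketch under a loudly hypothetical assumption: that a thread is identified by the set of people in the conversation. The input tuple layout and the `build_threads` name are illustrative, not the production migration code.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical old-format row: (message_id, sender, recipients, timestamp)
OldMessage = Tuple[int, int, Tuple[int, ...], int]


def build_threads(
    messages: Iterable[OldMessage],
) -> Dict[int, Dict[FrozenSet[int], List[int]]]:
    """Group flat messages into per-user threads, oldest message first.

    Assumption for illustration: a thread is keyed by the set of
    participants, and every participant gets the thread in their mailbox.
    """
    mailboxes = defaultdict(lambda: defaultdict(list))
    for message_id, sender, recipients, ts in messages:
        thread_key = frozenset((sender, *recipients))
        for user_id in thread_key:
            mailboxes[user_id][thread_key].append((ts, message_id))
    # Sort each thread by timestamp and keep only the message ids.
    return {
        user_id: {key: [mid for _, mid in sorted(entries)] for key, entries in threads.items()}
        for user_id, threads in mailboxes.items()
    }
```

At Facebook's scale this would run as a large scan over the old data followed by a bulk import of the regrouped rows, which is why scans and fast bulk imports matter for the migration.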
2. Facebook Insights
Facebook Insights provides developers and website owners with access to real-time analytics related to Facebook activity across websites with social plugins, Facebook Pages, and Facebook Ads. Using anonymized data, Facebook surfaces activity such as impressions, click-through rates, and website visits. These analytics can help everyone from businesses to bloggers gain insights into how people are interacting with their content so they can optimize their services. Domain and URL analytics were previously generated in a periodic, offline fashion through our Hadoop and Hive analytics data warehouse. However, this did not yield a rich user experience, as the data was only available several hours after the activity occurred.
2.1 Realtime Analytics
The Insights teams wanted to make statistics available to their users within seconds of user actions, rather than the hours previously supported. This would require a large-scale, asynchronous queuing system for user actions, as well as systems to process, aggregate, and persist these events. All of these systems need to be fault-tolerant and support more than a million events per second.
2.2 High Throughput Increments
To support the existing insights functionality, time and demographic-based aggregations would be necessary. However, these aggregations must be kept up-to-date and thus processed on the fly, one event at a time, through numeric counters. With millions of unique aggregates and billions of events, this meant a very large number of counters with an even larger number of operations against them.
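A minimal Python sketch of processing events one at a time through counters, assuming hypothetical aggregates per URL: a running total, an hourly bucket, an age-group breakdown, and the hour-by-age combination. The `Event` fields and key layout are illustrative only. It shows why operations outnumber events: each event fans out into several increments.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    url: str
    timestamp: int   # seconds since the epoch
    age_group: str   # hypothetical demographic attribute


def aggregate_keys(event: Event, bucket_seconds: int = 3600) -> List[Tuple]:
    """Return every counter key that a single event must increment."""
    bucket = event.timestamp - event.timestamp % bucket_seconds
    return [
        (event.url, "total"),
        (event.url, "hour", bucket),
        (event.url, "age", event.age_group),
        (event.url, "hour_age", bucket, event.age_group),
    ]


def process(event: Event, counters: Counter) -> None:
    """Apply one event on the fly: one increment per aggregate."""
    for key in aggregate_keys(event):
        counters[key] += 1
```

With four aggregates per event, a billion events become four billion counter updates. The in-memory `Counter` only stands in for a durable, fault-tolerant counter store.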
3. Facebook Metrics System
At Facebook, all hardware and software feed statistics into a metrics collection system called ODS (Operations Data Store). For example, we may collect the amount of CPU usage on a given server or tier of servers, or we may track the number of write operations to an HBase cluster. For each node or group of nodes we track hundreds or thousands of different metrics, and engineers will ask to plot them over time at various granularities. While this application has hefty requirements for write throughput, some of the bigger pain points with the existing MySQL-based system are around resharding data and the ability to do table scans for analysis and time roll-ups. This use case is gearing up to be in production very shortly.
3.1 Automatic Sharding
The massive number of indexed and time-series writes and the
unpredictable growth patterns are difficult to reconcile on a sharded MySQL
setup. For example, a given product may only collect ten metrics over a long
period of time, but following a large rollout or product launch, the same
product may produce thousands of metrics. With the existing system, a single
MySQL server may suddenly be handling much more load than it can handle,
forcing the team to manually re-shard data from this server onto multiple servers.
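Automatic sharding removes that manual step by splitting a key range once it grows too large, which is roughly how HBase splits regions. The following Python sketch is a toy, in-memory illustration under assumed parameters: `RangeShards`, `max_rows`, and the split-at-median rule are hypothetical choices, and a real system splits on data size and reassigns the new ranges to other servers.

```python
import bisect
from typing import Any, Dict, List, Optional


class RangeShards:
    """Toy range-partitioned store that splits a shard when it grows too big.

    Shard i holds keys in [starts[i], starts[i + 1]); the last shard is open-ended.
    """

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows           # hypothetical split threshold
        self.starts: List[str] = [""]      # sorted start key of each shard
        self.shards: List[Dict[str, Any]] = [{}]

    def _index(self, key: str) -> int:
        # Rightmost shard whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key: str, value: Any) -> None:
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_rows:
            self._split(i)

    def get(self, key: str) -> Optional[Any]:
        return self.shards[self._index(key)].get(key)

    def _split(self, i: int) -> None:
        # Split at the median key so each half gets roughly half the rows.
        rows = self.shards[i]
        mid = sorted(rows)[len(rows) // 2]
        self.shards[i] = {k: v for k, v in rows.items() if k < mid}
        self.shards.insert(i + 1, {k: v for k, v in rows.items() if k >= mid})
        self.starts.insert(i + 1, mid)
```

When a product suddenly emits thousands of metrics, the affected range simply splits again as it fills, instead of one server absorbing all of the new load.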
3.2 Fast Reads of Recent Data and Table Scans
The vast majority of reads to the metrics system are for very recent, raw data; however, all historical data must also be available. Recently written data should be available quickly, but the entire dataset will also be periodically scanned in order to perform time-based rollups.
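A periodic time-based rollup amounts to a full scan that buckets raw points by metric and a coarser time interval. A minimal Python sketch, assuming raw points arrive as hypothetical `(metric, timestamp, value)` tuples and that the rollup is an average; a real system might also keep minimums, maximums, or sums.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def rollup(
    points: Iterable[Tuple[str, int, float]], bucket_seconds: int
) -> Dict[Tuple[str, int], float]:
    """Average raw (metric, timestamp, value) points into coarser time buckets.

    Input may be in any order, as it would be when scanning the whole table.
    """
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for metric, ts, value in points:
        buckets[(metric, ts - ts % bucket_seconds)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


if __name__ == "__main__":
    raw = [("cpu", 0, 10.0), ("cpu", 60, 20.0), ("cpu", 3600, 30.0)]
    print(rollup(raw, bucket_seconds=3600))  # {('cpu', 0): 15.0, ('cpu', 3600): 30.0}
```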
(Credit to the authors of the paper: Dhruba Borthakur, Kannan Muthukkaruppan, Karthik Ranganathan, Samuel Rash, Joydeep Sen Sarma, Jonathan Gray, Nicolas Spiegelberg, Hairong Kuang, Dmytro Molkov, Aravind Menon, Rodrigo Schmidt, Amitanand Aiyer)