How is Facebook able to quickly process the sort of queries about users and their friends generated by features such as Graph Search, despite the fact that the relevant data may be stored on several different servers? Software Engineers Alessandro Presta and Alon Shalita offered an example of how the social network uses graph-processing system Apache Giraph to handle those tasks in a post on its engineering blog.
Describing the issues Facebook’s engineering team faces when it comes to processing those queries, they wrote:
Facebook’s architecture relies on various services that answer queries about people and their friends. Because of the size of the data set, number of queries per second, and latency requirements, many of these systems cannot run on a single machine. Instead, people and their metadata are sharded across several machines. In such a distributed environment, answering queries might require communication among all these servers.
Imagine that one of our services requires fetching information about all the friends of a specific person. Such a query would be first directed to the shard that contains the person’s data (including the list of friends). Then, for each friend, a query would be issued to the respective shard, asking for the required information. All the responses would then be aggregated to form the final answer.
If people are distributed randomly across shards (for example, by hashing their user IDs), such a query may hit almost all the machines in the system and require a lot of network traffic, which could result in high latency.
We have solved this problem by using graph partitioning, which can be formalized as follows: Given an undirected graph G = (V, E) and a natural number k, we want to partition the vertex set V into k equal-sized subsets, so as to maximize the number of edges that have both endpoints in the same partition (we call those “local edges”).
Software Engineers Brian Karrer, Arun Sharma, and Igor Kabiljo also worked on the project, as did former interns Aaron Adcock and Herald Kllapi.
For the technical details of Facebook’s use of Apache Giraph, please see the blog post by Presta and Shalita.