How Twitter’s Database Infrastructure Ensures Network Stability

In Twitter's early history, the fail whale was a pretty common occurrence. But these days, there's an intricate structure to keep things humming along.


In Twitter’s early days, infrastructure problems were common, and early adopters became all too familiar with the fail whale. But the infrastructure has improved over the years, even as user numbers and tweet volume have increased. According to Wired contributor Cade Metz, Twitter requires a staggeringly intricate infrastructure to maintain network stability.

More than 240 million Twitter accounts from all over the world send 5,700 tweets every second, and those tweets include a multitude of links, hashtags, photos and videos. According to Metz, the system behind them is way more complicated than you would think.

“Because it contains so many types of data,” he says, “even a single tweet is spread across multiple machines.” After outgrowing the MySQL and Cassandra database systems, Twitter’s engineers built a new system, called Manhattan, which is what now enables Twitter to run so smoothly.
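The article doesn't describe how Manhattan actually routes data, but the idea of a single tweet being spread across multiple machines can be illustrated with a minimal sketch: hash a key for each piece of the tweet (text, media, hashtags) to pick a storage node. The node names and key scheme here are entirely hypothetical.

```python
import hashlib

# Hypothetical storage nodes; a real system would use many machines
# and a more sophisticated placement scheme (e.g. consistent hashing).
NODES = ["node-a", "node-b", "node-c", "node-d"]

def node_for(key: str) -> str:
    """Pick a storage node for a key via a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Each part of one tweet gets its own key, so the parts can
# land on different machines.
tweet_id = "1234567890"
parts = {
    "text": f"{tweet_id}:text",
    "media": f"{tweet_id}:media",
    "hashtags": f"{tweet_id}:hashtags",
}
placement = {part: node_for(key) for part, key in parts.items()}
```

Because the hash is stable, any machine can recompute where a given part lives without consulting a central directory, which is one reason hashed placement is popular at this scale.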

Manhattan, whose development began two years ago, is designed to be an all-in-one solution. Previously, Twitter had relied on up to three separate database systems to deliver tweets, verify handles and display the stream.

“Because we’re a real-time company, we really care about availability of our data,” Twitter engineer Chris Goffinet told Metz. “If it’s inconsistent for milliseconds, that’s fine. But we have to be up and online at all times.” And that’s the drive behind Manhattan and the database systems of other large online companies.

As companies take on bigger user counts, they have to choose between speed and consistency: prioritize consistency and users may see downtime; prioritize speed and tweets may show up late or out of order. So custom solutions must be built, and Goffinet says that’s exactly what tech companies are moving toward.
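Goffinet's trade-off, being briefly inconsistent in exchange for staying online, can be sketched in a few lines. This is not Manhattan's design, just an illustrative toy: writes are acknowledged once a single replica accepts them, other replicas catch up asynchronously, and a read from a lagging replica may be momentarily stale.

```python
class Replica:
    """One copy of the data on one machine."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store that favors availability over consistency."""
    def __init__(self, n_replicas: int = 3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # replication log awaiting async delivery

    def write(self, key, value):
        # Acknowledge as soon as the first replica applies the write,
        # so the service never blocks waiting for every copy.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read from a lagging replica may be milliseconds out of date.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Background replication brings the other replicas up to date.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()
```

Immediately after a `write`, reading the same key from a lagging replica returns nothing until `sync` runs, which is exactly the "inconsistent for milliseconds" window Goffinet describes, traded for never refusing the write.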