I particularly liked the level of detail Dare provided in his write-up. He discusses the evolution of the Facebook architecture from a sharded-by-school approach to today’s much more demanding requirements.
Dare’s description of how the FB News feed is assembled via their “Multifeed” service is incredible:
Multifeed is a custom distributed system which takes the tens of thousands of updates from friends and picks the 45 that are either most relevant or most recent. Bob updates status to DB and then Scribe pushes this to a Multifeed server which is a cache of the most recent 50 items the user has performed. This caches data for every user in the system. When a user shows up, the user hits an aggregator which takes the list of friends and then each leaf node does some filtering then gets back to the aggregator which then does some filtering and returns story IDs. Then the aggregator fills out the data from memcache given the story ID.
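To make that flow concrete, here is a minimal sketch of the leaf/aggregator pattern as I read it from the description. It is not Facebook's code: the class names, the `rank` ordering (relevance first, recency as tie-breaker), the `score` field, the CRC-based sharding, and the plain dict standing in for memcache are all my own assumptions. Only the 50-item per-user cache and the 45-story feed size come from the quote.

```python
import heapq
import zlib
from dataclasses import dataclass

PER_USER_CACHE = 50   # recent items kept per user, per the description
FEED_SIZE = 45        # stories returned to the viewer, per the description


@dataclass(frozen=True)
class Story:
    story_id: int
    author: str
    timestamp: int
    score: float = 0.0  # hypothetical relevance score


def rank(story):
    # Assumed ordering: relevance first, recency as the tie-breaker.
    return (story.score, story.timestamp)


class LeafNode:
    """Caches the most recent stories for the users assigned to it."""

    def __init__(self):
        self.recent = {}  # author -> stories, newest first

    def record(self, story):
        # Assumes stories arrive in time order, so the newest goes in front.
        items = self.recent.setdefault(story.author, [])
        items.insert(0, story)
        del items[PER_USER_CACHE:]  # keep only the newest 50

    def candidates(self, friends, limit):
        # First filtering pass: the best `limit` stories this leaf holds.
        pool = [s for f in friends for s in self.recent.get(f, [])]
        return heapq.nlargest(limit, pool, key=rank)


class Aggregator:
    """Fans out to the leaves, merges their answers, fills in story data."""

    def __init__(self, leaves, story_cache):
        self.leaves = leaves
        self.story_cache = story_cache  # stand-in for memcache: id -> body

    def leaf_for(self, author):
        # Hypothetical sharding: a stable hash of the author name.
        return self.leaves[zlib.crc32(author.encode()) % len(self.leaves)]

    def publish(self, story, body):
        # Stand-in for the DB write plus the Scribe push to a Multifeed server.
        self.story_cache[story.story_id] = body
        self.leaf_for(story.author).record(story)

    def feed(self, friends):
        pool = []
        for leaf in self.leaves:
            pool.extend(leaf.candidates(friends, FEED_SIZE))
        # Second filtering pass, then resolve story IDs against the cache.
        top = heapq.nlargest(FEED_SIZE, pool, key=rank)
        return [self.story_cache[s.story_id] for s in top]
```

The point of the two-stage filter is that each leaf only ever returns a bounded number of candidates, so the aggregator merges a few small lists rather than tens of thousands of raw updates.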
Facebook’s use of memcache is well known, but I wasn’t aware of some of the changes they have introduced, including:
- Ported to 64-bit
- Migrated protocol to UDP
- Added multi-threading support
Dare notes that these changes have increased throughput 5x. Hopefully FB will contribute them back to the memcached project.
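For a feel of what memcache over UDP looks like on the wire, here is a small sketch based on the frame header described in the open-source memcached protocol documentation: eight bytes holding a request ID, a sequence number, the total number of datagrams, and a reserved field, each a 16-bit value in network byte order, followed by the usual text command. Whether Facebook's internal version matches this layout exactly is an assumption on my part, and the function names and the `story:42` key are purely illustrative.

```python
import struct


def build_udp_get(request_id, key):
    # Frame header: request ID, sequence number 0, one datagram total,
    # and a reserved field that must be 0 (all 16-bit, network byte order).
    header = struct.pack("!HHHH", request_id, 0, 1, 0)
    # The payload is the same text command used over TCP.
    return header + b"get " + key.encode("ascii") + b"\r\n"


def parse_udp_header(datagram):
    # Split a datagram into (request_id, sequence, total, payload).
    request_id, seq, total, _reserved = struct.unpack("!HHHH", datagram[:8])
    return request_id, seq, total, datagram[8:]


packet = build_udp_get(0x1234, "story:42")
```

The request ID lets a client match replies to requests without holding a connection open per server, which is presumably part of the appeal when many web servers talk to many cache nodes.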
The money quote for me was:
Huge Impact with Small Teams – Most features and underlying systems on the site are built by teams of one to three people.
It is great news for FB that they are able to remain so entrepreneurial and innovative even as the company experiences incredible growth. That is not an easy task, and it is something I really miss on a personal level.
Facebook is a large-scale, transaction-intensive service, and learning how they keep up with the demands of a growing service is fascinating. It is also a great opportunity to take what they’ve learned and apply it to your own world.