Twitter turns to Protocol Buffers for back-end data storage
Top microblogging site Twitter has eschewed popular formats such as XML, CSV and JSON for its back-end data storage needs. For the 12TB of data that it stores on average each day, Twitter is instead relying on a data format from Google (NASDAQ: GOOG) called Protocol Buffers.
Twitter analytics lead Kevin Weil told Computerworld that the company is planning the infrastructure to store "a trillion tweets" and requires the right data format and tools in order to analyze such a vast trove of information. The use of Protocol Buffers together with other associated technologies should help streamline this task, says Weil.
The problem with formats such as XML and JSON is that both are verbose by design. CSV is far more compact, but it copes poorly when the schema changes down the road, and such a change requires considerable reprogramming. Protocol Buffers, which is widely used within Google, is an extensible format for serializing structured data. In addition, the code that recreates the data structures needed by different applications is generated automatically.
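To make the schema-change point concrete, here is a minimal Python sketch of the tagged binary encoding idea that Protocol Buffers relies on. It is not the Protocol Buffers library and not Twitter's code: it hand-rolls only two wire types (varint and length-delimited), and the field names and numbers (id, text, retweet_count) are hypothetical. Because every value is stored with a numeric field tag, a reader built against an older schema can skip fields it does not know, which a positional format such as CSV cannot do.

```python
import json


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_record(fields: dict) -> bytes:
    """Encode {field_number: int or str} as tagged binary fields."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if isinstance(value, int):
            out += encode_varint((number << 3) | 0)  # wire type 0: varint
            out += encode_varint(value)
        else:
            data = value.encode("utf-8")
            out += encode_varint((number << 3) | 2)  # wire type 2: length-delimited
            out += encode_varint(len(data))
            out += data
    return bytes(out)


def decode_record(buf: bytes, known: dict) -> dict:
    """Decode fields listed in known ({field_number: name}); skip the rest."""
    pos, record = 0, {}
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number in known:
            record[known[number]] = value
    return record


# Hypothetical schemas: the newer one adds field 3.
OLD_SCHEMA = {1: "id", 2: "text"}
NEW_SCHEMA = {1: "id", 2: "text", 3: "retweet_count"}

tweet = encode_record({1: 12345, 2: "hello", 3: 7})
old_view = decode_record(tweet, OLD_SCHEMA)  # ignores the unknown field 3
new_view = decode_record(tweet, NEW_SCHEMA)
as_json = json.dumps({"id": 12345, "text": "hello", "retweet_count": 7})

print(old_view, new_view)
print(len(tweet), "bytes tagged binary vs", len(as_json), "bytes JSON")
```

The old reader still decodes the record after the new field is added, and the encoded record is considerably smaller than the equivalent JSON because field names are replaced by small numeric tags. In real use, the Protocol Buffers compiler generates this encoding and decoding code from a schema file.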
For more on this story:
- check out this article at PC World