Monday, August 17, 2009

How Yahoo, Facebook, Amazon & Google think about big data

Collectively, Yahoo, Facebook, Amazon and Google are rewriting the handbook for big data. Startups intending to reach this scale must also change how they think about data, and enterprises need this model for internal deployments if they want to retain an economic edge. The four leading web giants have designed their systems from scratch, evidence that workloads have shifted, business models are different, and the economics have changed, all of which demand a new approach.

Yahoo revealed a few weeks ago how it handles unstructured data at Internet scale with MObStor, the technology that "grew out of Yahoo Photos" but now serves unstructured storage needs across the company. Earlier this year, Facebook unveiled Haystack, its solution for managing its growing photo collection (which could reach 100 billion photos in 2009 if current growth rates continue). In 2007, Amazon outlined Dynamo, an "incrementally scalable, highly available key-value storage system." All of these were predated by The Google File System, presented as a research paper in October 2003...