Tuesday, November 10, 2009

Hadoop crunches Web-sized data -- Cloud software -- InformationWeek

As the World Wide Web has exploded into millions of sites and billions of documents, the search engines that purport to know about everything on the Web have faced a gargantuan task. Sure, more spiders can be activated to crawl the Web and collect information. But what system can analyze all the data before the information is out of date?

The answer is a cluster-based analysis system, sometimes referred to loosely as a cloud database system. At the Cloud Computing Conference and Expo Nov. 3 in Santa Clara, Calif., representatives of Yahoo (NSDQ: YHOO) explained how they use Hadoop open source software, from the Apache Software Foundation, to analyze the Web...