Three Google papers that ignited the big data era
1. GFS paper - published in 2003
In 2003, Google published the Google File System (GFS) paper, which describes a scalable distributed file system for large, distributed, data-intensive applications. GFS runs on inexpensive commodity hardware and provides fault tolerance. In essence, each file is split into fixed-size chunks (64 MB in the paper), and every chunk is stored redundantly, with three replicas by default, across a cluster of commodity machines.
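To make "split into chunks and store them redundantly" concrete, here is a minimal single-process sketch. Everything in it is an illustrative assumption rather than GFS's actual code: the function names, the tiny 4-byte chunk size, the chunkserver names, and the round-robin placement policy are all hypothetical (the real GFS master also weighs factors such as disk utilization and rack placement). The `chunk_map` dictionary loosely plays the role of the master's chunk-location metadata.

```python
# Illustrative values: the GFS paper uses 64 MB chunks and 3 replicas by default;
# a 4-byte chunk keeps this demo small.
CHUNK_SIZE = 4
REPLICATION = 3


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(chunk_index: int, servers: list, replication: int = REPLICATION) -> list:
    """Choose `replication` distinct chunkservers (hypothetical round-robin policy)."""
    if replication > len(servers):
        raise ValueError("not enough chunkservers for the replication factor")
    start = chunk_index % len(servers)
    return [servers[(start + k) % len(servers)] for k in range(replication)]


def write_file(data: bytes, servers: list):
    """Store every chunk on several servers; return (chunk_map, per-server storage)."""
    chunk_map = {}                    # chunk index -> servers holding a replica
    store = {s: {} for s in servers}  # server -> {chunk index: chunk bytes}
    for i, chunk in enumerate(split_into_chunks(data)):
        replicas = place_replicas(i, servers)
        chunk_map[i] = replicas
        for s in replicas:
            store[s][i] = chunk
    return chunk_map, store


def read_file(chunk_map: dict, store: dict, failed=frozenset()) -> bytes:
    """Reassemble the file, reading each chunk from any replica that is still up."""
    parts = []
    for i in sorted(chunk_map):
        live = [s for s in chunk_map[i] if s not in failed]
        if not live:
            raise OSError(f"all replicas of chunk {i} are unavailable")
        parts.append(store[live[0]][i])
    return b"".join(parts)


servers = ["cs1", "cs2", "cs3", "cs4"]
data = b"hello, distributed world"
chunk_map, store = write_file(data, servers)
# Two of the four chunkservers fail, yet every chunk still has a live replica.
print(read_file(chunk_map, store, failed={"cs1", "cs2"}) == data)  # True
```

With three distinct replicas per chunk, the file stays readable as long as no chunk loses all three of its servers, which is the fault tolerance the paragraph above refers to.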
2. MapReduce paper - published in 2004
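The MapReduce paper followed in 2004. It describes a programming model for distributed computation over big data: a job is broken into many small tasks that run in parallel across a large number of individually modest machines, and their partial results are then merged into the final output. The user supplies a map function, which turns input key/value pairs into intermediate key/value pairs, and a reduce function, which merges all intermediate values that share the same key.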
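The map, shuffle, and reduce steps can be sketched with word counting, the running example in the MapReduce paper. This is a single-machine illustration under stated assumptions, not Google's implementation: a `ThreadPoolExecutor` stands in for the cluster of workers, and the names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(document):
    """User-defined map: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """User-defined reduce: merge all intermediate values that share a key."""
    return word, sum(counts)


def map_reduce(documents, num_workers=4):
    # Map phase: each input split is processed independently. A thread pool
    # stands in for the cluster of worker machines.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = list(pool.map(lambda doc: list(map_fn(doc)), documents))

    # Shuffle: group every intermediate value by its key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key's group is reduced independently, then merged.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return dict(pool.map(lambda item: reduce_fn(*item), groups.items()))


docs = ["the quick brown fox", "the lazy dog", "The fox"]
counts = map_reduce(docs)
print(sorted(counts.items()))
# [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The key design point is that map tasks never talk to each other and reduce tasks only see one key's values, so both phases can be spread across as many machines as are available.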
3. Bigtable paper - published in 2006
Bigtable, a distributed storage system for structured data, was published in 2006 and inspired many NoSQL databases, such as Cassandra and HBase. Roughly half of Cassandra's architecture is modeled on Bigtable, including its data model, SSTables, and write-ahead log; the other half is modeled on Amazon's Dynamo, from which Cassandra takes its peer-to-peer cluster design.
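The write path shared by Bigtable and Cassandra (append to a write-ahead log, apply to an in-memory memtable, then flush the memtable as a sorted, immutable SSTable) can be sketched as below. This is a hypothetical toy, not either system's code: the `TinyTablet` class, the `wal.log` and `sstable-NNNNNN.json` file names, and the JSON encoding are all assumptions, and real SSTables are a binary format (stored on GFS in Bigtable's case) with compaction and other machinery omitted here. Cells are keyed by (row, column, timestamp), mirroring Bigtable's data model.

```python
import json
import os
import tempfile
from bisect import bisect_left


class TinyTablet:
    """Hypothetical toy of a WAL + memtable + SSTable storage engine."""

    def __init__(self, directory, memtable_limit=3):
        self.dir = directory
        self.wal_path = os.path.join(directory, "wal.log")
        self.memtable_limit = memtable_limit
        self.memtable = {}   # (row, column, timestamp) -> value
        self.sstables = []   # sorted lists of (key, value), oldest first
        # Load existing SSTables; zero-padded names sort in creation order.
        for name in sorted(os.listdir(directory)):
            if name.startswith("sstable-"):
                with open(os.path.join(directory, name), encoding="utf-8") as f:
                    self.sstables.append([(tuple(k), v) for k, v in json.load(f)])
        # Crash recovery: replay logged writes that never reached an SSTable.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    key, value = json.loads(line)
                    self.memtable[tuple(key)] = value

    def put(self, row, column, timestamp, value):
        key = (row, column, timestamp)
        # 1. Make the write durable in the log before touching memory.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([key, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Apply it to the memtable.
        self.memtable[key] = value
        # 3. A full memtable becomes a new sorted, immutable SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        table = sorted(self.memtable.items())
        path = os.path.join(self.dir, f"sstable-{len(self.sstables):06d}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(table, f)
        self.sstables.append(table)
        self.memtable.clear()
        open(self.wal_path, "w").close()  # flushed data no longer needs the log

    def get(self, row, column, timestamp):
        key = (row, column, timestamp)
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest SSTable first
            i = bisect_left(table, (key,))      # binary search on sorted keys
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


with tempfile.TemporaryDirectory() as d:
    t = TinyTablet(d, memtable_limit=2)
    t.put("com.cnn.www", "contents:", 1, "<html>v1")
    t.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")  # memtable full: flushed
    t.put("com.cnn.www", "contents:", 2, "<html>v2")    # only in WAL + memtable
    recovered = TinyTablet(d, memtable_limit=2)         # simulate a restart
    v2 = recovered.get("com.cnn.www", "contents:", 2)   # rebuilt from the WAL
    v1 = recovered.get("com.cnn.www", "contents:", 1)   # read from the SSTable
    print(v1, v2)
```

Because SSTables are never modified after being written, writes are cheap sequential appends, and the log is the only thing needed to rebuild the memtable after a crash.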