Thread View: gwene.org.apache.planet
1 total messages Started by Edward J. Yoon Sun, 26 Aug 2012 13:31
MapReduce and Beyond
#3980
Author: Edward J. Yoon
Date: Sun, 26 Aug 2012 13:31
16 lines
2466 bytes
<div xmlns="http://www.w3.org/1999/xhtml">Hi, in this post I'm going to talk about the past and near future of big data processing. In 2006, I worked as a Senior Software Engineer at the web portal company NHN Corporation. From then on, I experienced a data explosion (page views averaged one billion per day) and began researching distributed computing technologies.
<br/><br/>
In my early research, batch-oriented MapReduce<sup>[1]</sup> was one of the interesting technologies. As you all know well by now, MapReduce programming is simple and powerful, and especially useful for aggregation and several basic relational-algebra operations on large data sets. <br/><br/>
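As a rough illustration of the aggregation case, here is a minimal single-process sketch of the map, shuffle, and reduce steps in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, run_job) and the sample page-view records are assumptions made up for this example, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (key, 1) pair for every token in one input record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence inside a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical page-view log: one line per session, one token per page.
    page_views = ["home search home", "search news", "home"]
    print(run_job(page_views))  # {'home': 3, 'search': 2, 'news': 1}
```

The counting itself lives entirely in map_phase and reduce_phase; the grouping in between is generic, which is why this style of aggregation is so simple to write. <br/><br/>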
However, MapReduce is <strong>NOT good for everything</strong>: for example, graph algorithms<sup>[2]</sup>, machine learning, and matrix arithmetic. SQL-like Pig and Hive, and MR-based Mahout, show well both the scope and the limits of MapReduce. Iterative MapReduce also has problems, such as the heavy cost of task assignment and I/O overhead. The lack of real-time processing capability was also an issue.
<br/><br/>
Today, many MapReduce alternatives are available to solve such problems efficiently:
  <ul>
    <li>Apache Hama<sup>[3]</sup> - BSP (Bulk Synchronous Parallel) computing engine on top of Hadoop</li>
    <li>Apache Giraph - BSP (Bulk Synchronous Parallel) based graph computing framework</li>
    <li>Apache S4 and Twitter Storm - Scalable real-time processing systems</li>
  </ul>Wow, that is a lot to learn, but please don't worry. Hadoop 2.0 with YARN<sup>[4]</sup> will manage all of these new alternatives at once. <br/><br/>1. <a href="http://research.google.com/archive/mapreduce-osdi04.pdf">MapReduce: Simplified Data Processing on Large Clusters</a> <br/>2. <a href="http://dl.acm.org/citation.cfm?id07184">Pregel: a system for large-scale graph processing</a><br/>3. <a href="http://hama.apache.org/">Apache Hama: Bulk Synchronous Parallel Computing Framework</a><br/>4. <a href="http://www.infoq.com/articles/ApacheYARN">Interview with Arun Murthy on Apache YARN</a></div>
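
The BSP (Bulk Synchronous Parallel) model named in the list of alternatives can be sketched in a few lines. The example below is a single-process simulation of the well-known "propagate the maximum value" vertex program, not the Hama or Giraph API: the function name bsp_max, the graph layout, and the sample values are assumptions for illustration. It assumes every vertex appears as a key in the graph and that edges are listed in both directions.

```python
def bsp_max(graph, values):
    """Spread the largest initial value to every vertex in BSP supersteps.

    graph:  dict mapping each vertex to a list of neighbour vertices
    values: dict mapping each vertex to its initial number
    Returns (final_values, supersteps_executed).
    """
    values = dict(values)  # work on a copy, leave the caller's dict alone

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1

    # Swapping inbox/outbox plays the role of the barrier: a message sent
    # in one superstep only becomes visible in the next one.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v in graph:
            if not inbox[v]:
                continue  # no messages: this vertex stays idle
            best = max(inbox[v])
            if best > values[v]:
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)  # tell neighbours about the change
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    # Hypothetical three-vertex chain: a - b - c
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    final, steps = bsp_max(graph, {"a": 3, "b": 6, "c": 2})
    print(final, steps)
```

Unlike iterative MapReduce, the loop keeps vertex state in memory between supersteps and only exchanges messages, which is the property that makes this model attractive for graph algorithms.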

<p><a href="http://feedproxy.google.com/~r/EdwardJYoonsBlog/~3/ONm2OtSuUtI/mapreduce-and-beyond.html">Link</a></p>