One of the things the people at Google noticed is that a lot of their problems amount to processing a lot of input data to compute some derived data from it. For example, they process the ca. 4 billion web pages and compute an index from them. The input data are very diverse (document records, log files, on-disk data structures, etc.) and processing them requires lots of CPU time. They have the infrastructure to deal with it, but they wanted a framework for automatic and efficient distribution and parallelization of the jobs across their clusters. Even better if it provided fault-tolerance and scheduled I/O. Status monitoring would also be nice.
So over the last year, they devised MapReduce, inspired by the map and reduce primitives present in the functional language Lisp. Most of their operations involved applying a map operation to each logical "record" in their input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.
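To make the idea concrete, here is how I understand the model, as a minimal single-process sketch in Python. It is not Google's implementation: the names `map_reduce`, `word_count_mapper` and `word_count_reducer` are made up for illustration, and nothing here is distributed, fault-tolerant or scheduled. It only shows the map step emitting key/value pairs and the reduce step combining all values that share a key, using word counting as the example job.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy, in-memory version of the map/reduce flow (hypothetical helper)."""
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key as they arrive.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)

    # Reduce phase: combine all values that share the same key.
    return {key: reducer(key, values) for key, values in intermediate.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.split():
        yield word.lower(), 1


def word_count_reducer(word, counts):
    # Sum the partial counts collected for one word.
    return sum(counts)


lines = ["the quick fox", "the lazy dog"]
counts = map_reduce(lines, word_count_mapper, word_count_reducer)
print(counts)
```

In the real system, as I understand it, the map calls and the reduce calls would each be spread over many machines; in this sketch the grouping dictionary stands in for that shuffling step.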
More information can be read here, or be seen in this video (some 23 minutes into the presentation).
I don't know in which programming language MapReduce was written, but could you write the basic functions in Perl?