
Re^3: Hadoop and perl

by spazm (Monk)
on Sep 09, 2010 at 17:05 UTC


in reply to Re^2: Hadoop and perl
in thread Hadoop and perl

I've added Hadoop::Streaming to CPAN. This module aims to simplify the Streaming interface to Hadoop by automating the input/output mapping. I found the initial module on GitHub, extended it with new features and tests, and pushed it to CPAN.

Please take a look and let me know how to make it more useful, especially for users who are new to Hadoop and to Hadoop streaming! Specifically, I've added new documentation to the root-level Hadoop::Streaming module, and I'd love feedback.

Thank you,
--spazm

    # Class names below match what the driver files call
    # (My::Hadoop::Example::Mapper, ::Combiner, ::Reducer).

    package My::Hadoop::Example::Wordcount;
    use Moose::Role;

    # Shared word-count logic, composed into the three classes below.

    sub map {
        my ( $self, $line ) = @_;
        # grep drops the empty leading field that split returns when
        # the line starts with a non-word character.
        my @words = grep { length } split( /\W+/, $line );
        $self->emit( $_ => 1 ) for @words;
    }

    sub reduce {
        my ( $self, $key, $value_iterator ) = @_;
        my $sum = 0;
        while ( $value_iterator->has_next() ) {
            my $value = $value_iterator->next();
            $sum += $value;
        }
        $self->emit( $key, $sum );
    }

    sub combine {
        my ( $self, $key, $value_iterator ) = @_;
        my $sum = 0;
        while ( $value_iterator->has_next() ) {
            my $value = $value_iterator->next();
            $sum += $value;
        }
        $self->emit( $key, $sum );
    }

    package My::Hadoop::Example::Mapper;
    use Moose;
    # Role names must be quoted strings; barewords fail under strict.
    with 'Hadoop::Streaming::Mapper', 'My::Hadoop::Example::Wordcount';

    package My::Hadoop::Example::Combiner;
    use Moose;
    with 'Hadoop::Streaming::Combiner', 'My::Hadoop::Example::Wordcount';

    package My::Hadoop::Example::Reducer;
    use Moose;
    with 'Hadoop::Streaming::Reducer', 'My::Hadoop::Example::Wordcount';

    1;

Driver files:

my_mapper:

    #!/usr/bin/perl
    use My::Hadoop::Example;
    My::Hadoop::Example::Mapper->run();

my_combiner:

    #!/usr/bin/perl
    use My::Hadoop::Example;
    My::Hadoop::Example::Combiner->run();

my_reducer:

    #!/usr/bin/perl
    use My::Hadoop::Example;
    My::Hadoop::Example::Reducer->run();

Hadoop streaming jar command:

    hadoop \
        jar $streaming_jar_name \
        -D mapred.job.name="my hadoop example" \
        -input my_input_file \
        -output my_output_hdfs_path \
        -mapper my_mapper \
        -combiner my_combiner \
        -reducer my_reducer
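
For comparison, here is a rough sketch of the input/output mapping done by hand, with no Hadoop::Streaming involved. It assumes only the usual streaming convention of tab-separated key/value lines, with the reducer seeing its input sorted by key. The sub names (map_line, reduce_sorted, run_local) are made up for illustration and are not part of the module; run_local is just a local stand-in for the map, sort, reduce flow that Hadoop would drive.

```perl
#!/usr/bin/perl
# Hand-rolled word count over tab-separated key/value records.
# All sub names here are hypothetical, chosen for this sketch only.
use strict;
use warnings;

# Map step: one input line in, a list of "word\t1" records out.
sub map_line {
    my ($line) = @_;
    return map { "$_\t1" } grep { length } split /\W+/, $line;
}

# Reduce step: expects records already sorted by key and sums each run
# of identical keys, detecting the key change itself.
sub reduce_sorted {
    my (@records) = @_;
    my ( @out, $current );
    my $sum = 0;
    for my $record (@records) {
        my ( $key, $value ) = split /\t/, $record, 2;
        if ( defined $current && $key ne $current ) {
            push @out, "$current\t$sum";    # flush the finished key
            $sum = 0;
        }
        $current = $key;
        $sum += $value;
    }
    push @out, "$current\t$sum" if defined $current;    # flush the last key
    return @out;
}

# Local stand-in for the framework: map every line, sort, then reduce.
sub run_local {
    my (@lines) = @_;
    my @mapped = map { map_line($_) } @lines;
    return reduce_sorted( sort @mapped );
}

print "$_\n" for run_local( "the cat sat", "the cat ran" );
```

Run as a plain script, this prints one tab-separated line per word: cat 2, ran 1, sat 1, the 2. The key-change bookkeeping in reduce_sorted is the part that the $value_iterator in the reduce method above takes off your hands.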

