Having trouble loading a hash with map

by Anonymous Monk
on May 03, 2012 at 14:06 UTC ( [id://968746]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

My input is a text file that I created by copying the contents of a PDF, pasting them into Notepad and saving (which dropped the Unicode). I then FTPed the text file to my Linux box, where I am running the code below. I am simply trying to read each line of the text file, split each line into words, and load each word into a hash as a key with a value of '1'. The map version only loads my hash with LOW, RES and PDF, while the for loop version seems to work fine. Why? Does it have something to do with the <> in the map?

The input looks like this, for example...

LOW-RES PDF NOT PRINT-READY MY BIG TOE BOOK 1: A WAKENING Section 1 Delusion or Knowledge: Is This Guy Nuts, or What? Section 2 Mysticism Demystified The Foundations of Reality LOW-RES PDF NOT PRINT-READY The My Big TOE reality model will help you understand your life, your purpose,

This seems to work:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash;
    for my $line ( <> ) {
        for my $word ( split /(\s+|\W+)/, $line ) {
            chomp $word;
            $hash{$word} = '1';
        }
    }
    print Dumper(%hash);

    dbmathis@bamboo [~/mbt_index]# cat new_ib1.txt | head -100 | ./validation_poc.pl | head -20
    $VAR1 = '';
    $VAR2 = '1';
    $VAR3 = 'you';
    $VAR4 = '1';
    $VAR5 = ' ';
    $VAR6 = '1';
    $VAR7 = 'put';
    $VAR8 = '1';
    $VAR9 = 'dust';
    $VAR10 = '1';
    $VAR11 = 'my';
    $VAR12 = '1';
    $VAR13 = 'delivered';
    $VAR14 = '1';
    $VAR15 = 'business';
    $VAR16 = '1';
    $VAR17 = 'power';
    $VAR18 = '1';
    $VAR19 = 'Seeburg';
    $VAR20 = '1';

But this doesn't:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash;
    %hash = map { chomp; $_ => '1' } split /(\s+|\W+)/, <>;
    print Dumper(%hash);

    dbmathis@bamboo [~/mbt_index]# cat new_ib1.txt | head -100 | ./validation_poc.pl | head -20
    $VAR1 = '';
    $VAR2 = '1';
    $VAR3 = '-';
    $VAR4 = '1';
    $VAR5 = 'LOW';
    $VAR6 = '1';
    $VAR7 = ' ';
    $VAR8 = '1';
    $VAR9 = 'PDF';
    $VAR10 = '1';
    $VAR11 = 'RES';
    $VAR12 = '1';

Replies are listed 'Best First'.
Re: Having trouble loading a hash with map
by kennethk (Abbot) on May 03, 2012 at 14:31 UTC

    The issue is scalar vs. list context. Since the second argument to split is expected to be a scalar, <> is evaluated in scalar context, so only one line is read and then processed. You can get around this by slurping the whole input (local $/;, see $/).
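
    A minimal sketch of that context difference. It assumes an in-memory filehandle in place of <>; the $fh handle and the three-line sample text are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical three-line input standing in for the text file.
my $text = "LOW-RES PDF\nNOT PRINT-READY\nMY BIG TOE\n";

# Scalar context: split's second argument is a scalar expression,
# so the readline operator returns only the next line.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @from_scalar = split /\W+/, <$fh>;
close $fh;

# List context: the for loop's list makes readline return every line.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @from_list;
for my $line (<$fh>) {
    push @from_list, split /\W+/, $line;
}
close $fh;

print "scalar context: @from_scalar\n";    # words from the first line only
print "list context:   @from_list\n";      # words from all three lines
```

    The same filehandle read gives different results purely because of where it appears, which would explain why the for loop version sees the whole file and the map version does not.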

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Having trouble loading a hash with map
by moritz (Cardinal) on May 03, 2012 at 14:33 UTC
    The output from Data::Dumper becomes much more readable if you feed it with a reference to a hash:
    print Dumper \%hash;

    Also I suspect that you'll like your results better if you simply split /\W+/. All whitespace characters are also non-word characters, and by not capturing the separator you don't get all those whitespace sequences in your hash.

    As for your question, split evaluates its second argument in scalar context, so you are only looking at the first line of input.
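
    To illustrate the point about capturing, here is a small sketch that splits one line both ways. The sample line is assumed to be the first line of the input shown in the question:

```perl
use strict;
use warnings;

my $line = "LOW-RES PDF\n";    # assumed first line of the sample input

# Capturing parentheses make split return the separators as fields too.
my @with_capture = split /(\s+|\W+)/, $line;

# Without the capture group, only the words come back.
my @words_only = split /\W+/, $line;

print scalar(@with_capture), " fields with capture\n";
print scalar(@words_only),   " fields without capture: @words_only\n";
```

    With the capture group, the hyphen and the whitespace runs end up as hash keys alongside the words.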

Re: Having trouble loading a hash with map
by thundergnat (Deacon) on May 03, 2012 at 14:34 UTC

    It is because you are only applying the map to the first line of the file. Try:

    use strict;
    use Data::Dumper;

    my %hash;
    %hash = map { map { chomp; $_ => '1' } split /(\s+|\W+)/ } <DATA>;
    print Dumper(\%hash);

    __DATA__
    LOW-RES PDF NOT PRINT-READY MY BIG TOE BOOK 1: A WAKENING Section 1 Delusion or Knowledge: Is This Guy Nuts, or What? Section 2 Mysticism Demystified The Foundations of Reality LOW-RES PDF NOT PRINT-READY The My Big TOE reality model will help you understand your life, your purpose,

Re: Having trouble loading a hash with map
by Utilitarian (Vicar) on May 03, 2012 at 14:38 UTC
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %hash;    # declared outside the block so it is still in scope for Dumper
    {
        local $/;    # unset the input record separator in order to slurp the contents of STDIN
        %hash = map { chomp; $_ => '1' } split /(\s+|\W+)/, <>;
    }
    print Dumper(\%hash);    # because that's how it's supposed to be used ;)

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      If the File::Slurp module is not available, I find the
          do { local $/;  <> }
      expression preferable due to narrower, thus better controlled, scoping. (In the original reply, the lexical my %hash; was declared inside the block but accessed from outside it.)

      >perl -wMstrict -le
      "use Data::Dumper;
       ;;
       my %hash = map { chomp; $_ => '1' } split /(\s+|\W+)/, do { local $/; <> };
       ;;
       print Dumper \%hash;
      " new_ib1.txt | head -10
      $VAR1 = {
                '' => '1',
                'BIG' => '1',
                'you' => '1',
                'model' => '1',
                'NOT' => '1',
                ',' => '1',
                'understand' => '1',
                'MY' => '1',
                '2' => '1',

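      A rough sketch of why the do-block form keeps the change to $/ contained. It reads from an in-memory filehandle rather than <>; the $fh handle and the sample text are assumptions for illustration:

```perl
use strict;
use warnings;

my $text = "LOW-RES PDF\nNOT PRINT-READY\n";    # hypothetical sample input
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# local $/ applies only inside the do block, so the whole file is read at once.
my $content = do { local $/; <$fh> };
close $fh;

# Outside the block the input record separator is back to its default newline.
my $separator_restored = defined $/ && $/ eq "\n";

my %hash = map { $_ => 1 } split /\W+/, $content;
print "separator restored: ", ( $separator_restored ? "yes" : "no" ), "\n";
print join( ' ', sort keys %hash ), "\n";
```

      Any later line-by-line reads in the same program are unaffected, since $/ is only undefined for the duration of the do block.
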
Re: Having trouble loading a hash with map
by jwkrahn (Abbot) on May 03, 2012 at 16:36 UTC
    my %hash; %hash = map { chomp; $_ => '1' } split /(\s+|\W+)/, <>;

    This should do what you want:

    my %hash = map /(\w+)/ ? ( $1, 1 ) : (), <>;
      This should do what you want:

      my %hash = map /(\w+)/ ? ( $1, 1 ) : (), <>;

      Maybe it should, but it doesn't: only maps first word from each line.
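
      One possible way to pick up every word on each line is a global match in list context. This is a sketch and not code from the thread; it uses an in-memory filehandle ($fh, with made-up sample text) in place of <>:

```perl
use strict;
use warnings;

my $text = "LOW-RES PDF\nNOT PRINT-READY\n";    # hypothetical sample input
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# In list context, /\w+/g returns every match on the line, not just the first.
my %hash = map { map { $_ => 1 } /\w+/g } <$fh>;
close $fh;

print join( ' ', sort keys %hash ), "\n";
```

      The inner map turns each matched word into a key/value pair, and the outer map applies that to every line of input.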
