PerlMonks  

add/replace map result into existing hash

by fredo2906 (Acolyte)
on Feb 19, 2014 at 11:14 UTC ( [id://1075438]=perlquestion )

fredo2906 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a list of gzipped files, each containing multiple lines. I read those lines into an array, then I would like to load them into a hash with map for quick lookups later.

for (glob("*.gz")) {
    my @o = `zcat $_ | sed 's/[<> ]//g'`;
    chomp @o;
    push @l, @o;
}
my %h = map { $_, 1 } @l;

I am trying to get rid of "my %h = map { $_, 1 } @l;" and "push @l,@o" to use less memory and maybe speed the process up a bit. Any good ideas?

====== Update

zcat file1.gz will return:
- line1 xxxxx
- line2 yyyyy
- line3 zzzzz
Over the turns of the loop, the array @l accumulates:
- file1_line1 xxxxx
- file1_line2 yyyyy
- file1_line3 zzzzz
- filen_line1 xxxxxxx
- filen_line2 yyyyyyy
- filen_line3 zzzzzzz
The hash %h then contains:
xxxxx -> 1
yyyyy -> 1
zzzzz -> 1
xxxxxxx -> 1
yyyyyyy -> 1
zzzzzzz -> 1
The keys number in the millions, so a hash is much better than grepping an array to find out later whether a key exists. Every little bit counts: I haven't profiled the code, but I tried both approaches, and the whole run takes about 15 min with grep and about 1 min with a hash.
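
As a minimal sketch of the idea in the question, the lookup hash can be filled directly in the loop with a hash slice, so no combined @l list is ever built. The sample batches below are hypothetical stand-ins for the zcat output, and %seen is an illustrative name, not one from the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the lines each file would yield after zcat + sed.
my @batches = (
    [ "xxxxx\n", "yyyyy\n", "zzzzz\n" ],
    [ "xxxxxxx\n", "yyyyy\n" ],
);

my %seen;
for my $batch (@batches) {
    my @lines = @$batch;
    chomp @lines;
    # Hash-slice assignment: each line becomes a key (values are undef),
    # so there is no need for a second pass with map over a big list.
    @seen{@lines} = ();
}

# exists() is a hash lookup; grep would have to scan every element.
print exists $seen{yyyyy} ? "found\n" : "missing\n";
print scalar(keys %seen), " distinct keys\n";
```

Because the values are left undefined, lookups in this sketch must use exists rather than testing the value for truth.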

Replies are listed 'Best First'.
Re: add/replace map result into existing hash
by DrHyde (Prior) on Feb 19, 2014 at 11:41 UTC

    Your for loop is effectively a map:

    @l = map { ... } glob("*.gz")

    Does that help?

    BTW, the `shell stuff` returns a list of lines when assigned to an array, as you do here, but every line of the file is held in memory at once (in scalar context it would give you a single scalar with embedded newlines instead). You'll be better off using open(my $fh, '-|', "zcat $_") and reading a line at a time, translating the little sed snippet into Perl.

    Finally, why do you think that rewriting the code will save memory or make it faster? And do you know that it actually needs to be made faster? Have you profiled your code?
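
The piped-open suggestion above might look roughly like the following sketch. It is not code from the thread: add_keys is a hypothetical helper name, `gzip -dc` is swapped in for zcat as a more portable equivalent for .gz files, and tr/// stands in for the sed expression.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# add_keys is a hypothetical helper, introduced only for this sketch.
sub add_keys {
    my ($set, @files) = @_;
    for my $file (@files) {
        # List-form piped open: no shell is involved, so unusual file
        # names cannot be misinterpreted. 'gzip -dc' behaves like zcat.
        open(my $fh, '-|', 'gzip', '-dc', $file)
            or die "cannot start gzip for $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            $line =~ tr/<> //d;    # same effect as sed 's/[<> ]//g'
            $set->{$line} = 1;     # only one line is held at a time
        }
        close($fh) or die "gzip failed on $file (exit status $?)";
    }
    return $set;
}

my %h;
add_keys(\%h, glob('*.gz'));
printf "%d keys loaded\n", scalar keys %h;
```

Only the hash grows here; neither a per-file array nor a combined list is kept around.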

Re: add/replace map result into existing hash
by hdb (Monsignor) on Feb 19, 2014 at 11:36 UTC

    It is not quite clear what you want your final structure to look like. Here is something untested:

    my %h = map { $_ => [ `zcat $_ | sed 's/[<> ]//g'` ] } glob("*.gz");
    chomp @$_ for values %h;
      Thanks, that is actually the kind of thing I was looking for. I updated the post so you can see the kind of structure I am using.
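
For comparison, hdb's snippet yields a hash of arrays keyed by file name, while the updated question wants a flat set of lines. A small sketch of going from one to the other follows; the data is hypothetical and %by_file is an illustrative name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical per-file structure, shaped like the result of hdb's map.
my %by_file = (
    'file1.gz' => [ 'xxxxx',   'yyyyy',   'zzzzz'   ],
    'filen.gz' => [ 'xxxxxxx', 'yyyyyyy', 'zzzzzzz' ],
);

# Flatten into the lookup hash from the question: one key per line,
# each with the value 1. (1) x @$_ builds a list of as many 1s as lines.
my %h;
@h{ @$_ } = (1) x @$_ for values %by_file;

print join(', ', sort keys %h), "\n";
```

If the per-file grouping is never needed, filling the flat hash directly avoids holding both structures.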
Re: add/replace map result into existing hash
by kcott (Archbishop) on Feb 19, 2014 at 15:27 UTC

    G'day fredo2906,

    I created some test input, before seeing your updated OP, as follows:

    $ cat > pm_1075438_1.txt
    qw<er ty>
    a>sd <fgh
    $ gzip pm_1075438_1.txt
    $ cat > pm_1075438_2.txt
    <zxc vbn>
    123> <456
    $ gzip pm_1075438_2.txt

    This script removes the need for the intermediary @o and @l:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %h;
    ++$h{$_} for map { chomp; $_ } `zcat @{[glob '*.gz']} | sed 's/[<> ]//g'`;

    use Data::Dump;
    dd \%h;

    Output:

    { 123456 => 1, asdfgh => 1, qwerty => 1, zxcvbn => 1 }

    FWIW, this line:

    ++$h{$_} for map { chomp; s/[<> ]//g; $_ } `zcat @{[glob '*.gz']}`;

    produces identical output. I'll leave you to benchmark (if you want).

    -- Ken

      Thank you. Exactly what I needed.
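
The benchmark Ken leaves to the reader could be sketched with the core Benchmark module, as below. The sample data, its size, the iteration count, and the sub names with_sed and with_perl are all assumptions made for illustration. The script expects zcat and sed to be on the PATH, and timings will vary by machine.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Build throwaway sample data; contents and size are arbitrary assumptions.
my $dir  = tempdir(CLEANUP => 1);
my $data = join '', map { "<key $_>\n" } 1 .. 50_000;
gzip(\$data => "$dir/sample.gz") or die "gzip failed: $GzipError";
my @files = glob "$dir/*.gz";

# Variant 1: let sed strip the characters before Perl sees the lines.
sub with_sed {
    my %h;
    ++$h{$_} for map { chomp; $_ } `zcat @files | sed 's/[<> ]//g'`;
    return \%h;
}

# Variant 2: strip the characters in Perl with a substitution.
sub with_perl {
    my %h;
    ++$h{$_} for map { chomp; s/[<> ]//g; $_ } `zcat @files`;
    return \%h;
}

# Run each variant 10 times and print a comparison table.
cmpthese(10, { sed => \&with_sed, perl => \&with_perl });
```

Both subs build the same hash, so any difference in the table comes from where the character stripping happens.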
