Hi,
I have a list of files, each containing multiple lines. I read those lines into an array, then push them into a hash with map for quick lookups later.
for (glob("*.gz")) {
    my @o = `zcat $_ | sed 's/[<> ]//g'`;
    chomp @o;
    push @l, @o;
}
my %h = map { $_, 1 } @l;
I am trying to eliminate the intermediate array — both the "push @l, @o" and the final "my %h = map { $_, 1 } @l;" — to use less memory and maybe speed up the process a bit.
Any good idea?
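One possible approach (a sketch, assuming zcat is on your PATH) is to skip @l entirely and insert each line into %h as it is read, streaming the decompressed output through a pipe instead of slurping it with backticks. The tr/<> //d is meant to match your sed 's/[<> ]//g':

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %h;
    for my $file (glob "*.gz") {
        # Open zcat as a pipe so lines are processed one at a time,
        # instead of loading the whole decompressed file into an array.
        open my $fh, "-|", "zcat", $file
            or die "Cannot run zcat on $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            $line =~ tr/<> //d;   # delete '<', '>' and spaces, like the sed call
            $h{$line} = 1;        # insert directly; no intermediate array
        }
        close $fh or warn "zcat $file exited abnormally: $?";
    }

This keeps peak memory at roughly the size of %h plus one line, rather than %h plus the full @l.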
======
Update
zcat file1.gz will return :
- line1 xxxxx
- line2 yyyyy
- line3 zzzzz
after each pass of the loop, the array @l contains:
- file1_line1 xxxxx
- file1_line2 yyyyy
- file1_line3 zzzzz
- filen_line1 xxxxxxx
- filen_line2 yyyyyyy
- filen_line3 zzzzzzz
then the hash %h contains:
xxxxx -> 1
yyyyy -> 1
zzzzz -> 1
xxxxxxx -> 1
yyyyyyy -> 1
zzzzzzz -> 1
The keys are counted in millions, so using a hash is much better than grepping an array to check whether a key exists later on. Every little bit counts: even though I didn't profile the code, I tried both approaches, and the whole treatment takes about 15 minutes with grep versus about 1 minute with a hash.
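That timing difference matches what you'd expect: exists on a hash is a single O(1) probe, while grep scans every element of the array on each lookup. A minimal sketch of the two lookup styles, using placeholder data:

    use strict;
    use warnings;

    my @l = ("xxxxx", "yyyyy", "zzzzz");        # sample lines
    my %h = map { $_ => 1 } @l;

    # O(1) hash lookup: one probe regardless of how many keys exist.
    print "hash: found\n" if exists $h{"yyyyy"};

    # O(n) array scan: grep walks the whole list for every query.
    print "grep: found\n" if grep { $_ eq "yyyyy" } @l;

With millions of keys and many queries, the grep version does millions of string comparisons per lookup, which is where your 15-minute runtime comes from.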