Speed up hash initialization loop

austinj has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code where I loop through several files (approx. 300) and get the corresponding values from them. This builds a large hash where the first key is the profile name, followed by the info you want. Unfortunately, this slows down significantly when I run a large number of files. It takes approx. 0.05 seconds per profile if I do fewer than 30 files, but if I do a large number (300+) it takes up to 1.0 seconds per profile. Is there a way I can initialize the hash, or some other way to speed this up when it is running a large number? Thanks

          foreach my $customProf (@{scalar2array($Settings->{custom_plots}[$i][$t]{profiles})}){
            if(exists $profEvntVar->{$customProf}){next;}

            my ($evnt_nums_ref,$var_names_ref,$variables_ref,$last_evnt_name) = getvars($customProf);
            $profEvntVar->{$customProf}{events} = $evnt_nums_ref;  #Link profiles with their respective events
            $profEvntVar->{$customProf}{times} = $evnt_times_ref;  #Link profiles with their respective events
            $profEvntVar->{$customProf}{vars} = $var_names_ref;      #Link profiles with their respective variables
            $profEvntVar->{$customProf}{varsref} = $variables_ref;      #Link profiles with their respective variables
            $profEvntVar->{$customProf}{lastEvent} = $last_evnt_name;    #Link profiles with their respective variables
          }

sub getvars{
my $profile = shift(@_);
my @ptk_info = `<system_command_here>`;
my $evnt_nums_ref;
my $events_found = 0;
my $vars;
my $prof_var_names;
my $last_evnt_name;

foreach (@ptk_info){
   if(/^\s*(\d*\.\d*)0\s-\s(.*?)\s*$/){
      $evnt_nums_ref->{$2} = $1;
      $last_evnt_name = $2;
   }
   if($events_found == 1 && /^\b/){
         push( @{$vars}, split(" ") );
   }
   elsif($events_found == 0 && /^Events$/){
      $events_found = 1;
   }
}
foreach (@{$vars}){
   if(/(.{1,8})\S*:\S*/){
      $prof_var_names->{$1} = $_;
   }
   else{
      $prof_var_names->{$_} = $_;
   }
}

my @sorted_vars = sort { lc($a) cmp lc($b) } @{$vars};

return $evnt_nums_ref,$prof_var_names,\@sorted_vars,$last_evnt_name;
}


Replies are listed 'Best First'.
Re: Speed up hash initialization loop by graff (Chancellor) on Jan 30, 2013 at 05:15 UTC
How many iterations of the initial "foreach" loop are you doing (i.e. how many "profiles" are there)? Is it the same set of 300+ files that the "getvars" sub is loading on each iteration, or does each "profile" bring in its own set of distinct files?

If it's the same 300 files each time, you might see a big difference if you figure out how to restructure the looping so that you read each file exactly once, and populate all the profiles in that one pass over each file. But I'm only guessing, because you haven't provided enough info about the problem (number of profiles, total amount of data in the files, what manner of "system_command" you are running for each file).

Apart from that, anything you do to simplify the "getvars" code will help some; e.g.:

- don't use references to hashes and arrays when you don't need to ("prof_var_names" and "evnt_nums_ref" should just be plain hashes; you can return them as refs the same way you do "sorted_vars", and "vars" should just be @vars).
- use a "pipe open" to run your system command, read from the pipe until you see `/^Events$/`, then read the data of interest, i.e.:

    sub getvars {
        my $profile = shift;
        my ( @vars, %evnt_nums, %prof_var_names, $last_evnt_name );
        open( my $ptk_info, '-|', "system command here" ) or die "$profile: $!\n";
        while (<$ptk_info>) {
            last if ( /^Events$/ );  # skip lines till this line is found
        }
        while (<$ptk_info>) {
            my @tkns = split;
            if ( $tkns[0] =~ /^(\d*\.\d*)0/ ) {
                $last_evnt_name = $tkns[2];
                $evnt_nums{$last_evnt_name} = $1;
            }
            push @vars, @tkns;
        }
        ...  # (do other for loop, sort @vars)
        return \%evnt_nums, \%prof_var_names, \@sorted_vars, $last_evnt_name;
    }
Re^2: Speed up hash initialization loop by austinj (Acolyte) on Jan 30, 2013 at 15:09 UTC
There are 300 profiles; getvars only sees each of the profiles once (it takes a profile location as an argument). I am checking to make sure I haven't already run the profile so as not to run it twice. I switched to the pipe open as suggested (no significant change in runtime). I also changed all hash/array refs to standard hashes and returned the refs as suggested (again, no significant runtime change). The files themselves are relatively small, and the system command returns approx. 30 lines of text, which I use in the regex. Thanks for the help
Re^3: Speed up hash initialization loop by Anonymous Monk on Jan 31, 2013 at 20:56 UTC
How large will `@$vars` be? If larger than a hundred elements or so, you will benefit from a Schwartzian transform:

    my @sorted_vars = map { $_->[1] } sort { $a->[0] cmp $b->[0] } map { [ lc $_, $_ ] } @{$vars};

I'm afraid I'm as out of ideas as the other posters here -- your only recourse is to use a profiler and find the bottlenecks that way.
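For anyone who wants to try the transform in isolation, here is a minimal, self-contained sketch. The sample strings are made up and only stand in for `@{$vars}`; with the roughly 30 lines per profile mentioned in this thread, any gain is likely to be negligible.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @{$vars}.
my @vars = qw(Temp:degC alpha Beta:m ZETA delta);

# Plain sort: lc() is called on both operands in every comparison.
my @plain = sort { lc($a) cmp lc($b) } @vars;

# Schwartzian transform: compute lc() once per element, sort on the
# precomputed key, then strip the key off again.
my @sorted_vars =
    map  { $_->[1] }                 # 3. keep only the original string
    sort { $a->[0] cmp $b->[0] }     # 2. compare the cached lowercase keys
    map  { [ lc $_, $_ ] }           # 1. pair each string with its key
    @vars;

print "@sorted_vars\n";
```

Both forms should produce the same order; the transform only changes how often `lc` is evaluated.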
Re: Speed up hash initialization loop by Anonymous Monk on Jan 29, 2013 at 20:27 UTC
How much time is spent building those hashes, and how much time is spent on disk IO? I also see a lot of stars in your regexes (at least they're not deathstars `.*`). I'm sure they're not all needed. For example, in `if(/(.{1,8})\S*:\S*/){` the last `\S*` accomplishes nothing since it isn't captured.
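One rough way to answer the first question is to time the external command and the parsing separately with Time::HiRes. This is only a sketch under assumptions: the child process below is a stand-in that prints 30 lines (it is not the real system command), and the parsing loop is a placeholder for the regex work in getvars.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Assumption: a stand-in child process that prints 30 lines; replace it with
# the real system command. $^X is the path of the running perl binary.
my @cmd = ( $^X, '-e', 'print "line $_\n" for 1 .. 30' );

# Phase 1: run the command and slurp its output.
my $t0 = time;
open( my $fh, '-|', @cmd ) or die "cannot run command: $!\n";
my @lines = <$fh>;
close $fh;
my $t_cmd = time - $t0;

# Phase 2: parse the captured lines into a hash (placeholder parsing).
$t0 = time;
my %seen;
for (@lines) {
    $seen{$1} = 1 if /^line\s+(\d+)/;
}
my $t_parse = time - $t0;

printf "command: %.4fs  parsing: %.6fs\n", $t_cmd, $t_parse;
```

If the command phase dominates and grows with the number of profiles, the hash is probably not the bottleneck.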
Re^2: Speed up hash initialization loop by austinj (Acolyte) on Jan 30, 2013 at 14:12 UTC
Thanks for the advice. I'm pretty sure the regex runs pretty quickly, the reason being that if I only run a couple of profiles (20) the whole routine runs at 0.07 seconds per profile. However, if I run a large number (300+) it slows down to ~1.0 seconds per profile (average). I assume this means that something to do with initializing/re-allocating memory for the hash is what is slowing me down, not the regex. Either way, I took your advice and removed some unnecessary parts of the regex, but it still runs at approx. the same speed.
Re^2: Speed up hash initialization loop by austinj (Acolyte) on Jan 30, 2013 at 15:13 UTC
I just ran one more test: I ran 100 profiles, averaging approx. 1 second per profile. I then ran 3 profiles (that were in the 100 set), specifically 3 that had each taken 2+ seconds to run. Now, with only 3, they each ran in 0.07 seconds or less. I'm not sure why it seems to already know it has a lot of profiles ... unless it's because I'm passing the profiles in on the command line (for example /home/profile_*). Maybe it has to run this "ls"-type command every time? It seems it should only run it once, but I'm not sure I set it up that way. I'm pulling them in like this: `my $arg_profs = \@ARGV; # set the remaining arguments AFTER you have read the template`
Re: Speed up hash initialization loop by bulk88 (Priest) on Jan 30, 2013 at 03:12 UTC
Write $profEvntVar->{$customProf} only once, assign the ref to a lexical, then deref the lexical each time. You cut 2 lookups to 1 on each line.
Re^2: Speed up hash initialization loop by roboticus (Chancellor) on Jan 30, 2013 at 13:46 UTC
bulk88: So the compiler doesn't do a 'common subexpression elimination' optimization? Or does it do such a thing, but it can't optimize that due to the possibility of too much "magic" going on? ...roboticus When your only tool is a hammer, all problems look like your thumb.
Re^3: Speed up hash initialization loop by bulk88 (Priest) on Jan 30, 2013 at 21:25 UTC
Correct. $profEvntVar may be magical and return a different hash every time it's read. In that hash, slice {$customProf} might be magical and different every time. If you write the var X many times in source, it will be called/read X many times at run time. There is no caching.
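To make the "no caching" point observable, here is a small sketch using a tied hash as one example of magic. The `CountingHash` class is hypothetical and exists only to count reads, and the demo ties a plain hash `%profEvntVar` rather than using the hash reference from the original post. Exact FETCH counts may vary between perl versions, so the sketch only compares the two totals.

```perl
use strict;
use warnings;
use Tie::Hash;    # provides Tie::StdHash

# Hypothetical tied class that counts how often an element is read.
package CountingHash;
our @ISA = ('Tie::StdHash');
our $fetches = 0;
sub FETCH { my ( $self, $key ) = @_; $fetches++; return $self->{$key} }

package main;

tie my %profEvntVar, 'CountingHash';
$profEvntVar{demo} = {};

# Repeating the full expression: the outer lookup is redone each time.
$CountingHash::fetches = 0;
$profEvntVar{demo}{events}    = 1;
$profEvntVar{demo}{vars}      = 2;
$profEvntVar{demo}{lastEvent} = 3;
my $uncached = $CountingHash::fetches;

# Caching the inner hash ref in a lexical: the outer lookup happens once.
$CountingHash::fetches = 0;
my $cprof = $profEvntVar{demo};
$cprof->{events}    = 4;
$cprof->{vars}      = 5;
$cprof->{lastEvent} = 6;
my $cached = $CountingHash::fetches;

print "FETCH calls without caching: $uncached, with caching: $cached\n";
```

The lexical `$cprof` points at the same inner hash, so the writes through it are visible via `%profEvntVar` as well.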
Re^3: Speed up hash initialization loop by austinj (Acolyte) on Jan 30, 2013 at 14:33 UTC
I'm not using a compiler... actually I didn't even know there were perl compilers, but if you point me in the right direction I'd be happy to learn
Re^2: Speed up hash initialization loop by austinj (Acolyte) on Jan 30, 2013 at 14:34 UTC
This sounds like what I need to do, but I don't understand where to deref. Should this be within the subroutine? If it is outside of the subroutine, I think I'm already doing this: my %profEvntVar is defined above, and in the end I need it to contain all of the info about each profile in @profiles. Sorry if I'm missing something
Re^3: Speed up hash initialization loop by Anonymous Monk on Jan 31, 2013 at 20:44 UTC
    if(exists $profEvntVar->{$customProf}){next;}
    my $cprof = $profEvntVar->{$customProf} = {};
    my ($evnt_nums_ref,$var_names_ref,$variables_ref,$last_evnt_name) = getvars($customProf);
    $cprof->{events} = $evnt_nums_ref;  #Link profiles with their respective events
    $cprof->{times} = $evnt_times_ref;  #Link profiles with their respective events
    # etc

FWIW, I don't think there'll be much gain (it should be in the order of microseconds), but since they're in a loop it might add up. This is not really an optimisation technique, but a code cleanup one. (Do change the naming of `$cprof` if you can think of a better name.)
Re: Speed up hash initialization loop by clueless newbie (Curate) on Jan 30, 2013 at 17:42 UTC
May a clueless newbie suggest Bunce and company's Devel::NYTProf?
