http://www.perlmonks.org?node_id=1015963


in reply to Speed up hash initialization loop

How many iterations of the initial "foreach" loop are you doing (i.e. how many "profiles" are there)? Is it the same set of 300+ files that the "getvars" sub is loading on each iteration, or does each "profile" bring in its own set of distinct files?

If it's the same 300 files each time, you might see a big difference if you figure out how to restructure the looping so that you read each file exactly once, and populate all the profiles in that one pass over each file. But I'm only guessing, because you haven't provided enough info about the problem (number of profiles, total amount of data in the files, what manner of "system_command" are you running for each file).
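To make that concrete, here is a minimal sketch of the one-pass restructuring. The names @files, @profiles, and populate_profile() are placeholders I'm assuming, since your actual data structures weren't shown:

for my $file (@files) {
    open my $fh, '<', $file or die "$file: $!\n";
    my @lines = <$fh>;    # each file is read exactly once
    close $fh;

    # populate every profile from this single pass over the file
    for my $profile (@profiles) {
        populate_profile( $profile, \@lines );
    }
}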

Apart from that, anything you do to simplify the "getvars" code will help some; e.g.:

- don't use references to hashes and arrays when you don't need to ("prof_var_names" and "evnt_nums_ref" should just be plain hashes; you can return them as refs the same way you do "sorted_vars", and "vars" should just be @vars).

- use a "pipe open" to run your system command, read from the pipe until you see /^Events$/, then read the data of interest - i.e.:

sub getvars {
    my $profile = shift;
    my ( @vars, @sorted_vars, %evnt_nums, %prof_var_names, $last_evnt_name );

    open( my $ptk_info, '-|', "system command here" )
        or die "$profile: $!\n";

    while (<$ptk_info>) {
        last if ( /^Events$/ );    # skip lines till this line is found
    }
    while (<$ptk_info>) {
        my @tkns = split;
        if ( $tkns[0] =~ /^(\d*\.\d*)0/ ) {
            $last_evnt_name             = $tkns[2];
            $evnt_nums{$last_evnt_name} = $1;
        }
        push @vars, @tkns;
    }
    ...;    # (do other for loop, sort @vars into @sorted_vars)
    return \%evnt_nums, \%prof_var_names, \@sorted_vars, $last_evnt_name;
}
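The caller then unpacks the returned refs, e.g. (the lvalue names here are just for illustration; your calling code wasn't shown):

my ( $evnt_nums, $prof_var_names, $sorted_vars, $last_evnt_name )
    = getvars($profile);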

Re^2: Speed up hash initialization loop
by austinj (Acolyte) on Jan 30, 2013 at 15:09 UTC

    There are 300 profiles, and getvars only sees each profile once (it takes a profile location as an argument); I check to make sure I haven't already run a profile, so as not to run it twice. I switched to the pipe open as suggested (no significant change in runtime). I also changed all hash/array refs to standard hashes and returned the refs as suggested (again, no significant runtime change). The files themselves are relatively small, and the system command returns approx 30 lines of text which I use in the regex. Thanks for the help.

      How large will @$vars be? If larger than a hundred elements or so, you will benefit from a Schwartzian transform:

      my @sorted_vars = map  { $_->[1] }
                        sort { $a->[0] cmp $b->[0] }
                        map  { [ lc $_, $_ ] }
                        @{$vars};
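      The win comes from calling "lc" once per element rather than once per comparison; the plain equivalent below calls it O(n log n) times:

      my @sorted_vars = sort { lc($a) cmp lc($b) } @{$vars};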

      I'm afraid I'm as out of ideas as the other posters here -- your only recourse is to use a profiler and find the bottlenecks that way.
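      Devel::NYTProf from CPAN is a good choice for that; e.g. (the script name is a placeholder):

      perl -d:NYTProf yourscript.pl
      nytprofhtml    # writes an HTML report under ./nytprof/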