Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Can this code be optimized further?

by samy_kumar (Scribe)
on Feb 10, 2005 at 13:52 UTC ( #429714=perlquestion: print w/replies, xml ) Need Help??

samy_kumar has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl -w use strict; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my (@a, @b) ; foreach my $value (@temp) { push @a, $1 if ($value =~ /a_(.*)/) ; push @b, $1 if ($value =~ /b_(.*)/) ; } print "@a <==> @b";
I have an array with some elements. My requirement is that i need to remove the characters "a_" & "b_" and push those values to the corresponding arrays @a & @b. I end up with the below code. Is there any way to optimize the above code / do it in a better way?

Replies are listed 'Best First'.
Re: Can this code be optimized further?
by dragonchild (Archbishop) on Feb 10, 2005 at 13:56 UTC
    There's a few improvements that can be made.
    use strict; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); my %values; foreach (@temp) { /^([ab])_(.*)/ && do { push @{$values{$1}}, $2; next; }; die "'$_' doesn't match the pattern of ^[ab]_.*\n"; } print "$_ => @{$values{$_}}\n" for qw( a b );

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Can this code be optimized further?
by eXile (Priest) on Feb 10, 2005 at 14:33 UTC
    why not use only grep?:
    my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my @a_arr = grep { s/^a_(.*)/$1/} @temp; my @b_arr = grep { s/^b_(.*)/$1/} @temp;
    PS: using @a and @b should be OK, but $a and $b have special meaning (in 'sort'), so I try to never use 'a' and 'b' as variable names.
Re: Can this code be optimized further?
by BrowserUk (Patriarch) on Feb 10, 2005 at 14:53 UTC

    #!/usr/bin/perl -w use strict; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my (@a, @b) ; m[^([ab])_(.*)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; print "@a <==> @b";

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Can this code be optimized further?
by phaylon (Curate) on Feb 10, 2005 at 14:41 UTC
    This works w/o pcre's and more than a and b, though it stores in a hash.
    #!/usr/bin/perl use warnings; use strict; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); my %parts; push @{ $parts{ substr $_, 0, 1 }}, substr $_, 2 foreach @temp; use Data::Dumper; print Dumper \%parts;
      This works w/o pcre's

      I assume that by "pcre" you mean Perl Compatible Regular Expressions. Now -- putting aside all questions of how compatible those regexes really are -- I'd like to mention that when we're using Perl, we don't have to call them "Perl Compatible". We just call them "Regular Expressions". ;-P

        Yep, but I often talk to people not involved, so it's just a habit. Besides, I find it more aesthetic than "re's" or "RegExp's" or something :)
Re: Can this code be optimized further?
by Fletch (Bishop) on Feb 10, 2005 at 14:06 UTC

    Another way.

    ## I'd personally use qw( ) here, but this is probalby just sample dat +a my @temp = ("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4 +"); my( @a, @b ); my %arrays; eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b ); for( @temp ) { die "Invalid prefix on element '$_'\n" unless /^([ab])_(.*)/; push @{ $arrays{ $1 } }, $2; }

    Update: Gah, left off a couple of \\ to get a reference in the eval.

      eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b ); ?!?!?

      Am I the only one that thinks unncessary evals are just ugly? While I commend you for trying to keep the surrounding code unchanged (by creating @a and @b), I think the better solution is to improve the surrounding code by making sure the appropriate data structures are used - in this case, a HoA is appropriate.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        If you can think of a better way to get a reference to an arbitrary lexical without eval sure. The original question was how to get things into arrays @a and @b; this gets things into the lexical arrays.

        Granted they may really need a HoA to begin with, but this technique (or the equivalent $arrays{ $_ } = do { no strict 'refs'; \@{ $_ } } for a package variable) sometimes is useful.

      You can use a hash to reroute things to the lexical arrays so you don't have to eval:
      my (@a, @b); my %router = (a => \@a, b => \@b); foreach my $value (@temp) { push @{$router{$1}}, $2 if $value =~ /([ab])_(.*)/; } print "@a <==> @b";

      Caution: Contents may have been coded under pressure.

        And if you had 10 arrays you'd want to keep that hash up to date? 20? 30?

        (Of course not, you'd use a HoA and not screw with this in the first place. :)

        That being said, in this specific case of just a and b I'd personally have probably used something like BrowserUK's @{ $1 eq 'a' ? \@a : \@b} below; but the eval's more flexible if you have a large number of destinations (then again with a large number of destinations you'd probably want an HoA). Of course at the moment I'd be more likely to just use Enumerable#partition and be done with it, but that's another language all together.

Re: Can this code be optimized further?
by Anonymous Monk on Feb 10, 2005 at 14:23 UTC
    I wouldn't use push. If @temp is large (which assume it is, if it isn't there's no need to bother with optimizing it), repeated pushes means repeated mallocing the array needed for @a and @b as it extends.

    I'd use:

    my @temp = (...); my @a = map {/a_(.*)/ ? $1 : ()} @temp; my @b = map {/b_(.*)/ ? $1 : ()} @temp;
    If you expect to not have many matches, that is, most elements in @temp don't contain a_ or b_, I'd try to see whether the simpler regex is enough gain to have both the map and the grep:
    my @a = map {/a_(.*)/; $1} grep /a_/ @temp; my @b = map {/b_(.*)/; $1} grep /b_/ @temp;
Re: Can this code be optimized further?
by cog (Parson) on Feb 10, 2005 at 14:49 UTC
    Use the Benchmark module to try out all these answers :-)
Re: Can this code be optimized further?
by rir (Vicar) on Feb 10, 2005 at 20:22 UTC
    my ( @a, @b); { no strict "refs"; push @{substr( $_,0,1)}, substr($_,2) foreach ( @temp) }
    There is not much point in avoiding sym-refs here.

    Be well,
    rir

Re: Can this code be optimized further?
by runrig (Abbot) on Feb 10, 2005 at 16:58 UTC
    As long as everyone's just chiming in with different ways, if you don't mind modifying the original array:
    for (@temp) { push @a, $_ if s/^a_//; push @b, $_ if s/^b_//; }
    You could even modify this to use a HoA-type solution, e.g.(update: fixed code..still untested):
    my %hoa; s/^([a-z])_// and push @{$hoa{$1}}, $_ for @temp;
Re: Can this code be optimized further?
by holli (Abbot) on Feb 10, 2005 at 18:56 UTC
    This is not an optimization, but for the sake of TIMTOWTDI:
    #!/usr/bin/perl -w use strict; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my (@a, @b) ; my %dispatch = ( "a" => sub { push @a, shift }, "b" => sub { push @b, shift }, ); foreach my $value (@temp) { if ( $value =~ /^([ab])_(.+)$/ ) { my $sub = $dispatch{$1}; &$sub($2); } } print "@a <==> @b\n";
    holli, /regexed monk/
Re: Can this code be optimized further?
by RazorbladeBidet (Friar) on Feb 10, 2005 at 15:23 UTC
    #!/usr/bin/perl -w use strict; use Time::HiRes qw( gettimeofday tv_interval); my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b) ; foreach my $value (@temp) { push @a, $1 if ($value =~ /a_(.*)/) ; push @b, $1 if ($value =~ /b_(.*)/) ; } } print "Time: ".tv_interval( \@startTime )."\n"; @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); @startTime = gettimeofday; for ( 1..10000 ) { my %values; foreach (@temp) { /^([ab])_(.*)/ && do { push @{$values{$1}}, $2; next; }; die "'$_' doesn't match the pattern of ^[ab]_.*\n"; } } print "Time: ".tv_interval( \@startTime )."\n"; @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b); foreach (@temp) { my $prefix = substr( $_, 0, 2 ); if ( $prefix eq 'a_' ) { push @a, substr( $_, 2 ); } elsif ( $prefix eq 'b_' ) { push @b, substr( $_, 2 ); } } } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my( @a, @b ); my %arrays; eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b ); for( @temp ) { die "Invalid prefix on element '$_'\n" unless /^([ab])_(.*)/; push @{ $arrays{ $1 } }, $2; } } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my @a = map {/a_(.*)/ ? $1 : ()} @temp; my @b = map {/b_(.*)/ ? $1 : ()} @temp; } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my @a_arr = grep { s/^a_(.*)/$1/} @temp; my @b_arr = grep { s/^b_(.*)/$1/} @temp; } print "Time: ".tv_interval( \@startTime )."\n"; # GAVE ME substr outside of string at ./testme2.pl line 88. #@startTime = gettimeofday; #for ( 1..10000 ) { # my %parts; # push @{ $parts{ substr $_, 0, 1 }}, substr $_, 2 # foreach @temp; #} #print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b) ; m[^([ab])_(.*)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; } print "Time: ".tv_interval( \@startTime )."\n"; Output: Time: 0.575739 Time: 0.717803 Time: 0.46604 Time: 1.731715 Time: 0.777471 Time: 0.462416 Time: 0.103439
    I'd go with BrowserUK's code :)


    Update!!!:

    Somewhere along the way the @temp was being affected, hence the warnings on phaylon's code.

    Here's the updated output w/ mine removed and phaylon's added (also a little more descriptive):

    Baseline Time: 0.584196 Regexp w/ Hash Time: 0.7385 Eval Time: 1.774581 Map Time: 0.76179 Grep Time: 0.456991 Hash w/ substr Time: 0.411059 Regexp w/ eq Time: 0.683154 Router Time: 0.668033


    Hence, the HoA using substrings would be the most flexible and quickest (albeit a bit risky). Grep is a close second with the eval being 3x as long.

      You should use the benchmark module to do benchmarking. Hopefully I got these right.

      #!/usr/bin/perl -w use strict; use Benchmark qw/cmpthese/; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; # samy_kumar sub original { my (@a, @b) ; foreach my $value (@temp) { push @a, $1 if ($value =~ /a_(.*)/) ; push @b, $1 if ($value =~ /b_(.*)/) ; } } # dragonchild sub hash_style { my %values; foreach (@temp) { /^([ab])_(.*)/ && do { push @{$values{$1}}, $2; next; }; } } # roy jonhson sub router_style { my (@a, @b); my %router = (a => \@a, b => \@b); foreach my $value (@temp) { push @{$router{$1}}, $2 if $value =~ /([ab])_(.*)/; } } sub switch_style { my (@a, @b); foreach (@temp) { my $prefix = substr( $_, 0, 2 ); if ( $prefix eq 'a_' ) { push @a, substr( $_, 2 ); } elsif ( $prefix eq 'b_' ) { push @b, substr( $_, 2 ); } } } sub map_style { my @a = map {/a_(.*)/ ? $1 : ()} @temp; my @b = map {/b_(.*)/ ? $1 : ()} @temp; } sub grep_style { my @a_arr = grep { s/^a_(.*)/$1/} @temp; my @b_arr = grep { s/^b_(.*)/$1/} @temp; } sub grep_map_style { my @a = map {/a_(.*)/; $1} grep /a_/, @temp; my @b = map {/b_(.*)/; $1} grep /b_/, @temp; } sub trinary { my (@a, @b) ; m[^([ab])_(.*)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; } cmpthese( 100_000, { "Original" => \&original, "Hash" => \&hash_style, "Router" => \&router_style, "Switch" => \&switch_style, "Map" => \&map_style, "Grep" => \&grep_style, "Grep + Map" => \&grep_map_style, "Trinary" => \&trinary }); __DATA__ C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32489/s -- -67% -70% -78% -81% -83% -8 +5% -87% Switch 98425/s 203% -- -9% -34% -42% -49% -5 +5% -60% Map 108460/s 234% 10% -- -27% -36% -44% -5 +1% -56% Router 149031/s 359% 51% 37% -- -12% -23% -3 +2% -40% Original 168634/s 419% 71% 55% 13% -- -13% -2 +3% -32% Grep + Map 194175/s 498% 97% 79% 30% 15% -- -1 +2% -21% Hash 220264/s 578% 124% 103% 48% 31% 13% +-- -11% Trinary 246914/s 660% 151% 128% 66% 46% 27% 1 +2% -- C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32658/s -- -66% -69% -78% -81% -83% -8 +5% -86% Switch 95602/s 193% -- -10% -36% -43% -49% -5 +5% -60% Map 106610/s 226% 12% -- -28% -37% -43% -5 +0% -55% Router 148810/s 356% 56% 40% -- -12% -21% -3 +0% -37% Original 168350/s 415% 76% 58% 13% -- -11% -2 +1% -29% Grep + Map 188324/s 477% 97% 77% 27% 12% -- -1 +2% -21% Hash 213220/s 553% 123% 100% 43% 27% 13% +-- -10% Trinary 236967/s 626% 148% 122% 59% 41% 26% 1 +1% -- C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32819/s -- -66% -69% -78% -81% -82% -8 +6% -87% Switch 96899/s 195% -- -9% -35% -42% -47% -5 +8% -61% Map 106724/s 225% 10% -- -28% -37% -42% -5 +3% -57% Router 148810/s 353% 54% 39% -- -12% -19% -3 +5% -40% Original 168350/s 413% 74% 58% 13% -- -8% -2 +6% -32% Grep + Map 182815/s 457% 89% 71% 23% 9% -- -2 +0% -26% Hash 228833/s 597% 136% 114% 54% 36% 25% +-- -7% Trinary 246305/s 650% 154% 131% 66% 46% 35% +8% -- C:\test>

      ___________
      Eric Hodges
        I added a testing mode to the code, so you can verify that each sub yields proper results. When running, if you pass a n argument to the program, it will run a test instead of a benchmark. Benchmark runs for three seconds instead of 100_000 iterations, now.

        I also tweaked a few of the routines, which made substantial differences in their runtimes. And I dumped a couple of uninteresting subs. The code is in the readmore.

        My results:
        Rate Grep Trinary Hash Map Original New_Map New_Grep Tr +i_Substr Switch Tri_Substr2 Grep 969/s -- -30% -33% -37% -43% -50% -56% + -68% -71% -76% Trinary 1379/s 42% -- -4% -10% -19% -29% -37% + -55% -59% -66% Hash 1442/s 49% 5% -- -6% -15% -26% -34% + -53% -57% -64% Map 1528/s 58% 11% 6% -- -10% -21% -31% + -50% -54% -62% Original 1702/s 76% 23% 18% 11% -- -12% -23% + -44% -49% -57% New_Map 1937/s 100% 40% 34% 27% 14% -- -12% + -36% -42% -52% New_Grep 2202/s 127% 60% 53% 44% 29% 14% -- + -28% -34% -45% Tri_Substr 3050/s 215% 121% 111% 100% 79% 57% 39% + -- -9% -24% Switch 3353/s 246% 143% 132% 119% 97% 73% 52% + 10% -- -16% Tri_Substr2 4000/s 313% 190% 177% 162% 135% 106% 82% + 31% 19% --
        Tri_Substr2 is effectively a synthesis of switch and trinary. You can see the differences little design changes make. Using references slows things down. Matching and substituting back in instead of just deleting makes a big difference for grep vs. new_grep.

        Caution: Contents may have been coded under pressure.
        Redefine @temp before each algorithm or pass it in as an argument.

        I see that Benchmark has a different output format, but the results should be close to the same, no?
Re: Can this code be optimized further?
by TedPride (Priest) on Feb 10, 2005 at 16:15 UTC
    I don't think regex is needed here, and you really should make the code somewhat more forgiving to data variations. What if you use all letters of the alphabet instead of just a and b?
    use strict; use warnings; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my ($key, $val, %hash); for (@temp) { ($key, $val) = split '_'; if (exists($hash{$key})) { push @{$hash{$key}}, $val; } else { $hash{$key} = [$val]; } } for (sort keys %hash) { print "$_: " . join(' ', @{$hash{$_}}) . "\n"; }
    You could probably optimize this further by allocating the entire expected space for each array when initially created, but I'll leave that for the next guy.
Re: Can this code be optimized further?
by jdporter (Chancellor) on Feb 10, 2005 at 20:58 UTC
    This solution is more golfish than fast. (Anyone care to benchmark it?)
    eval join '', map /(.*)_(.*)/ ? "push \@$1,'$2';" : (), @temp;
    Here's another which is shorter but perhaps slower, since it does many evals instead of one:
    /(.*)_(.*)/ && eval "push \@$1,'$2'" for @temp;
Re: Can this code be optimized further?
by ikegami (Patriarch) on Feb 10, 2005 at 18:23 UTC

    Another way that works from a string instead of from an array:

    use strict; use warnings; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); local $_ = join(' ', @temp); my @a = /a_(\d+)/g; # or: my @a = /\ba_(\d+)\b/g; my @b = /b_(\d+)/g; # or: my @b = /\bb_(\d+)\b/g; $, = ", "; $\ = "\n"; print(@a); print(@b); __END__ output ====== 1, 2, 3, 4, 5, 6 1, 2, 3, 4

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://429714]
Approved by friedo
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2022-06-27 13:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My most frequent journeys are powered by:









    Results (88 votes). Check out past polls.

    Notices?