Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Speeding up a Perl code. Can it be faster?

by $h4X4_|=73}{ (Monk)
on Jun 10, 2016 at 12:01 UTC ( [id://1165279]=perlquestion: print w/replies, xml ) Need Help??

$h4X4_|=73}{ has asked for the wisdom of the Perl Monks concerning the following question:

I have a small section of code that works. But I would like it to be faster. The size of the data will stay around its current size to maybe four times that. If that is any help in optimizing the code. Here is the working code in a Benchmark test.

use strict; use warnings; use Data::Dumper; use Benchmark; # verification pattern is used by the developer to set a range # in this case the first letter before the slash is to verify # the attribute name in $attrs. The content after the slash # is used to verify the attribute was not used yet. but in the full # version ere is a range that goes to a function to verify the # attribute values. my $match_list = 'a/ere,c/ere,d/ere'; # in $attrs the letter before equal is the attribute name # and needs to be compared with $match_list for its value to be replac +ed. # all attribute names can only be used one time in $attrs to be good # the value of the attribute may have a space in it. my $attrs = 'TY a=errt c=rrr dd d=rrr dd'; # example that should conve +rt #my $attrs = 'TY a=errt c=rrr dd d=rrr d=du 0'; # example that should +say "nope" # tag name, just something that is there. my $tag = 'TY'; # markup is the template of the output if its good my $markup = 'TY:X{a} X{c}|X{d}'; # few more examples based off some of your posts sub third_draft { my %xlist = map{ split('\/', $_) } split(',', $match_list); $attrs =~ s/\A$tag //; # clean up for (split (' (?=\w+\=)', $attrs)) { my ($name, $value) = split('='); # only one attribute of the same name is allowed. if (defined $xlist{$name}) { $xlist{$name} = undef; # this exists now $markup =~ s/X\{$name\}/$value/g; } else { # fails get nothing and end. $markup = ''; last; } } # not in Benchmark # $markup # ? print $markup # : print 'nope'; } sub second_draft { my %xlist = map{ split(/\//, $_) } # make list split(/,/, $match_list); $attrs =~ s/\A$tag //; # clean up my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; # make arra +y for xlist for (@attr) { my ($name, $value) = split(/=/,$_); # only one attribute of the same name is allowed. if (exists $xlist{$name} && defined $xlist{$name}) { $xlist{$name} = undef; # this exists now $markup =~ s/X\{$name\}/$value/g; } else { # fails get nothing and end. $markup = ''; last; } } # not in Benchmark # $markup # ? print $markup # : print 'nope'; } sub first_draft { my %xlist = map{ split(/\//, $_, 2) } # make list grep{ m/\// } split(/,/, $match_list); $attrs =~ s/\A$tag //; # clean up my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; # make arra +y for xlist for (@attr) { my ($name, $value) = split(/=/,$_); # only one attribute of the same name is allowed. if (exists $xlist{$name} && defined $xlist{$name}) { $xlist{$name} = undef; # this exists now $markup =~ s/X\{$name\}/$value/g; } else { # fails get nothing and end. $markup = ''; last; } } # not in Benchmark # $markup # ? print $markup # : print 'nope'; } timethese($ARGV[0] || -1, { 1 ? ( 'a_first_draft' => \&first_draft, ) : (), 1 ? ( 'b_second_draft' => \&second_draft, ) : (), 1 ? ( 'c_third_draft' => \&third_draft, ) : (), });
Thank you in advance.

Update: Added better comments in code.

Replies are listed 'Best First'.
Re: Speeding up a Perl code. Can it be faster?
by Athanasius (Archbishop) on Jun 10, 2016 at 12:49 UTC

    Hello $h4X4_|=73}{,

    These lines look strange to me:

    if (exists $xlist{$name} && defined $xlist{$name}) { $xlist{$name} = undef; # this exists now

    However, if the logic is correct, you can gain a little speedup by removing the first test:

    if (defined $xlist{$name}) {

    Testing for definedness on a non-existent hash entry doesn’t raise a warning, and won’t trigger autovivification, as the following demonstrates:

    22:44 >perl -Mstrict -MData::Dump -wE "my %h = (a => 1, b => undef); d +d \%h; say $_, defined $h{$_} ? ': yes' : ': no' for qw(a b c); dd \% +h;" { a => 1, b => undef } a: yes b: no c: no { a => 1, b => undef } 22:47 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Speeding up a Perl code. Can it be faster?
by choroba (Cardinal) on Jun 10, 2016 at 13:07 UTC
    You didn't provide enough testing data for the border cases. When doing several replacements on the same string, it's usually faster to store the regexes and replacements in a hash, and run just one replacement. It's not clear whether it's possible in your case, but it works for the simple input you provided:
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; use Benchmark qw{ cmpthese }; my $match_list = 'a/ere,c/ere'; my $markup = 'TY:X{a} X{c}'; my $attrs = 'TY a=errt c=rrrdd'; my $tag = 'TY'; sub first_draft { my %xlist = map{ split(/\//, $_, 2) } grep{ m/\// } split(/,/, $match_list); $attrs =~ s/\A$tag //; my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; for (@attr) { my ($name, $value) = split(/=/,$_); if (exists $xlist{$name} && defined $xlist{$name}) { $xlist{$name} = undef; $markup =~ s/X\{$name\}/$value/g; } else { $markup = ''; last; } } $markup } sub choroba { my %xlist = map split(m=/=, $_, 2), grep m=/=, split /,/, $match_l +ist; $attrs =~ s/\A$tag //; my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; my %replace = map split /=/, @attr; $markup =~ s/X\{(.*)\}/$replace{$1}/g; $markup } use Test::More; is first_draft(), choroba(), 'same'; done_testing(1); warn Dumper(first_draft()); cmpthese(shift || -1, { 'first_draft' => \&first_draft, 'choroba' => \&choroba, });

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Speeding up a Perl code. Can it be faster?
by hippo (Bishop) on Jun 10, 2016 at 12:56 UTC

    This runs about 20% faster for me, YMMV:

    use strict; use warnings; use Benchmark; my $match_list = 'a/ere,c/ere'; # verification pattern my $markup = 'TY:X{a} X{c}'; # output markup my $attrs = 'TY a=errt c=rrrdd'; # info to check my $tag = 'TY'; # tag name sub first_draft { my %xlist = map{ split('/', $_, 2) } # make list grep{ m'/' } split(',', $match_list); $attrs =~ s/\A$tag //; # clean up for (split (' ', $attrs)) { my ($name, $value) = split('='); # only one attribute of the same name is allowed. if (defined $xlist{$name}) { $xlist{$name} = undef; # this exists now $markup =~ s/X{$name}/$value/g; } else { # fails get nothing and end. $markup = ''; last; } } # not in Benchmark # $markup # ? print $markup # : print 'nope'; } timethese($ARGV[0] || -1, { 1 ? ( 'first_draft' => \&first_draft, ) : (), });

    I would expect there to be a more efficient way to set up %xlist but it hasn't occurred to me yet precisely how.

Re: Speeding up a Perl code. Can it be faster?
by davido (Cardinal) on Jun 10, 2016 at 14:38 UTC

    Install the Perl module from CPAN, Devel::NYTProf, write a test script that invokes this subroutine, and then profile it. I would not suggest using NYTProf on code that also uses Benchmark -- just write a script that invokes the subroutine a few times with varying but realistic data and run it through the profiler.

    By doing this you'll gain a better appreciation for where the hotspots are. Without doing this, we really have to just use intuition, which is not as reliable as real data.


    Dave

Re: Speeding up a Perl code. Can it be faster?
by stevieb (Canon) on Jun 10, 2016 at 12:45 UTC

    It would probably prove useful if you added some description of what your code does, and in what context you use it. I know it can be gleaned from studying the code, but if readers had this info up-front, it makes digesting the code easier.

Re: Speeding up a Perl code. Can it be faster?
by oiskuu (Hermit) on Jun 10, 2016 at 15:42 UTC

    It'd be helpful to know more of the specifics. For example, if the keys are limited to alphanumeric labels, the simplification could provide further speedup. Anyway, here's one regex-convoluted aka "adding problems to problems" version for your perusal:

    sub uhuu { my $keys = join "|", map "\Q$_\E", ($match_list =~ m{([^/,]+)/[^,] +*}g); my %R; return unless $attrs =~ m{ \A \Q$tag\E \s+ (?: ($keys)=(\S+) (?(?{ defined $R{$1} || ($R{$1}=$2, 0) })(?! +)) \s* )*+ \z }x; $markup =~ s[X{($keys)}][ $R{$1} // $& ]ger }

    A side-note to benchmarkers: localize the globals that you modify. ("On the importance of testing your testing.")

Re: Speeding up a Perl code. Can it be faster?
by $h4X4_&#124;=73}{ (Monk) on Jun 11, 2016 at 10:45 UTC

    Thank you all for help and showing me more ways to do the same thing faster.
    From tweaking this small code I managed to speed it up about three times faster, just moving some stuff around.

    sub forth_draft { my %xlist = map{ split('\/', $_) } split(',', $match_list); $attrs =~ s/\A$tag //; # clean up for (split (' (?=\w+\=)', $attrs)) { my ($name, $value) = split('='); # little more golfing action (defined $xlist{$name} && $markup =~ s/X\{$name\}/$value/g) ? $xlist{$name} = undef # this exists now : $markup = '' if $markup; # do this and let it run through, fas +ter. last if !$markup; # with out this it drops speed now? } # not in Benchmark # $markup # ? print $markup # : print 'nope'; }
    Test output:
    Benchmark: running a_first_draft, b_second_draft, c_third_draft, d_for +th_draft for at least 1 CPU seconds... a_first_draft: 1 wallclock secs ( 1.05 usr + 0.00 sys = 1.05 CPU) @ + 29132.76/s (n=30502) b_second_draft: 1 wallclock secs ( 1.09 usr + 0.00 sys = 1.09 CPU) +@ 31449.73/s (n=34406) c_third_draft: 1 wallclock secs ( 1.11 usr + 0.00 sys = 1.11 CPU) @ + 35255.18/s (n=39098) d_forth_draft: 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @ + 115597.37/s (n=122880)

      This is 6x faster than your original and 50% faster than your latest (and a whole lot more readable to boot):

      use strict; use warnings; use Data::Dump qw[ pp ]; use Benchmark qw[ cmpthese ]; my $match_list = 'a/ere,b/ere,c/ere,d/ere,e/ere,f/ere,g/ere,h/ere'; my @markup = ( 'TY:' . 'X{a} X{b} X{c} X{d} X{e} X{f} X{g} X{h} + ' x 100 ) x 3; my @attrs = ( 'TY a=alpha b=bravo c=charlie d=delta e=echo f=fox +trot g=golf h=hotel' ) x 3; my $tag = 'TY'; my $iters = $ARGV[ 0 ] // -1; sub buk { my %xlist = map split( m'/', $_ ), split ',', $match_list; $attrs[ 0 ] =~ s[^$tag ][]; for( split m' ', $attrs[ 0 ] ) { my( $name, $value ) = split '='; ( defined $xlist{ $name } && $markup[ 0 ] =~ s[X\{$name\}][$va +lue]g ) or $markup[ 0 ] = '', last; undef $xlist{ $name }; } $iters == 1 and print "0:$markup[ 0 ]\n" } sub first_draft { my %xlist = map{ split m[/], $_, 2 } grep{ m[/] } split ',', $mat +ch_list; $attrs[ 1 ] =~ s[\A$tag ][]; # clean up my @attr = $attrs[ 1 ] =~ m[(?:\A| )(.+?)(?=(?: \w+=|\z))]g; for( @attr ) { my( $name, $value ) = split '=', $_; # only one attribute of the same name is allowed. if( exists $xlist{ $name } && defined $xlist{ $name } ) { $xlist{ $name } = undef; # this exists now $markup[ 1 ] =~ s[X\{$name\}][$value]g; } else { # fails get nothing and end. $markup[ 1 ] = ''; last; } } $iters == 1 and print "1:$markup[ 1 ]\n" } sub forth_draft { my %xlist = map{ split('\/', $_) } split(',', $match_list); $attrs[ 2 ] =~ s/\A$tag //; # clean up for (split (' (?=\w+\=)', $attrs[ 2 ])) { my ($name, $value) = split('='); # little more golfing action (defined $xlist{$name} && $markup[ 2 ] =~ s/X\{$name\}/$value/g) ? $xlist{$name} = undef # this exists now : $markup[ 2 ] = '' if $markup[ 2 ]; # do this and let it run th +rough, faster. last if !$markup[ 2 ]; # with out this it drops speed now? } $iters == 1 and print "2:$markup[ 2 ]\n" } cmpthese( $iters, { 'first_draft' => \&first_draft, 'forth_draft' => \&forth_draft, buk => \&buk, } ); __END__ C:\test>junk -1 Rate first_draft forth_draft buk first_draft 4640/s -- -79% -86% forth_draft 22226/s 379% -- -33% buk 33339/s 619% 50% --

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.

      Your first_draft() and forth_draft() produce differing results when an attribute is missing from $markup.

      At this point, the exercise appears as one of futility. We don't know what the problem is, which parameters are global, which are variable. We don't know what the inputs are to the function, or even if it's a function that you need in the first place.

      Please concentrate on the correctness before tackling the overhead. Please expound on the actual problem before coding up a solution. Maybe we can comment on that and offer alternatives.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1165279]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (6)
As of 2024-04-18 06:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found