Speeding up a Perl code. Can it be faster?

$h4X4_|=73}{ has asked for the wisdom of the Perl Monks concerning the following question:

I have a small section of code that works. But I would like it to be faster. The size of the data will stay around its current size to maybe four times that. If that is any help in optimizing the code. Here is the working code in a Benchmark test.

use strict;
use warnings;
use Data::Dumper;
use Benchmark;
# verification pattern is used by the developer to set a range
# in this case the first letter before the slash is to verify
# the attribute name in $attrs. The content after the slash
# is used to verify the attribute was not used yet. but in the full
# version ere is a range that goes to a function to verify the
# attribute values.
my $match_list = 'a/ere,c/ere,d/ere';

# in $attrs the letter before equal is the attribute name
# and needs to be compared with $match_list for its value to be replac
+ed.
# all attribute names can only be used one time in $attrs to be good
# the value of the attribute may have a space in it.
my $attrs = 'TY a=errt c=rrr dd d=rrr dd'; # example that should conve
+rt
#my $attrs = 'TY a=errt c=rrr dd d=rrr d=du 0'; # example that should 
+say "nope"

# tag name, just something that is there.
my $tag = 'TY';

# markup is the template of the output if its good
my $markup = 'TY:X{a} X{c}|X{d}';

# few more examples based off some of your posts
sub third_draft {
    my %xlist = map{ split('\/', $_) } split(',', $match_list);

    $attrs =~ s/\A$tag //; # clean up
    for (split (' (?=\w+\=)', $attrs)) {
     my ($name, $value) = split('=');
     # only one attribute of the same name is allowed.
     if (defined $xlist{$name}) {
      $xlist{$name} = undef; # this exists now
      $markup =~ s/X\{$name\}/$value/g;
     }
       else { # fails get nothing and end.
        $markup = '';
        last;
       }
     }

     # not in Benchmark
#     $markup
#     ? print $markup
#     : print 'nope';
}

sub second_draft {
    my %xlist = map{ split(/\//, $_) } # make list
    split(/,/, $match_list);

    $attrs =~ s/\A$tag //; # clean up
    my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; # make arra
+y for xlist
    for (@attr) {
     my ($name, $value) = split(/=/,$_);
     # only one attribute of the same name is allowed.
     if (exists $xlist{$name} && defined $xlist{$name}) {
      $xlist{$name} = undef; # this exists now
      $markup =~ s/X\{$name\}/$value/g;
     }
      else { # fails get nothing and end.
       $markup = '';
       last;
       }
     }
     # not in Benchmark
#     $markup
#     ? print $markup
#     : print 'nope';
}

sub first_draft {
    my %xlist = map{ split(/\//, $_, 2) } # make list
    grep{ m/\// } split(/,/, $match_list);

    $attrs =~ s/\A$tag //; # clean up
    my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g; # make arra
+y for xlist
    for (@attr) {
     my ($name, $value) = split(/=/,$_);
     # only one attribute of the same name is allowed.
     if (exists $xlist{$name} && defined $xlist{$name}) {
      $xlist{$name} = undef; # this exists now
      $markup =~ s/X\{$name\}/$value/g;
     }
      else { # fails get nothing and end.
       $markup = '';
       last;
       }
     }
     # not in Benchmark
#     $markup
#     ? print $markup
#     : print 'nope';
}
     timethese($ARGV[0] || -1, {
    1 ?  (
        'a_first_draft'  => \&first_draft,
    ) : (),
    1 ?  (
        'b_second_draft'  => \&second_draft,
    ) : (),
    1 ?  (
        'c_third_draft'  => \&third_draft,
    ) : (),
});
[download]

Thank you in advance.

Update: Added better comments in code.

Comment on Speeding up a Perl code. Can it be faster? Download Code

Replies are listed 'Best First'.

Re: Speeding up a Perl code. Can it be faster?
by Athanasius (Archbishop) on Jun 10, 2016 at 12:49 UTC

Hello $h4X4_|=73}{,

These lines look strange to me:

if (exists $xlist{$name} && defined $xlist{$name}) {
 $xlist{$name} = undef; # this exists now
[download]

However, if the logic is correct, you can gain a little speedup by removing the first test:

if (defined $xlist{$name}) {
[download]

Testing for definedness on a non-existent hash entry doesn’t raise a warning, and won’t trigger autovivification, as the following demonstrates:

22:44 >perl -Mstrict -MData::Dump -wE "my %h = (a => 1, b => undef); d
+d \%h; say $_, defined $h{$_} ? ': yes' : ': no' for qw(a b c); dd \%
+h;"
{ a => 1, b => undef }
a: yes
b: no
c: no
{ a => 1, b => undef }

22:47 >
[download]

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

[reply]
[d/l]
[select]

Re: Speeding up a Perl code. Can it be faster?
by choroba (Cardinal) on Jun 10, 2016 at 13:07 UTC

#!/usr/bin/perl
use strict;
use warnings;

use Data::Dumper;
use Benchmark qw{ cmpthese };

my $match_list = 'a/ere,c/ere';
my $markup = 'TY:X{a} X{c}';
my $attrs = 'TY a=errt c=rrrdd';
my $tag = 'TY';

sub first_draft {
    my %xlist = map{ split(/\//, $_, 2) }
        grep{ m/\// } split(/,/, $match_list);

    $attrs =~ s/\A$tag //;
    my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g;
    for (@attr) {
        my ($name, $value) = split(/=/,$_);
        if (exists $xlist{$name} && defined $xlist{$name}) {
            $xlist{$name} = undef;
            $markup =~ s/X\{$name\}/$value/g;
        } else {
            $markup = '';
            last;
        }
    }
    $markup
}


sub choroba {
    my %xlist = map split(m=/=, $_, 2), grep m=/=, split /,/, $match_l
+ist;
    $attrs =~ s/\A$tag //;
    my @attr = $attrs =~ /(?:\A| )(.+?)(?=(?: \w+=|\z))/g;
    my %replace = map split /=/, @attr;
    $markup =~ s/X\{(.*)\}/$replace{$1}/g;
    $markup
}


use Test::More;
is first_draft(), choroba(), 'same';
done_testing(1);

warn Dumper(first_draft());

cmpthese(shift || -1, {
    'first_draft' => \&first_draft,
    'choroba'     => \&choroba,
});
[download]

Read more... (401 Bytes)

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
[download]

[reply]
[d/l]
[select]

Re: Speeding up a Perl code. Can it be faster?
by hippo (Bishop) on Jun 10, 2016 at 12:56 UTC

This runs about 20% faster for me, YMMV:

use strict;
use warnings;
use Benchmark;
my $match_list = 'a/ere,c/ere'; # verification pattern
my $markup = 'TY:X{a} X{c}'; # output markup
my $attrs = 'TY a=errt c=rrrdd'; # info to check
my $tag = 'TY'; # tag name

sub first_draft {
    my %xlist = map{ split('/', $_, 2) } # make list
    grep{ m'/' } split(',', $match_list);

    $attrs =~ s/\A$tag //; # clean up
    for (split (' ', $attrs)) {
     my ($name, $value) = split('=');
     # only one attribute of the same name is allowed.
     if (defined $xlist{$name}) {
      $xlist{$name} = undef; # this exists now
      $markup =~ s/X{$name}/$value/g;
     }
      else { # fails get nothing and end.
       $markup = '';
       last;
       }
     }
     # not in Benchmark
#     $markup
#     ? print $markup
#     : print 'nope';
}
     timethese($ARGV[0] || -1, {
    1 ?  (
        'first_draft'  => \&first_draft,
    ) : (),
});
[download]

I would expect there to be a more efficient way to set up %xlist but it hasn't occurred to me yet precisely how.

[reply]
[d/l]
[select]

Re: Speeding up a Perl code. Can it be faster?
by davido (Cardinal) on Jun 10, 2016 at 14:38 UTC

Install the Perl module from CPAN, Devel::NYTProf, write a test script that invokes this subroutine, and then profile it. I would not suggest using NYTProf on code that also uses Benchmark -- just write a script that invokes the subroutine a few times with varying but realistic data and run it through the profiler.

By doing this you'll gain a better appreciation for where the hotspots are. Without doing this, we really have to just use intuition, which is not as reliable as real data.

Dave

[reply]

Re: Speeding up a Perl code. Can it be faster?
by stevieb (Canon) on Jun 10, 2016 at 12:45 UTC

It would probably prove useful if you added some description of what your code does, and in what context you use it. I know it can be gleaned from studying the code, but if readers had this info up-front, it makes digesting the code easier.

[reply]

Re: Speeding up a Perl code. Can it be faster?
by oiskuu (Hermit) on Jun 10, 2016 at 15:42 UTC

It'd be helpful to know more of the specifics. For example, if the keys are limited to alphanumeric labels, the simplification could provide further speedup. Anyway, here's one regex-convoluted aka "adding problems to problems" version for your perusal:

sub uhuu {
    my $keys = join "|", map "\Q$_\E", ($match_list =~ m{([^/,]+)/[^,]
+*}g);

    my %R;
    return unless $attrs =~ m{
        \A \Q$tag\E \s+
        (?: ($keys)=(\S+) (?(?{ defined $R{$1} || ($R{$1}=$2, 0) })(?!
+)) \s* )*+
        \z }x;

    $markup =~ s[X{($keys)}][ $R{$1} // $& ]ger
}
[download]

A side-note to benchmarkers: localize the globals that you modify. ("On the importance of testing your testing.")

[reply]
[d/l]

Re: Speeding up a Perl code. Can it be faster?
by $h4X4_|=73}{ (Monk) on Jun 11, 2016 at 10:45 UTC

Thank you all for help and showing me more ways to do the same thing faster.
From tweaking this small code I managed to speed it up about three times faster, just moving some stuff around.

sub forth_draft {
    my %xlist = map{ split('\/', $_) } split(',', $match_list);

    $attrs =~ s/\A$tag //; # clean up
    for (split (' (?=\w+\=)', $attrs)) {
     my ($name, $value) = split('=');
     # little more golfing action
     (defined $xlist{$name} && $markup =~ s/X\{$name\}/$value/g)
      ? $xlist{$name} = undef # this exists now
      : $markup = '' if $markup; # do this and let it run through, fas
+ter.
      last if !$markup; # with out this it drops speed now?
     }

     # not in Benchmark
#     $markup
#     ? print $markup
#     : print 'nope';
}
[download]

Benchmark: running a_first_draft, b_second_draft, c_third_draft, d_for
+th_draft for at least 1 CPU seconds...
a_first_draft:  1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @
+ 29132.76/s (n=30502)
b_second_draft:  1 wallclock secs ( 1.09 usr +  0.00 sys =  1.09 CPU) 
+@ 31449.73/s (n=34406)
c_third_draft:  1 wallclock secs ( 1.11 usr +  0.00 sys =  1.11 CPU) @
+ 35255.18/s (n=39098)
d_forth_draft:  1 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @
+ 115597.37/s (n=122880)
[download]

[reply]
[d/l]
[select]

Re^2: Speeding up a Perl code. Can it be faster? (-50%)

by BrowserUk (Patriarch) on Jun 11, 2016 at 12:27 UTC

This is 6x faster than your original and 50% faster than your latest (and a whole lot more readable to boot):

use strict;
use warnings;
use Data::Dump qw[ pp ];
use Benchmark qw[ cmpthese ];

my $match_list  = 'a/ere,b/ere,c/ere,d/ere,e/ere,f/ere,g/ere,h/ere';
my @markup      = ( 'TY:' .  'X{a} X{b} X{c} X{d} X{e}  X{f} X{g} X{h}
+ ' x 100 ) x 3;
my @attrs       = ( 'TY a=alpha b=bravo c=charlie d=delta e=echo f=fox
+trot g=golf h=hotel' ) x 3;
my $tag         = 'TY';

my $iters = $ARGV[ 0 ] // -1;

sub buk {
    my %xlist = map split( m'/', $_ ), split ',', $match_list;

    $attrs[ 0 ] =~ s[^$tag ][];

    for( split m' ', $attrs[ 0 ] ) {
        my( $name, $value ) = split '=';

        ( defined $xlist{ $name } && $markup[ 0 ] =~ s[X\{$name\}][$va
+lue]g ) 
            or $markup[ 0 ] = '', last;

        undef $xlist{ $name };
    }
    $iters ==  1 and print "0:$markup[ 0 ]\n"
}



sub first_draft {
    my %xlist = map{ split m[/], $_, 2  } grep{ m[/] } split ',', $mat
+ch_list;

    $attrs[ 1 ] =~ s[\A$tag ][]; # clean up

    my @attr = $attrs[ 1 ] =~ m[(?:\A| )(.+?)(?=(?: \w+=|\z))]g;
    for( @attr ) {
        my( $name, $value ) = split '=', $_;
        # only one attribute of the same name is allowed.
        if( exists $xlist{ $name } && defined $xlist{ $name } ) {
            $xlist{ $name } = undef; # this exists now
            $markup[ 1 ] =~ s[X\{$name\}][$value]g;
        }
        else { # fails get nothing and end.
            $markup[ 1 ] = '';
            last;
        }
    }
    $iters ==  1 and print "1:$markup[ 1 ]\n"
}

sub forth_draft {
    my %xlist = map{ split('\/', $_) } split(',', $match_list);

    $attrs[ 2 ] =~ s/\A$tag //; # clean up

    for (split (' (?=\w+\=)', $attrs[ 2 ])) {
     my ($name, $value) = split('=');
     # little more golfing action
     (defined $xlist{$name} && $markup[ 2 ] =~ s/X\{$name\}/$value/g)
      ? $xlist{$name} = undef # this exists now
      : $markup[ 2 ] = '' if $markup[ 2 ]; # do this and let it run th
+rough, faster.
      last if !$markup[ 2 ]; # with out this it drops speed now?
     }
    $iters ==  1 and print "2:$markup[ 2 ]\n"
}

cmpthese(
    $iters, {
        'first_draft'  => \&first_draft,
        'forth_draft'  => \&forth_draft,
        buk => \&buk,
    }
);

__END__
C:\test>junk -1
               Rate first_draft forth_draft         buk
first_draft  4640/s          --        -79%        -86%
forth_draft 22226/s        379%          --        -33%
buk         33339/s        619%         50%          --
[download]

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)

In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.