PerlMonks  

Benchmark results | localizing $INPUT_RECORD_SEPARATOR vs splitting contents of file on $INPUT_RECORD_SEPARATOR

by ashish.kvarma (Monk)
on Nov 06, 2012 at 12:16 UTC ( [id://1002465]=perlquestion )

ashish.kvarma has asked for the wisdom of the Perl Monks concerning the following question:

I was looking at arunshankar.c's post XML parsing and thought the same could be done (slightly faster) using a localized $INPUT_RECORD_SEPARATOR.
I assumed (I don't know why) that it would be faster using $INPUT_RECORD_SEPARATOR. To see how much faster, I ran a small benchmark, but was astonished at the results.
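
As a quick illustration of the two approaches being compared, here is a minimal sketch that runs without any file on disk. The sample string in `$data` is made up and only stands in for the contents of removed.xml; the second approach reads it through an in-memory filehandle.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the contents of removed.xml.
my $data = "alpha|beta|gamma";

# Approach 1: hold everything in one string, then split on a literal '|'.
my @by_split = split /\|/, $data;

# Approach 2: read record by record with a localized $/.
my @by_irs;
{
    local $/ = '|';    # restored automatically when this block exits
    open my $fh, '<', \$data or die "Error [$!]\n";    # in-memory filehandle
    while ( my $record = <$fh> ) {
        chomp $record;    # strips the trailing '|' when present
        push @by_irs, $record;
    }
    close $fh;
}

print join( ',', @by_split ), "\n";
print join( ',', @by_irs ),   "\n";
```

For this input both approaches yield the same three records, so any timing difference comes from how the records are produced rather than from what is produced.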

Below are the results and the benchmark code I used.

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $count = -100;
cmpthese($count, {
    'Split' => sub {
        my $document;
        open(FILE, 'removed.xml') or die "Error [$!]\n";
        while (<FILE>) { $document .= $_ }
        my @lines = split('\|',$document);
    },
    'IRS_while' => sub {
        local $/ = '|';
        my @lines;
        open(FILE, 'removed.xml') or die "Error [$!]\n";
        while (<FILE>) {
            chomp;
            push @lines, $_;
        }
    },
    'IRS_map' => sub {
        local $/ = '|';
        open(FILE, 'removed.xml') or die "Error [$!]\n";
        my @lines = map {chomp; $_} (<FILE>);
    },
});

            Rate   IRS_map IRS_while     Split
IRS_map   4936/s        --       -7%       -8%
IRS_while 5303/s        7%        --       -2%
Split     5394/s        9%        2%        --

I have run this multiple times with different values of $count, and splitting the string seems to have a slight advantage in all cases. For a while I thought I might be doing something wrong in the code, but I cannot see any issue with it. I guess splitting is a bit faster (probably not significantly, but it is what it is).
Can someone please help me understand why Split is faster than using $INPUT_RECORD_SEPARATOR?
Thanks in advance.

P.S.: I don't know if it's important, but just for information, I am using ActivePerl 5.16 on Windows 7, 32-bit, Intel Core i3.

Regards,
Ashish

Replies are listed 'Best First'.
Re: Benchmark results | localizing $INPUT_RECORD_SEPARATOR vs splitting contents of file on $INPUT_RECORD_SEPARATOR
by MidLifeXis (Monsignor) on Nov 06, 2012 at 13:32 UTC

    Your IRS_while() does not return the same information as the other two subroutines. push returns the size of the array after the update; assignment returns the assigned value. Is it possible that you are getting bogus results due to that?
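
    To see the two values mentioned above in isolation, here is a minimal sketch. The variable names are made up for illustration and are not part of the benchmark.

    ```perl
    use strict;
    use warnings;

    my @lines;

    # push evaluates to the new number of elements in the array.
    my $push_result = push @lines, 'a', 'b', 'c';

    # A list assignment evaluates to the number of right-hand-side
    # elements in scalar context, and to the assigned list in list context.
    my $assign_count = ( my @copy_a = ( 'x', 'y' ) );
    my @assign_list  = ( my @copy_b = ( 'x', 'y' ) );

    print "push: $push_result\n";
    print "assignment (scalar context): $assign_count\n";
    print "assignment (list context): @assign_list\n";
    ```

    Note also that perlsub describes the return value of a sub whose last statement is a loop, such as the while in IRS_while, as unspecified. An explicit return @lines in every sub, as in the code that follows, is the safer way to make the three candidates comparable.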

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $count = -100;
    cmpthese($count, {
        'Split' => sub {
            my $document;
            open(FILE, 'removed.xml') or die "Error [$!]\n";
            while (<FILE>) { $document .= $_ }
            my @lines = split('\|',$document);
            return @lines;
        },
        'IRS_while' => sub {
            local $/ = '|';
            my @lines;
            open(FILE, 'removed.xml') or die "Error [$!]\n";
            while (<FILE>) {
                chomp;
                push @lines, $_;
            }
            return @lines;
        },
        'IRS_map' => sub {
            local $/ = '|';
            open(FILE, 'removed.xml') or die "Error [$!]\n";
            my @lines = map {chomp; $_} (<FILE>);
            return @lines;
        },
    });

    Results:

                 Rate   IRS_map     Split IRS_while
    IRS_map    9804/s        --      -24%      -27%
    Split     12876/s       31%        --       -4%
    IRS_while 13399/s       37%        4%        --
    and
    This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x86-multi-thread

    Benchmark newbie, so I am certain that I will be corrected if I have also misapplied the tool :-)

    Update: To tie into a comment by 2teez, my versions of this seem to be spending a large amount of time in open (32% of total time) and readline (18% of total time) -- a whopping 50% of the total time, with those two items being the top two runners when sorted by exclusive time under Devel::NYTProf.

    --MidLifeXis

      -100 is too long
      #!/usr/bin/perl
      ## spin-up hard-disk, init cache ;)
      Split();
      IRS_while();
      IRS_map();

      irate/tyerate

                  Rate   IRS_map     Split IRS_while
      IRS_map   7682/s        --      0.83      0.81
      Split     9284/s      1.21        --      0.97
      IRS_while 9529/s      1.24      1.03        --

      And from memory

      my $removed = \ scalar read_file('removed.xml');
      use File::Slurp;

                   Rate   IRS_map IRS_while     Split
      IRS_map   16544/s        --      -36%      -49%
      IRS_while 25775/s       56%        --      -21%
      Split     32686/s       98%       27%        --

      irate/tyerate

                   Rate   IRS_map IRS_while     Split
      IRS_map   16544/s        --      0.64      0.51
      IRS_while 25775/s      1.56        --      0.79
      Split     32686/s      1.98      1.27        --

        Ok, I can agree with that - I don't see how that changes the results. Changed to -3.

                     Rate   IRS_map     Split IRS_while
        IRS_map    9659/s        --      -26%      -28%
        Split     13017/s       35%        --       -3%
        IRS_while 13379/s       39%        3%        --

        Considering that a lot of time is in I/O (open and readline), I wonder if the differences between our results are due more to I/O characteristics on our systems than to the algorithm itself.

        --MidLifeXis
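
        One way to take disk I/O out of the comparison, sketched here under the assumption that the data fits comfortably in memory, is to build or read the data once and benchmark only the parsing. The contents of `$data` are made up to stand in for removed.xml, and the sub names `by_split` and `by_irs` are illustrative.

        ```perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Made-up sample content standing in for removed.xml (an assumption).
        my $data = join '|', map { "<item>$_</item>" } 1 .. 1000;

        # Split the in-memory string on a literal '|'.
        sub by_split {
            my @lines = split /\|/, $data;
            return @lines;
        }

        # Read '|'-terminated records through an in-memory filehandle.
        sub by_irs {
            local $/ = '|';
            open my $fh, '<', \$data or die "Error [$!]\n";
            my @lines;
            while ( my $record = <$fh> ) {
                chomp $record;
                push @lines, $record;
            }
            close $fh;
            return @lines;
        }

        # A negative count asks Benchmark to run each candidate for at
        # least that many CPU seconds instead of a fixed iteration count.
        cmpthese( -1, { Split => \&by_split, IRS_while => \&by_irs } );
        ```

        With the open and the disk read removed from the timed code, whatever gap remains reflects the parsing strategies themselves; the absolute rates will still vary by machine and perl build.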

Re: Benchmark results | localizing $INPUT_RECORD_SEPARATOR vs splitting contents of file on $INPUT_RECORD_SEPARATOR
by 2teez (Vicar) on Nov 06, 2012 at 12:42 UTC
