PerlMonks  


I considered using study() in a project at work, but could not find a large enough gain in efficiency to justify it (the application I considered it for was a CGI searching a limited amount of information). I did, however, test study() again out of curiosity after reading the replies by clintp and LunaticLeo.

My testing consisted of performing a search for the word "lease" in a large file (a sample taken from a DHCP server's leases file, consisting of 787'811 lines / 22'741'219 characters, the word occurring 69'474 times) using the code below. I wrote the results from the program to STDERR (to be able to filter them later), and tested 3 possibilities:

  1. without the use of study()
  2. using study() before the loop, and
  3. using study() within the loop, similar to the way used in the 2nd edition of the Camel book.
My results (executed using 'perl test.pl 2>/dev/null') were as follows:
    Benchmark: timing 100 iterations of w/o study, with study, with study in loop...
             w/o study: 346 wallclock secs (306.95 usr +  8.69 sys = 315.64 CPU)
            with study: 317 wallclock secs (301.91 usr +  8.53 sys = 310.44 CPU)
    with study in loop: 369 wallclock secs (347.37 usr +  8.39 sys = 355.76 CPU)

    #!/usr/local/bin/perl -w --
    use Benchmark qw(timethese clearallcache);

    $FILENAME = "datafile.txt";
    $TEXT = "lease";
    $STUDIED_TEXT = $TEXT;
    study($STUDIED_TEXT);
    $COUNT = 100;

    clearallcache;
    &timethese($COUNT, {
        'with study'         => \&fn1,
        'w/o study'          => \&fn2,
        'with study in loop' => \&fn3
    } );

    sub fn1 {
        &mystat($FILENAME);
        print(STDERR "Searching for $STUDIED_TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            $count++ if ($line =~ m/$STUDIED_TEXT/);
        }
        print(STDERR "fn1 : Lines found : $count\n");
        close(DF);
    }

    sub fn2 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            $count++ if ($line =~ m/$TEXT/);
        }
        print(STDERR "fn2 : Lines found : $count\n");
        close(DF);
    }

    sub fn3 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            study($TEXT);
            $count++ if ($line =~ m/$TEXT/);
        }
        print(STDERR "fn3 : Lines found : $count\n");
        close(DF);
    }

    sub mystat {
        local($filename) = @_;
        print(STDERR "Filename : $filename\tSize : ", (stat($filename))[7], "\t");
    }

My results might well have differed had the search string contained some characters rarer than others, and I am still learning to Benchmark effectively. The moral of this (I believe) is that if you think study() might prove helpful, Benchmark it and see, and remember, as always, YMMV.
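
For readers who want to try "Benchmark it and see" without a large leases file, here is a minimal sketch. The in-memory data, the line contents, and the names count_plain and count_studied are assumptions made for illustration; they are not part of the tests above. It calls study() on the string being searched rather than on the pattern. Note that perlfunc documents study() as a no-op since Perl 5.16, so on a modern perl the two variants are expected to come out close to each other.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical stand-in for the leases file: 10,000 lines,
# every tenth of which contains the word "lease".
my @lines;
for my $n (1 .. 10_000) {
    push @lines, $n % 10 == 0 ? "lease number $n\n" : "option number $n\n";
}

my $text = 'lease';

# Count matching lines without study().
sub count_plain {
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ m/$text/;
    }
    return $count;
}

# Count matching lines, calling study() on each line before matching.
sub count_studied {
    my $count = 0;
    for my $line (@lines) {
        study($line);
        $count++ if $line =~ m/$text/;
    }
    return $count;
}

# Run each variant for at least 1 CPU second, then print a comparison table.
my $results = timethese(-1, {
    'w/o study'  => \&count_plain,
    'with study' => \&count_studied,
}, 'none');
cmpthese($results);
```

A negative count asks Benchmark to run each variant for at least that many CPU seconds, which avoids picking an iteration count by hand; cmpthese then prints the rates and relative differences.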

Update: I stand corrected by the experience and knowledge of chipmunk. Thank you, chipmunk, for the correction to my understanding (or lack thereof).

Update: After considering chipmunk's correction, I have edited and retested the code to try to determine the effect of the study() statement. The new code is below, but I have left the code above as it was for those who may learn from the correction, as I have. I used the same datafile as before. The new tests were:

  1. without use of study() or /o (on regex)
  2. without use of study() but with /o
  3. with study() without /o, and
  4. with study() and /o.
The results were as follows:
    Benchmark: timing 100 iterations of w/o study or /o, w/o study with /o, with study and /o, with study w/o /o...
      w/o study or /o: 352 wallclock secs (304.41 usr +  8.55 sys = 312.96 CPU)
    w/o study with /o: 388 wallclock secs (253.90 usr +  8.33 sys = 262.23 CPU)
    with study and /o: 881 wallclock secs (507.50 usr +  8.17 sys = 515.67 CPU)
    with study w/o /o: 823 wallclock secs (597.40 usr +  8.31 sys = 605.71 CPU)

    #!/usr/local/bin/perl -w --
    use Benchmark qw(timethese clearallcache);

    $FILENAME = "datafile.txt";
    $TEXT = "lease";
    $COUNT = 100;

    clearallcache;
    &timethese($COUNT, {
        'w/o study or /o'   => \&fn1,
        'w/o study with /o' => \&fn2,
        'with study w/o /o' => \&fn3,
        'with study and /o' => \&fn4
    } );

    sub fn1 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            $count++ if ($line =~ m/$TEXT/);
        }
        print(STDERR "fn1 : Lines found : $count\n");
        close(DF);
    }

    sub fn2 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            $count++ if ($line =~ m/$TEXT/o);
        }
        print(STDERR "fn2 : Lines found : $count\n");
        close(DF);
    }

    sub fn3 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            study($line);
            $count++ if ($line =~ m/$TEXT/);
        }
        print(STDERR "fn3 : Lines found : $count\n");
        close(DF);
    }

    sub fn4 {
        &mystat($FILENAME);
        print(STDERR "Searching for $TEXT\t");
        open(DF, $FILENAME);
        my $count = 0;
        while ($line = <DF>) {
            study($line);
            $count++ if ($line =~ m/$TEXT/o);
        }
        print(STDERR "fn4 : Lines found : $count\n");
        close(DF);
    }

    sub mystat {
        local($filename) = @_;
        print(STDERR "Filename : $filename\tSize : ", (stat($filename))[7], "\t");
    }

Question: what effect could the caching in the Benchmark.pm module have on this code/results?
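
One way to probe this empirically is a minimal sketch along the following lines. It relies on Benchmark.pm's documented cache of the null-loop time (keyed by iteration count) and its enablecache, disablecache and clearallcache functions. The in-memory data and the count_matches helper are assumptions for illustration and are not taken from the tests above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr clearallcache enablecache disablecache);

# Hypothetical stand-in data: 2,000 lines, every tenth containing "lease".
my @lines;
for my $n (1 .. 2_000) {
    push @lines, $n % 10 == 0 ? "lease number $n\n" : "option number $n\n";
}

my $text = 'lease';

# Count the lines that match the search text.
sub count_matches {
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ m/$text/;
    }
    return $count;
}

# Time the same code with the null-loop cache enabled, then disabled.
clearallcache();
enablecache();
my $with_cache = timeit(200, \&count_matches);

disablecache();
my $without_cache = timeit(200, \&count_matches);

print "cache enabled : ", timestr($with_cache), "\n";
print "cache disabled: ", timestr($without_cache), "\n";
```

If the two timings agree to within run-to-run noise, the cache is unlikely to be what separates the variants in a test like the one above, though that conclusion would only hold for the machine and perl it was run on.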


In reply to Re: Why study SCALAR? by atcroft
in thread Why study SCALAR? by mrbbking
