PerlMonks  

Re^4: Speed Improvement

by BrowserUk (Patriarch)
on Dec 02, 2014 at 15:37 UTC


in reply to Re^3: Speed Improvement
in thread Speed Improvement

I think you may be being fooled by an error:

my $out = study_substitute($_) for each (@messages);
#..................................^^^?^^^^

FWIW: I don't see any improvement (actually a 2% drop) in the performance of the study version (using 5.10/win).
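To see what the stray each does, here is a minimal sketch. The three-element @messages list is invented for illustration, and it assumes perl 5.12 or later, where each accepts an array. In list context each returns a single index/value pair, so the loop body runs twice (once on the index, once on the first message) instead of once per message:

```perl
use strict;
use warnings;

# Hypothetical sample data, standing in for the real @messages.
my @messages = ( 'first', 'second', 'third' );

# The typo: each(@messages) yields one (index, value) pair in list context.
my @seen_with_typo;
push @seen_with_typo, $_ for each (@messages);

# The intended loop visits every element.
my @seen_fixed;
push @seen_fixed, $_ for @messages;

print "typo:  @seen_with_typo\n";
print "fixed: @seen_fixed\n";
```

If the benchmarked sub contained the typo, it was doing far less work per call than the others, which would explain an apparent speed-up.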


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^5: Speed Improvement
by GotToBTru (Prior) on Dec 02, 2014 at 16:01 UTC

    I got the same results once I fixed the typo: yours and McA's algorithms performed slightly worse, toolic's only marginally better when study was added. So something happened, definitely not a no-op, but nothing useful either! With such short strings, I shouldn't be surprised.

    1 Peter 4:10
      yours and McA's algorithms performed slightly worse,

      Studying takes time and rarely yields enough extra performance to cover the cost, unless you are searching very long strings (e.g. whole files) repeatedly. I.e. study once; search many times.

      definitely not a no-op,

      I think if you look back at Dave_the_M's post you'll see that he said "a noop since 5.16". (I'm assuming the use 5.10; is indicative of what you are using?)
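      A minimal sketch of the "study once; search many times" pattern, assuming a made-up in-memory text in place of a slurped file and an arbitrary word list. On perl 5.16 and later the study call does nothing, so any timing difference there is noise:

```perl
use strict;
use warnings;

# Hypothetical large text; in practice this might be a slurped file.
my $text = join '', map { "line $_: the quick brown fox\n" } 1 .. 50_000;

# Study once. On perl 5.16 and later this call does nothing.
study $text;

# ... then search the same string many times.
my %count;
for my $word (qw( quick brown fox wombat )) {
    # The "= () =" idiom counts the matches of a /g regex.
    $count{$word} = () = $text =~ /\b\Q$word\E\b/g;
}

printf "%-6s %d\n", $_, $count{$_} for sort keys %count;
```

      The search results are the same with or without the study line; only the time taken could differ, and only on perls where study still does something.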



        No, I'm running 5.20.1, at least on my pc. I took out the use 5.010 and the results are pretty much the same -- something is happening, but nothing good. Yes, study takes the time to build a table to make searching more efficient, something akin to a database index. If the strings are short, or there are only a very few searches to perform, study isn't worth the effort. I should really try it on a sample problem like the one suggested in the docs.

        1 Peter 4:10
      [...] performed slightly worse, toolic's only marginally better [....]definitely not a no-op

      You seem to be unaware that Benchmark's attempts to subtract "overhead" from its calculations mean that differences of about 20% or less are likely to have absolutely no meaning (especially for ridiculously short operations for which optimizing is almost always just a waste of your time, such as the operations y'all have been optimizing). You can easily get a report of a 20% performance difference after just changing the names you used so they get run in a different order or just running it enough times to get a big enough sample.
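      One way to gauge that noise on your own machine is to benchmark the same code under two different names. This is only a sketch: the substitute sub, the sample message and the names alpha and omega are invented for illustration. Whatever percentage difference the table reports between the two rows is measurement noise, since both entries run identical code:

```perl
use strict;
use warnings;
use Benchmark qw( timethese cmpthese );

# Hypothetical stand-in for the operation being benchmarked.
sub substitute {
    my $s = shift;
    $s =~ s/\{\\d(\d+)\}/'9' x $1/ge;
    return $s;
}

my $msg = 'id={\d4} code={\d2}';

# Two entries running exactly the same code, for at least 1 CPU second each.
my $results = timethese( -1, {
    alpha => sub { substitute( $msg ) },
    omega => sub { substitute( $msg ) },
}, 'none' );

# Print the comparison table; any gap between the rows is noise.
cmpthese( $results );
```

      Running it several times and watching how much the reported gap moves gives a rough floor below which differences should not be trusted.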

      I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up to 300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").

      The line from the original question:

      Every bit of speed I can squeeze out of this sub will help!

      just made me laugh. :)

      - tye        

        For (really hardcore) optimizing, the current bleadperl has Porting/bench.pl by dave_the_m, which uses the cachegrind tool to count the CPU cycles executed for a benchmark. It's mostly intended for benchmarking perl itself, but I guess it could well be used for microbenchmarking what approach is faster, after having arrived at a minimal op count.

        Hi tye,

        first of all: I just like these kinds of threads. Really. You see several approaches, and the monks participating start to become little boys (me too), each presenting what is hopefully the fastest approach. Many approaches are sources of new ideas or points of view. Some approaches are just beautiful (a personal taste in source code beauty).

        There have been many discussions about whether microbenchmarks are meaningful. Yes, of course. The context of the original question is missing, so we cannot judge whether an optimization in this field matters or not.

        But we learned also: Don't trust a statistic which you haven't forged yourself. ;)

        I learned that a short time of studying before starting to work (even when this kind of studying is just doing nothing) can result in an extraordinary performance boost (by just doing something different and letting people believe you did what they want). These are the lessons of and for life... :))

        Last but not least: As you said yourself, the original post made you laugh. That's not the worst, is it?

        Regards
        McA

        I will be genuinely shocked if somebody produces something useful that uses this operation where the reported up-to 300% performance improvement actually leads to a noticeable change in total script run-time (where "noticeable" means "over 20%").

        The lines being modified are obviously some kind of templating system. For what? We can only guess, but perhaps some kind of scientific Monte Carlo simulation runs... At some point, the templated random values have to be populated with actual numbers.

        To that end, I generated 100 files averaging 500k using this code:

        Not exotic, but 'good enough'.

        I then used the following code to slurp the files in turn, do the substitutions and then write the modified data to new files in another directory. Once using Nar's OP code (tweaked to work) and once using my posted version:

        #! perl -slw
        use strict;
        use Time::HiRes qw[ time ];

        sub nar_substitute {
            my @numeric_chars = ( 0 .. 9 );
            my $message = shift;
            my @numeric_matches = ($message =~ m/\{\\d\d+\}/g);
            foreach (@numeric_matches) {
                my $substitution = $_;
                my ($substitution_count) = ($substitution =~ m/(\d+)/);
                my $number = '';
                for (1..$substitution_count) {
                    $number .= $numeric_chars[int rand @numeric_chars];
                }
                $message =~ s[\Q$substitution][$number]e;
            }
            return $message;
        }

        sub buk_substitute{
            my $s = shift;
            $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
            return $s
        }

        our $O //= 0;

        $|++;
        my $start = time;
        for my $fname ( glob 'templ*.txt' ) {
            printf "\rProcessing $fname";
            my $file = do{ local( @ARGV, $/ ) = $fname; <> };
            $file = $O ? buk_substitute( $file ) : nar_substitute( $file );
            open O, '>', "modified/$fname" or die $!;
            print O $file;
            close O;
        }
        printf "\n\nTook %.6f secs\n", time() - $start;

        [16:39:37.79] C:\test\junk>..\junk63
        Processing templ99.txt

        Took 2064.037575 secs

        [17:16:13.24] C:\test\junk>del modified\*
        C:\test\junk\modified\*, Are you sure (Y/N)? y

        [17:25:20.32] C:\test\junk>..\junk63 -O=1
        Processing templ99.txt

        Took 6.626883 secs

        [17:25:35.03] C:\test\junk>
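        A plausible reading of the gap, not something measured above: nar_substitute collects every placeholder and then runs a separate s/// for each one, restarting from the beginning of the string every time, whereas buk_substitute fills all placeholders in a single s///ge pass. The sketch below isolates that single pass on an invented template line (the sub name one_pass and the sample string are assumptions for illustration):

```perl
use strict;
use warnings;

# Same single-pass substitution as buk_substitute above.
sub one_pass {
    my $s = shift;
    # int( 1e10 + rand 1e10 ) is an 11-digit number starting with 1;
    # substr drops that leading 1 and keeps up to $1 of the 10 digits left.
    $s =~ s[\{\\d(\d+)\}][ substr int( 1e10 + rand 1e10 ), 1, $1 ]ge;
    return $s;
}

# Hypothetical template line with 3-, 5- and 12-digit placeholders.
my $template = 'a={\d3} b={\d5} c={\d12}';
my $filled   = one_pass( $template );

print "$filled\n";
```

        Note one consequence of this shortcut: a placeholder asking for more than 10 digits, such as {\d12} here, gets only 10, so templates with longer fields would need a different generator.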

        100 - ( 6.626883 / 2064.037575 * 100 ) = 99.7% saving or 311 times faster!

        Worth the effort I would say.

        Maybe this needs doing once per run of the simulation. Maybe once a day; maybe hundreds. Maybe 100 files is overkill; maybe it requires thousands of files. Maybe 500k average is oversized; maybe they are in the GB range. I don't know...and neither do you.


        just made me laugh. :)

        What made me laugh is the way you ignored stuff you find so ludicrous, at such great length.

        Thanks for the entertainment :)


