PerlMonks  

SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?

by Anonymous Monk
on Oct 19, 2013 at 19:38 UTC ( #1058941=perlmeditation )

Recently I wrote a script for someone. It takes three consecutive letters or digits at a time from a line of numbers or text, until the whole line is "taken care of" (I hope you get the idea). Of course, I rolled out my while loop, with a regex to "pick out" these three consecutive characters, like:

    ...
    $str .= $1 . $/ while ( $ARGV[0] =~ m/(.{3})/g );
    ...
Then I was told: NO, please can you use the substr function? Hmmm! OK, I will, though that doesn't come to me naturally. I felt like, why become verbose when you can be simple and "efficient"? So, I did.
    ...
    use constant LEN => 3;
    ...
    my $offset = 0;
    my $length = LEN;
    while ( $_ = substr $ARGV[0], $offset, $length ) {
        last if length != LEN;
        $str .= $_ . $/;
        $offset += $length;
    }
    ...
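A minimal sketch of the same substr walk, written as a function so that it does not assign to the global $_ and stops by comparing the offset against the string length rather than by testing the truth of the chunk. The name chunks_substr and its arguments are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;
use constant LEN => 3;

# Hypothetical helper: returns every complete $len-character chunk of $input.
# A trailing chunk shorter than $len is dropped, as in the loop above.
sub chunks_substr {
    my ( $input, $len ) = @_;
    my @chunks;
    for ( my $offset = 0 ; $offset + $len <= length($input) ; $offset += $len ) {
        push @chunks, substr( $input, $offset, $len );
    }
    return @chunks;
}

print "$_\n" for chunks_substr( 'abcdefghijkl', LEN );
```

Passing the length in as an argument keeps the helper independent of the LEN constant, so the same function can be reused for other chunk sizes.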
Of course, it worked, and the input comes directly from the CLI (command-line interface). However, I decided on my own to benchmark the two, though I knew that substr should be faster than a regex. So, I called up the Benchmark module and did this:
    #!/usr/bin/perl -w
    use strict;
    use constant LEN => 3;
    use Benchmark qw(:all);

    my $str;
    my $count = -2;
    my $re = timethese(
        $count,
        {
            substring => sub {
                my $offset = 0;
                my $length = LEN;
                while ( $_ = substr $ARGV[0], $offset, $length ) {
                    last if length != LEN;
                    $str .= $_ . $/;
                    $offset += $length;
                }
            },
            regex => sub {
                $str .= $1 . $/ while ( $ARGV[0] =~ m/(.{3})/g );
            }
        }
    );
    cmpthese($re);
And my result confirmed my pre-informed mind, with this:
    Benchmark: running regex, substring for at least 2 CPU seconds...
         regex:  1 wallclock secs ( 2.09 usr +  0.00 sys =  2.09 CPU) @ 14030.14/s (n=29323)
     substring:  3 wallclock secs ( 2.14 usr +  0.00 sys =  2.14 CPU) @ 23225.70/s (n=49703)
                 Rate     regex substring
    regex     14030/s        --      -40%
    substring 23226/s       66%        --
Oh yeah! Or so I thought. But then I checked the script and changed this
    ...
    $str .= $1 . $/ while ( $ARGV[0] =~ m/(.{3})/g );    ## NOTE the number 3
    ...
to this
    ...
    $str .= $1 . $/ while ( $ARGV[0] =~ m/(.{LEN})/g );    ## NOTE 3 is now LEN
    ...
The constant LEN is 3, and yet I got this surprise:
    Benchmark: running regex, substring for at least 2 CPU seconds...
         regex:  4 wallclock secs ( 2.72 usr +  0.01 sys =  2.73 CPU) @ 740317.95/s (n=2021068)
     substring:  2 wallclock secs ( 2.66 usr +  0.02 sys =  2.68 CPU) @ 17332.84/s (n=46452)
                  Rate substring     regex
    substring  17333/s        --      -98%
    regex     740318/s     4171%        --
Come on?! Is the benchmark module broke or something? lol!! Or didn't I write the benchmark script well? Anyway, I am back in love with simple and precise over verbose. What do you think?
~ zadok_the_priest ~

Re: SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?
by BrowserUk (Pope) on Oct 19, 2013 at 20:16 UTC
    Is the benchmark module broke or something? lol!!

    Yes. lol!!

        print $1 while 'abcdefghijkl' =~ m[(.{3})]g;;
        abc
        def
        ghi
        jkl

        use constant LEN => 3;;
        print $1 while 'abcdefghijkl' =~ m[(.{LEN})]g;;
        ## notable absence of output means the regex matched nothing (very, very quickly).
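The reason the LEN version matches nothing is that a bareword constant is not interpolated inside a pattern. {LEN} is not a valid quantifier, so in the session above the braces were taken literally and the pattern looked for one character followed by the text {LEN} (newer versions of perl may also warn about the unescaped brace). A sketch of one way to get the constant's value into the quantifier, assuming it is acceptable to pass it through a scalar; chunks_regex is a name invented for this example:

```perl
use strict;
use warnings;
use constant LEN => 3;

# Hypothetical helper: the constant's value arrives as a scalar, and scalars
# are interpolated into the pattern, so the quantifier becomes {3}.
# Call it in list context to get all captured chunks back.
sub chunks_regex {
    my ( $input, $len ) = @_;
    return $input =~ m/(.{$len})/g;
}

my @chunks = chunks_regex( 'abcdefghijkl', LEN );
print "$_\n" for @chunks;    # prints abc, def, ghi, jkl on separate lines
```

The @{[ LEN ]} idiom can interpolate the constant in place instead, at some cost in readability.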

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Nice catch, BrowserUk! You know, I didn't even bother to test the change made to the regex to see the output.
      I think this is why it is good to be paranoid some of the time, especially when your pre-informed mind is called into question by a new discovery! lol!!
      So, substr still carries the day!!!
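The lesson here, checking that each candidate still produces the expected output before trusting its timing, can be sketched as follows. The input string, the fixed iteration count, and the structure of the %impl hash are assumptions for illustration; the original script read $ARGV[0] and ran each candidate for 2 CPU seconds.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use constant LEN => 3;

my $input = 'abcdefghijkl' x 10;    # hypothetical test data
my $len   = LEN;

my %impl = (
    substring => sub {
        my @out;
        for ( my $o = 0 ; $o + $len <= length($input) ; $o += $len ) {
            push @out, substr( $input, $o, $len );
        }
        return join( "\n", @out );
    },
    regex => sub {
        my @out = $input =~ m/(.{$len})/g;
        return join( "\n", @out );
    },
);

# Run each candidate once and refuse to time anything until both agree
# on a non-empty result.
my %result;
$result{$_} = $impl{$_}->() for sort keys %impl;
die "implementations disagree\n" unless $result{substring} eq $result{regex};
die "empty result\n"             unless length $result{regex};

cmpthese( 20_000, \%impl );
```

A check like this would have flagged the {LEN} pattern immediately, because the regex candidate would have returned an empty string.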

Re: SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?
by LanX (Canon) on Oct 19, 2013 at 20:18 UTC
    > SUBSTR OR REGEX

    neither will work capitalized...

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      neither will work capitalized...
      Obviously! Nor is there a "regex" function in Perl!
      I believe you should know better :(. The capitalized was used just for the title.

        YOU SHOULD TRY M// OR S/// ! ;-)

        > The capitalized was used just for the title.

        Which is considered yelling...

        Or don't you know better?

        Cheers Rolf

        ( addicted to the Perl Programming Language)

Re: SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?
by bulk88 (Priest) on Oct 20, 2013 at 01:06 UTC
    substr is always faster than a regex. index is always faster than a regex. Combining the 2, they are faster than a regex. But if you add an if else block or more than 1 index, they start being the same speed as a regex. Regex parsing logic is faster than Perl optree logic. But a regex can never beat memcpy (AKA substr) and strstr (AKA index).
      Recently I found that a simple regexp such as $x =~ /Something/ is just 5%-10% slower than index (on small inputs).

      I actually can't see a reason why perl wouldn't recognize /Something/ as something that can be replaced by index() at compile time.

      And 10% is just implementation overhead.
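For a fixed string, the two tests being compared in this subthread look like the sketch below. The data and helper names are made up, and the relative speed will depend on the perl build and the input, so no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $haystack = ( 'x' x 50 ) . 'Something' . ( 'y' x 50 );    # hypothetical input

# index returns the 0-based position of the first occurrence, or -1 if absent.
sub has_index { return index( $_[0], 'Something' ) >= 0 ? 1 : 0 }

# The same test written as a literal regex match.
sub has_regex { return $_[0] =~ /Something/ ? 1 : 0 }

# Confirm the two agree before comparing their speed.
die "candidates disagree\n" unless has_index($haystack) == has_regex($haystack);

cmpthese(
    200_000,
    {
        index => sub { has_index($haystack) },
        regex => sub { has_regex($haystack) },
    }
);
```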
Re: SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?
by moritz (Cardinal) on Oct 20, 2013 at 05:43 UTC
Re: SUBSTR OR REGEX: WHICH WILL YOU FAVOUR?
by vsespb (Hermit) on Oct 26, 2013 at 22:16 UTC
    Then, I was told NO, please can you use substr function?
    I think the person who told you this was wrong.

    I don't think this is a case where performance matters. You are working with $ARGV[0], which means it's something done once at the beginning of the program.

    I assume maintainability and readability are more important here than tiny performance improvements.
    And Perl program written in "C" style (with substr() and index() everywhere) isn't considered readable
      And Perl program written in "C" style (with substr() and index() everywhere) isn't considered readable

      Do you perchance mean "isn't considered readable" by YOU?

      Authoritative statements of fact, without citing the source of authority, are like those claims that mobile phones would fry our brain cells.

      It is also a really strange claim. I mean, decently formatted C code is perfectly readable.

      So, decently formatted Perl code written in the C-style can be equally readable.

      It may not be idiomatic; or as concise; or as efficient; but there is no reason it cannot be readable. And if you are unfamiliar with Perl idioms; it is probably far more readable to you than idiomatic Perl.

      BTW: I don't disagree that artificially rejecting the use of regex is a silly restriction -- unless it is done for a reason. Perhaps the idea is to encourage the OP to gain an appreciation of the work that the regex engine does on our behalf.


        It may not be idiomatic; or as concise; or as efficient; but there is no reason it cannot be readable.

        It's just bigger, more lines of code. More characters in line. More scrolling needed.

        this
        if ($x && $s =~ /(abc|def)/)
        is more readable than this
        if ($x && (index($s, "abc") >= 0 || index($s, "def") >= 0))
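One thing to watch when spelling the alternation out with index: && binds more tightly than ||, so the two index tests need their own parentheses to mean the same thing as the regex version. A small sketch with hypothetical helper names and inputs:

```perl
use strict;
use warnings;

# The regex form: $x must be true AND either substring must be present.
sub cond_regex {
    my ( $x, $s ) = @_;
    return ( $x && $s =~ /(abc|def)/ ) ? 1 : 0;
}

# The index form with the two alternatives grouped explicitly.
sub cond_index {
    my ( $x, $s ) = @_;
    return ( $x && ( index( $s, 'abc' ) >= 0 || index( $s, 'def' ) >= 0 ) ) ? 1 : 0;
}

# Without the inner parentheses this parses as ($x && A) || B.
sub cond_index_ungrouped {
    my ( $x, $s ) = @_;
    return ( $x && index( $s, 'abc' ) >= 0 || index( $s, 'def' ) >= 0 ) ? 1 : 0;
}

# $x is false and only "def" is present.
printf "regex=%d grouped=%d ungrouped=%d\n",
    cond_regex( 0, 'xxdefxx' ),
    cond_index( 0, 'xxdefxx' ),
    cond_index_ungrouped( 0, 'xxdefxx' );
# prints: regex=0 grouped=0 ungrouped=1
```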
        It is also a really strange claim. I mean, decently formatted C code is perfectly readable.
        Good C code is readable. But good Perl code is more readable than good C code.
        Do you perchance mean "isn't considered readable" by YOU?
        Of course. Should I append "IMHO" to my every posting?

        And if you are unfamiliar with Perl idioms; it is probably far more readable to you than idiomatic Perl.
        And if you are familiar only with Assembler idioms, Assembler is more readable than Perl, and even more so than C.
        without citing the source of authority
        cpan grep for index

Node Type: perlmeditation [id://1058941]
Approved by Athanasius
Front-paged by Corion