PerlMonks  

Re^3: Sort/Uniq Help

by poolpi (Hermit)
on Mar 18, 2008 at 09:48 UTC (id://674757)


in reply to Re^2: Sort/Uniq Help
in thread Sort/Uniq Help

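You can simplify your long regexp with:

    use Regexp::Common qw /net/;   # exports %RE

    # Matched against $_; with /x, whitespace is ignored and each | separates alternatives
    /\A $RE{net}{IPv4} | password | (ssn=) \z/xmi;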

hth,

PooLpi

'Ebry haffa hoe hab im tik a bush'. Jamaican proverb

Update: Oops, if it's a succession of alternatives: moritz++ ;)
I also forgot the /m.
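Since the pattern is a succession of alternatives, keep in mind that | has the lowest precedence in a regex: \A binds only to the first alternative and \z only to the last. The sketch below (not from the thread) compares the pattern with and without a (?:...) group. The hand-written $octet/$ipv4 patterns stand in for $RE{net}{IPv4} so it runs without Regexp::Common, and the sample strings are made up. It leaves out /m, which only changes ^ and $, not \A and \z.

```perl
use strict;
use warnings;

# Stand-in for $RE{net}{IPv4} (assumption: built from the Regexp::Common
# expansion shown later in this thread, so no CPAN module is needed).
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $ipv4  = qr/$octet\.$octet\.$octet\.$octet/;

# Ungrouped: parsed as (\A ipv4) or (password) or ((ssn=) \z)
my $ungrouped = qr/\A $ipv4 | password | (ssn=) \z/xi;

# Grouped: \A and \z now apply to every alternative
my $grouped = qr/\A (?: $ipv4 | password | (ssn=) ) \z/xi;

# Returns (ungrouped_match, grouped_match) as 1/0 for one input string
sub classify {
    my ($s) = @_;
    return ( ($s =~ $ungrouped ? 1 : 0), ($s =~ $grouped ? 1 : 0) );
}

for my $s ('127.0.0.1', '127.0.0.1 and more', 'my password is secret') {
    printf "%-22s ungrouped=%d grouped=%d\n", $s, classify($s);
}
```

Whether the anchors should cover every alternative depends on what the original lines look like; the grouped form is only the right choice if each line must consist of exactly one of the three items.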

Re^4: Sort/Uniq Help
by moritz (Cardinal) on Mar 18, 2008 at 10:00 UTC
    I know that TheDamian recommends character classes to escape characters in regexes (in PBP), but it's generally a bad idea because it can disable some optimizations (at least in older versions of perl; I don't know about current ones).

    Also, \| is shorter than [|], and thus less noise for your brain to parse.

    But in the original post the | isn't escaped at all, so you're actually modifying the behaviour of the regex.
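A quick way to see both points: \| and [|] match the same literal pipe, while a bare | turns the pattern into a list of alternatives. A minimal sketch; the pattern names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Three spellings: the first two match a literal "|", the third is alternation
my %pattern = (
    escaped   => qr/x\|password\|y/,
    class     => qr/x[|]password[|]y/,
    alternate => qr/x|password|y/,
);

# On a string containing the pipes, all three match
my $piped = 'x|password|y';

# On a string without pipes, only the alternation still matches,
# because any single alternative (x, password or y) is enough
my $plain = 'my password';

for my $s ($piped, $plain) {
    for my $name (sort keys %pattern) {
        printf "%-14s %-9s %s\n", $s, $name,
            ($s =~ $pattern{$name} ? 'match' : 'no match');
    }
}
```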

      Out of curiosity:

      This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Regexp::Common qw /net/;
      use Benchmark qw( cmpthese );

      my $line = q{127.0.0.1};

      cmpthese -10, {
          RE      => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmi',
          RE_O    => '$line =~ /\A $RE{net}{IPv4} [|] password [|] (ssn=) \z/xmio',
          ORIG    => '$line =~ /[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\|password\|(ssn=)/i',
          # note: \N{LINE TABULATION} is U+000B (vertical tab), not "|"
          RE_CHAR => 'use charnames qw( :full); $line =~ /\A $RE{net}{IPv4} \N{LINE TABULATION} password \N{LINE TABULATION} (ssn=) \z/xmi'
      };
                    Rate RE_CHAR      RE    RE_O    ORIG
      RE_CHAR    17366/s      --     -2%     -2%   -100%
      RE         17704/s      2%      --     -0%   -100%
      RE_O       17747/s      2%      0%      --   -100%
      ORIG    12717477/s  73132%  71732%  71561%      --

      PooLpi

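One caveat when reading these numbers: when cmpthese is given code as strings, Benchmark compiles them with eval inside Benchmark.pm, where the lexical my $line is not in scope, so $line in those strings refers to the package variable $main::line, which is never set. A sketch of the same [|] versus \| comparison using code references (closures over $line) instead, with both patterns precompiled via qr//. The input line is a hypothetical one shaped like ip|password|ssn= so that both patterns actually match; no timings are claimed, since they depend on the machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input in the "ip|password|ssn=" shape the patterns expect
my $line = q{127.0.0.1|password|ssn=};

# Same pattern twice: pipe escaped with a backslash vs. a one-char class
my $escaped = qr/\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\|password\|(ssn=)\z/i;
my $class   = qr/\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[|]password[|](ssn=)\z/i;

# Code references close over the lexical $line, unlike code strings
my %cases = (
    escaped => sub { $line =~ $escaped },
    class   => sub { $line =~ $class },
);

# Run each case for at least 1 CPU second and print the comparison table
cmpthese(-1, \%cases);
```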
        This example shows that the main speed difference really comes from Regexp::Common, whose pattern does a bit more than just match \d{1,3}\.\d{1,3}\. ...:
        $ perl -MRegexp::Common=net -wle 'print $RE{net}{IPv4}'
        (?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))
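To see what that extra work buys: the octet alternatives above accept only 0 to 255 per octet (leading zeros such as 007 still match), while \d{1,3} accepts anything up to 999. A minimal sketch comparing the two, with the octet part of the expansion written out by hand as $octet; the sample addresses are made up.

```perl
use strict;
use warnings;

# Octet alternatives from the Regexp::Common expansion:
# 250-255, 200-249, or 0-199 (leading zeros such as "007" still match)
my $octet  = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $strict = qr/\A$octet\.$octet\.$octet\.$octet\z/;

# The looser \d{1,3} form from the original regex
my $loose  = qr/\A\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\z/;

for my $ip ('127.0.0.1', '255.255.255.255', '256.1.1.1', '999.999.999.999') {
    printf "%-16s strict=%s loose=%s\n", $ip,
        ($ip =~ $strict ? 'yes' : 'no'),
        ($ip =~ $loose  ? 'yes' : 'no');
}
```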
        The optimization I talked about kicks in when the string is much longer and the literal characters occur only once or twice. Then the literal is used as an anchor: perl looks for it first, which reduces the need for backtracking.
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $line = ('a' x 500) . 'b!' . ('a' x 20);

        use Benchmark qw( cmpthese );
        cmpthese -3, {
            literal => sub { $line =~ /a.{1,10}b!/ },
            class   => sub { $line =~ /a.{1,10}[b][!]/ },
        };
        __END__
                    Rate   class literal
        class     3855/s      --    -99%
        literal 712766/s  18390%      --

        Update: added benchmark
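On perl 5.10 and later, the re pragma's regmust function reports the fixed strings the optimizer extracted from a compiled pattern, which is one way to check whether the literal b! is available as an anchor. A sketch using the two patterns from the benchmark; no output is shown because it depends on the perl version (newer perls may treat a one-character class like [b] as the plain literal, in which case both patterns would report the same strings).

```perl
use strict;
use warnings;
use re qw(regmust);   # regmust is available from perl 5.10 on

# Ask the optimizer which fixed strings each pattern requires.
# regmust returns (anchored, floating); either may be undef.
for my $re (qr/a.{1,10}b!/, qr/a.{1,10}[b][!]/) {
    my ($anchored, $floating) = regmust($re);
    printf "%-24s anchored=%s floating=%s\n", $re,
        $anchored // '(none)', $floating // '(none)';
}
```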
