PerlMonks  

Re: Re: Regular Expression Question

by nevyn (Monk)
on Dec 04, 2003 at 20:48 UTC


in reply to Re: Regular Expression Question
in thread Regular Expression Question

You should test your regexp on ",foo,", which will match but shouldn't, as will "!!foo!!,".

So for a working single regexp you want (assuming \w is good enough)...

/^\w+          # Starts with a "word"
 (?:,\w+)*$/x; # Followed by many "comma and word" atoms

Which IMO is uglier than...

/^[\w,]+$/ &&   # Comma word atoms
! /^,|,,|,$/;   # With constraints on commas

It's probably faster to use 2 regexps too.
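
A minimal sketch for checking both styles against the sample strings above. The sub names single_ok and double_ok and the extra test strings are made up for illustration; only ",foo," and "!!foo!!," come from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Single anchored regex: a word, then zero or more ",word" groups.
sub single_ok {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

# Two regexes: only word characters and commas, and no comma that is
# leading, doubled, or trailing.
sub double_ok {
    my ($s) = @_;
    return ( $s =~ /^[\w,]+$/ && $s !~ /^,|,,|,$/ ) ? 1 : 0;
}

# Illustrative inputs; the first two are the cases named in the post.
for my $s ( ',foo,', '!!foo!!,', 'foo', 'foo,bar,baz', 'foo,,bar' ) {
    printf "%-14s single=%d double=%d\n", "'$s'", single_ok($s), double_ok($s);
}
```

Both subs should agree on every input. Note that $ also matches just before a trailing newline, so swap it for \z if "foo\n" must be rejected.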

--
James Antill

Replies are listed 'Best First'.
Re: Re: Re: Regular Expression Question
by simonm (Vicar) on Dec 04, 2003 at 20:59 UTC
    You should test your regexp on ",foo,", which will match but shouldn't, as will "!!foo!!,".

    It depends on what you think "should work". The OP's original regex was not anchored, and seemed intended to extract matching substrings rather than confirm that an entire string matched.

    The /(\w+(?:\,\w+)*)/ regex will successfully extract the matching "foo" substring from your two sample cases into $1. If you want to check the entire string, then yes, leave out the parentheses and use ^...$ anchors.

    Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

    However, if you want to extract matching substrings, I think the single regex is a sensible approach.
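
A small sketch of the extract-versus-validate distinction, assuming the two sample strings from the parent post. The sub names extract_list and is_list are hypothetical, and the comma is written without the backslash, which matches the same text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored regex with a capture group: pulls out the first run of
# comma-separated words found anywhere in the string.
sub extract_list {
    my ($s) = @_;
    return $s =~ /(\w+(?:,\w+)*)/ ? $1 : undef;
}

# Anchored regex without the capture: true only if the whole string
# is a comma-separated list of words.
sub is_list {
    my ($s) = @_;
    return $s =~ /^\w+(?:,\w+)*$/ ? 1 : 0;
}

for my $s ( ',foo,', '!!foo!!,', 'a,b,c' ) {
    my $found = extract_list($s);
    printf "%-12s extracted=%s whole=%d\n",
        "'$s'", defined $found ? "'$found'" : 'none', is_list($s);
}
```

The first two strings yield an extracted "foo" but fail the whole-string check, which is the difference the two readings of "should work" come down to.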

      Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)

      Generally, anything that looks like "(AB*)*" is bad for backtracking.

      --
      James Antill
      Update: with regard to "It's probably faster to use 2 regexps too": Yes, a quick benchmark shows that, with anchoring, the double-regex style runs about 50% faster than the single-regex solution I posted. (Perhaps one of the resident RegEx gurus can explain why this is?)
      I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion. The benchmark below shows the one regex solution to be somewhat faster - the data sample is tiny though.
      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw /timethese cmpthese/;

      chomp (our @lines = <DATA>);

      our (@r1, @r2);

      cmpthese -10 => {
          one => '@r1 = map {/^\w+(?:,\w+)*$/ ? 1 : 0} @lines',
          two => '@r2 = map {/^[\w,]+$/ && !/^,|,,|,$/ ? 1 : 0} @lines',
      };

      die "Unequal" unless "@r1" eq "@r2";

      __DATA__
      one,two,three,four,five
      ,one,two,three,four,five
      one,two,three,four,five,
      one,two,three,,four,five
      one,two,three four,five

             Rate  two  one
      two 25436/s   -- -26%
      one 34417/s  35%   --

      Abigail

        I'd be interested to see your benchmark (code + data), as I don't come to the same conclusion.

        Test and output attached below. Looks like it is dependent on your data set...

        use strict;
        use Benchmark 'cmpthese';

        my @data = <DATA>;
        my @long = map { join '', $_ x 100 } @data;

        my %cases = (
            'Single' => sub { for ( @long ) { /^\w+(?:,\w+)*$/ } },
            'Double' => sub { for ( @long ) { /^[\w,]+$/ && ! /^,|,,|,$/ } },
        );

        cmpthese( 0, \%cases);

        __DATA__
        !@#$as3dfa
        ,sdfas3df,
        asd3fsa,,a3sdf
        as3df,asdf3,3asdf,asd3f
        sad3fasdjasdfkasdfklas3jf
        3sad3fasdjasdfkasdfklas3jf
        3sad3fasdjasdfkasdfklas3jf3

                  Rate Single Double
        Single  6158/s     --   -83%
        Double 35319/s   474%     --
