
Re: regex logical equivalence?

by jarich (Curate)
on Feb 05, 2004 at 06:14 UTC ( [id://326708]=note )


in reply to regex logical equivalence?

I think you might find it worthwhile to learn about /x at this point, as these regular expressions could certainly do with some commenting. /x isn't hard or scary at all. All you have to do is remember to escape the whitespace you want to match and the #s. It makes regular expressions much easier to explain.
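As a small illustration of the escaping rule, here is a minimal sketch. The input format ("name = number # note") and the sub name parse_line are made up for this example and have nothing to do with the original question. Under /x, unescaped whitespace and everything after an unescaped # are ignored, so literal spaces and the literal # are written with a backslash.

```perl
use strict;
use warnings;

# Hypothetical example (not from the original question):
# parse a line of the form "name = number # note".
my $re = qr/
    ^ (\w+)     # the name, first capture group
    \ = \       # escaped space, an equals sign, escaped space
    (\d+)       # the number, second capture group
    \ \#        # escaped space, then an escaped hash sign
/x;

# Returns (name, number) on a match, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return $line =~ $re ? ($1, $2) : ();
}

my ($name, $number) = parse_line('retries = 3 # try three times');
print "name=$name number=$number\n";
```

One thing to watch for: the comments sit inside the pattern, so it is safest to keep $, @ and the pattern delimiter out of them, since Perl looks for the closing delimiter and interpolates variables before /x strips the comments.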

Just to make things more confusing ;) I'm going to swap the order of these two expressions, so my first one will be the longer of the two and I'll work on the shorter (my second - your first) as I think that's the one you wanted to focus on.

To determine if your two regular expressions are sufficiently equivalent, we need to compare them.

This is the longer one:

/.*                         # Stuff
 (                          # START capturing to $1
   [$\ \#\%>~]              # Any single space, $, #, %, > or ~
 |                          # OR
   \[*                      # 0 or more [s
   \w*                      # 0 or more word characters (a-zA-Z0-9_)
   \@*                      # 0 or more @s
   \-*                      # 0 or more -s
   \w*                      # more word characters
   \%                       # Exactly 1 %
   \]*                      # 0 or more ]s
 |                          # OR
   \[*\w*\@*\-*\w*\#\]*     # As above, but with a # instead of %
 |                          # OR
   \[*\w*\@*\-*\w*\$\]*     # As above, but with a $
 |                          # OR
   \[*\w*\@*\-*\w*>\]*      # As above, but with no terminator
                            # (will therefore match any terminator)
 |                          # OR
   \\\[\\e\[0m\\\]\ \[0m    # the sequence: \[\e[0m\\] [0m
 )                          # END of $1
 \s?                        # 0 or 1 spaces
/x
and this is the shorter:
/.*                         # Stuff
 (                          # START capturing to $1
   \[*                      # 0 or more [s
   \w*                      # 0 or more word characters
   \@*                      # 0 or more @s
   \-*                      # 0 or more -s
   \w*                      # more word characters
   [$\ \#\%>~]              # exactly 1 space, $, #, %, > or ~
   \]                       # exactly 1 ] (are you missing a * ?)
 |                          # OR
   \\\[\\e\[0m\\\]\ \[0m    # the sequence: \[\e[0m\\] [0m
 )                          # END of $1
 \s?                        # 0 or 1 spaces
/x
Now we need to consider what patterns will match one, but not the other... (I'm going to assume you are missing a * up there next to your ], if not, then these aren't very equivalent at all).
  • Any 1 space, $, #, %, > or ~ will be matched by both.
  • The escape sequence: \[\e[0m\\] [0m is allowed by both.
  • Each pattern: [w@-w$], [w@-w#], [w@-w%], [w@-w~] is allowed by both.
  • [w@-w ] (as you have shown) is allowed by the second but not the first (this is easy to fix).

Like you, I can only spot this one significant difference between the two regular expressions (once you fix your typo).

This is easily fixed:
/.*                         # Stuff
 (                          # START capturing to $1
   \                        # exactly 1 space
 |                          # OR
   \[*                      # 0 or more [s
   \w*                      # 0 or more word characters
   \@*                      # 0 or more @s
   \-*                      # 0 or more -s
   \w*                      # more word characters
   [$\#\%>~]                # exactly 1 of $, #, %, > or ~
   \]*                      # 0 or more ]s
 |                          # OR
   \\\[\\e\[0m\\\]\ \[0m    # the sequence: \[\e[0m\\] [0m
 )                          # END of $1
 \s?                        # 0 or 1 spaces
/x
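One rough way to check this kind of equivalence is to run the patterns over the same inputs and compare what each captures. The sketch below does that for the longer pattern, the shorter pattern (with the assumed * added after its ]), and the fixed version. The sample strings and the helper names capture_of and show are invented for illustration, and the comments are left out of the patterns to keep the block short. It also writes \$ rather than a bare $ inside the character class, on the assumption that Perl may otherwise read $\ as a variable.

```perl
use strict;
use warnings;

# The longer pattern: one alternative per terminator.
my $longer = qr/
    .*
    (
        [\$\ \#%>~]
      | \[*\w*\@*\-*\w*%\]*
      | \[*\w*\@*\-*\w*\#\]*
      | \[*\w*\@*\-*\w*\$\]*
      | \[*\w*\@*\-*\w*>\]*
      | \\\[\\e\[0m\\\]\ \[0m
    )
    \s?
/x;

# The shorter pattern, ASSUMING the missing * after the ] is added.
my $shorter = qr/
    .*
    (
        \[*\w*\@*\-*\w*[\$\ \#%>~]\]*
      | \\\[\\e\[0m\\\]\ \[0m
    )
    \s?
/x;

# The fixed pattern: the space is moved into its own alternative.
my $fixed = qr/
    .*
    (
        \                       # one literal space
      | \[*\w*\@*\-*\w*[\$\#%>~]\]*
      | \\\[\\e\[0m\\\]\ \[0m
    )
    \s?
/x;

# Return the first capture group for $text, or undef if there is no match.
sub capture_of {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# Format a capture for printing, making spaces and non-matches visible.
sub show {
    my ($capture) = @_;
    return defined $capture ? "'$capture'" : 'no match';
}

# Hypothetical sample data; real prompts would be better.
my @samples = ('[user@host ]', '[user@host]$ ', 'user@host%', 'bash> ');

for my $text (@samples) {
    printf "%-16s longer=%-6s shorter=%-6s fixed=%-6s\n",
        "'$text'",
        show(capture_of($longer,  $text)),
        show(capture_of($shorter, $text)),
        show(capture_of($fixed,   $text));
}
```

Comparing captures rather than plain match/no-match is deliberate: with the leading .* and no anchors, all three patterns match every one of these samples, so only the captured text can tell them apart. A handful of samples cannot prove equivalence, of course; it can only turn up counterexamples.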
Note that this equivalence won't necessarily remain true if you change your quantifiers, in particular if you change all of your *s to ?s. If you want my opinion, I suspect you're actually looking for a regular expression more like this:
/.*                         # Stuff
 (                          # START capturing to $1
   \                        # exactly 1 space
 |                          # OR
   \[?                      # 0 or 1 [
   \w*                      # 0 or more word characters
   \@?                      # 0 or 1 @
   [-\w.]*                  # 0 or more word chars, dots and hyphens
                            # eg +w-w.w-.w
   [$\#\%>~]                # exactly 1 of $, #, %, > or ~
   \]?                      # 0 or 1 ]
 |                          # OR
   \\\[\\e\[0m\\\]\ \[0m    # the sequence: \[\e[0m\\] [0m
 )                          # END of $1
 \s?                        # 0 or 1 spaces
/x

But I may be wrong - you may not be interested in the dot at all. ;) I'm not 100% certain that you want the .* at the front though. Do you have some sample data for us?

I hope you recognise that both expressions will match any string with a single space in it... which will be most strings....
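The same thing happens with the pattern suggested above, and a short sketch makes the effect of the leading .* visible. The second pattern below, which drops the .* and anchors at the start of the string with ^, is purely an assumption made for contrast and is not something taken from the original question. The sample strings and the helper name first_capture are likewise invented.

```perl
use strict;
use warnings;

# The suggested pattern, with its leading .* kept.
my $with_dotstar = qr/
    .*
    (
        \                       # one literal space
      | \[?\w*\@?[-\w.]*[\$\#%>~]\]?
      | \\\[\\e\[0m\\\]\ \[0m
    )
    \s?
/x;

# ASSUMPTION for contrast only: same alternatives, anchored, no .* in front.
my $anchored = qr/
    ^
    (
        \                       # one literal space
      | \[?\w*\@?[-\w.]*[\$\#%>~]\]?
      | \\\[\\e\[0m\\\]\ \[0m
    )
    \s?
/x;

# Return the first capture group for $text, or undef if there is no match.
sub first_capture {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

# Hypothetical inputs: an ordinary sentence and a prompt-like string.
for my $text ('hello world', 'jarich@my-box.example.com> ') {
    for my $pair ([ 'with .*', $with_dotstar ], [ 'anchored', $anchored ]) {
        my ($label, $re) = @$pair;
        my $capture = first_capture($re, $text);
        printf "%-30s %-9s %s\n", "'$text'", $label,
            defined $capture ? "captured '$capture'" : 'no match';
    }
}
```

With the .* in place, the greedy match backs off from the end of the string until some alternative succeeds, so on both inputs the capture ends up being a lone space. The anchored variant rejects the plain sentence and captures the whole prompt-like string up to the >. Whether anchoring is actually appropriate depends on the real data.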

I hope this helps.

jarich
