PerlMonks  

Should I escape all chrs in a regex?

by MikeKulls (Novice)
on Sep 29, 2010 at 01:57 UTC ( [id://862533]=perlquestion )

MikeKulls has asked for the wisdom of the Perl Monks concerning the following question:

Let's say I'm looking for the string "tftp>" as part of a regex. I could write /tftp>/ or I could escape the > like this: /tftp\>/. AFAIK these both work the same, but I'm thinking the second is the better option for 2 reasons. 1) It's not necessarily possible for me to know everything about regular expressions, and maybe > has some meaning in some obscure situation. Maybe I do know everything ;), who knows, but it is possible that someone with much greater experience in Perl would learn something new even after many years. 2) Maybe it will have some meaning in the future which will break my existing code. Opinions?

Replies are listed 'Best First'.
Re: Should I escape all chrs in a regex?
by james2vegas (Chaplain) on Sep 29, 2010 at 02:08 UTC
    You can use quotemeta or \Q \E regex escapes to be sure, like this:
    my $match = quotemeta('tftp>'); if ($str =~ /$match/) { ... }
    or
    my $match = 'tftp>'; if ($str =~ /\Q$match\E/) { ... }
    or using the value directly
    if ($str =~ /\Qtftp>\E/) { ... }
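To see what the quoting buys you, here is a minimal sketch using a literal that contains a real metacharacter. The strings `a.b>` and `axb> get file` are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical literal: '.' is a regex metacharacter, '>' is not.
my $literal = 'a.b>';

# quotemeta backslashes every ASCII non-word character.
my $escaped = quotemeta($literal);
print "escaped form: $escaped\n";

my @results;
for my $str ('a.b> get file', 'axb> get file') {
    my $raw    = $str =~ /$literal/     ? 1 : 0;  # '.' matches any character
    my $quoted = $str =~ /\Q$literal\E/ ? 1 : 0;  # '.' matches only a dot
    push @results, [$str, $raw, $quoted];
    print "$str: unquoted=$raw quoted=$quoted\n";
}
```

The unquoted pattern also accepts `axb>`, while the quoted one only matches the literal text. For a string like `tftp>` with no metacharacters, both forms behave the same.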
Re: Should I escape all chrs in a regex?
by repellent (Priest) on Sep 29, 2010 at 03:57 UTC
    If you're (literally) searching for a string within another, you don't need to use regexp matching:
    my $sub_str  = "tftp>";
    my $full_str = "# tftp> connect tftp.host.com 69";
    my $i = index($full_str, $sub_str);   # $i is 2

    It's good to know when to use index and rindex.
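A short sketch of index and rindex side by side, reusing the sample line from the post above. The search strings other than `tftp>` are chosen only for illustration.

```perl
use strict;
use warnings;

my $full_str = "# tftp> connect tftp.host.com 69";

my $first = index($full_str, "tftp");    # first occurrence, searching from the left
my $last  = rindex($full_str, "tftp");   # last occurrence, searching from the right
my $none  = index($full_str, "ftp>x");   # returns -1 when the substring is absent

print "first=$first last=$last none=$none\n";

# Position 0 is a valid hit, so compare against -1 instead of testing for truth.
print "found a leading '#'\n" if index($full_str, "#") != -1;
```

Neither function interprets its argument as a pattern, so no escaping is needed at all.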
      Thanks repellent, I do often forget about using index although this was just a simplified example. I was assuming whatever I am searching for is complicated enough to need a regex.
Re: Should I escape all chrs in a regex?
by chromatic (Archbishop) on Sep 29, 2010 at 03:56 UTC

    Perl 5 regex syntax isn't going to change in such a way that your code suddenly breaks. If there are ever any incompatible changes, they won't be on by default.

    (Backwards compatibility is such a constraint on Perl 5 development that a significant percentage of the Perl 5 parser—and you know how complex that parser must be—exists solely to support the Perl 1 style of invoking functions with do subname(). Has anyone ever seen that in use in Perl 5?)

      While that might be true what happens if we go to perl 6 and they break backwards compatibility? Or someone moves the regex over to some new version of PHP or puts it into C#? I know it's very unlikely to be an issue but isn't it better to be sure? Besides, it is quite possible for all I know that > does have some special meaning and I don't know about it.
        While that might be true what happens if we go to perl 6 and they break backwards compatibility?

        The perl5-to-6 migration tools will know how to compensate.

        Or someone moves the regex over to some new version of PHP or puts it into C#?

        Perl regular expressions are unique; moving to a different language is not a straight cut/paste operation, so that someone will have to live with the fallout. An extra character is not protection.

        I know it's very unlikely to be an issue

        As the old sayings go

        • if it ain't broke...
        • doctor it hurts when I ...
        It's only an issue if you choose to make it an issue by switching languages, or upgrading to incompatible versions ....

        but isn't it better to be sure?

        The way you make sure is to trust but verify, meaning write a test suite.
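As a rough idea of what "write a test suite" could look like here, the following is a minimal sketch using the core Test::More module. The helper name `has_tftp_prompt` and the sample lines are hypothetical, not something from the thread.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: true if the line contains a tftp prompt.
sub has_tftp_prompt {
    my ($line) = @_;
    return $line =~ /tftp>/ ? 1 : 0;
}

ok(  has_tftp_prompt('tftp> get file.bin'), 'prompt at start of line' );
ok(  has_tftp_prompt('# tftp> connect'),    'prompt in the middle of a line' );
ok( !has_tftp_prompt('tftp connect'),       'no ">" means no prompt' );
ok( !has_tftp_prompt('TFTP> status'),       'match is case-sensitive' );

done_testing();
```

If a future Perl or a port to another engine changed how the pattern behaves, a suite like this would fail loudly, which is the "verify" half of trust but verify.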

        I know it's very unlikely to be an issue but isn't it better to be sure?

        You're not sure. Randomly throwing extra backslashes into your regular expression because you worry that someday someone might port the code to a different regular expression engine that you don't know about right now (and you can't predict which regex engine that is or when this might happen) is superstition. Is that the best use of your time?

        Besides that, the substantive differences in a different language or a different version of Perl are more important and they'll require more work and more thinking than surface-level syntax.

        (Besides that, Perl 6 has a Perl 5 regex compatibility mode.)

Re: Should I escape all chrs in a regex?
by JavaFan (Canon) on Sep 29, 2010 at 07:32 UTC
    Maybe it will have some meaning in the future which will break my existing code.
    Extremely unlikely. Perl5 has never introduced a regexp change that suddenly made a non-special character (in a legal regexp) into a special character. And I doubt pre-perl5 ever did. Every new regexp feature introduced is carefully designed to use syntax that's currently invalid. With one exception: backslashed letters. It's documented that backslashed letters that currently do not have a special meaning may get one in the future. But using such a letter warns.

    A > currently does not have a special meaning, and it's unlikely that it ever will.
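A quick way to check this for yourself is to run both spellings of the pattern against the same inputs. This is only a sketch and the sample strings are invented for illustration.

```perl
use strict;
use warnings;

# Invented sample inputs; only the first and third contain "tftp>".
my @samples = ('tftp> get', 'tftp get', 'atftp>', '>');

my $all_agree = 1;
for my $s (@samples) {
    my $plain   = $s =~ /tftp>/  ? 1 : 0;   # unescaped '>'
    my $escaped = $s =~ /tftp\>/ ? 1 : 0;   # backslashed '>'
    $all_agree = 0 if $plain != $escaped;
    print "'$s': plain=$plain escaped=$escaped\n";
}
print $all_agree ? "both patterns agree\n" : "patterns differ\n";
```

In Perl, a backslash in front of a non-word character always yields that literal character, so the two patterns are interchangeable.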
