Question on REGEXP

by Anonymous Monk
on Mar 10, 2014 at 14:06 UTC (#1077687=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a question about pattern matching. My problem is that I can't seem to match special characters such as |.
In my script I am trying to match some IDs, like 1QJ8:A|PDBID|CHAIN. If I write:
    $id_to_match = "1QJ8:A|PDBID|CHAIN";
    while (<>) {
        if ($_ =~ /$id_to_match/) {
            # ...
        }
    }

I get no matches (the ID exists though). Do I need to put something in the regexp?
Thank you!

Replies are listed 'Best First'.
Re: Question on REGEXP
by choroba (Bishop) on Mar 10, 2014 at 14:10 UTC
    | has a special meaning in a regex: it separates alternatives. If the contents of your variable should be matched literally, with no character treated as special, you can "quote" it with \Q:
    my $id_to_match = '1QJ8:A|PDBID|CHAIN';
    while (<>) {
        if (/\Q$id_to_match/) {
            # ...
        }
    }

    See Escape sequences in perlre for details.
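
A minimal sketch of the difference, in case it helps. The input line here is made up for illustration (the original post does not show the data); quotemeta is the built-in function behind \Q.

```perl
use strict;
use warnings;

my $id_to_match = '1QJ8:A|PDBID|CHAIN';    # ID from the question
my $line        = '>1QJ8:A|PDBID|CHAIN';   # hypothetical input line

# Unquoted: | splits the pattern into three alternatives,
# so only the first alternative that fits is captured.
my ($unquoted) = $line =~ /($id_to_match)/;

# \Q...\E: every metacharacter in the variable is matched literally.
my ($quoted) = $line =~ /(\Q$id_to_match\E)/;

# quotemeta() builds the same escaped string ahead of time.
my $escaped = quotemeta $id_to_match;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
print "escaped:  $escaped\n";
```

The unquoted pattern still matches, but it captures only 1QJ8:A; the quoted one captures the whole ID.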

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      So this escapes everything, right? I mean, if you want to treat any given character as a normal character, without any exceptions or special meanings.
        Yes. If you only want to escape some characters, you can do it the hard way by putting \ in front of each one in the string. If you want to quote only part of the pattern, you can end the effect of \Q with \E.
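
A small sketch of mixing quoted and unquoted parts, assuming (hypothetically) that each ID sits alone on a line after a leading >. The real input format is not shown in the thread, so adjust the anchors to fit.

```perl
use strict;
use warnings;

my $id = '1QJ8:A|PDBID|CHAIN';

# \Q...\E quotes only the interpolated ID; the ^ before it and the
# \s*$ after \E keep their usual regex meaning.
my $re = qr/^>\Q$id\E\s*$/;

# Hypothetical sample lines, invented for this example.
my @lines = (
    ">1QJ8:A|PDBID|CHAIN\n",
    ">1QJ8:A\n",
    ">1QJ8:A|PDBID|CHAIN_B\n",
);
my @hits = grep { $_ =~ $re } @lines;

print scalar(@hits), " matching line(s)\n";
```

Only the first sample line matches, because the ID must appear in full and be followed by nothing but whitespace.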
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Question on REGEXP
by AnomalousMonk (Chancellor) on Mar 10, 2014 at 20:03 UTC
    I get no matches (the ID exists though).

    Others have addressed the basic problem, but the statement quoted from the OP suggests something else is going on that may merit another look at the actual code (Update: and/or data).

    The example below shows that a match should have occurred even without metaquoting if the string being matched against contained the 'id' string as a substring; however, the actual matched sequence would not be as expected due to regex alternation operators embedded in the 'id' string.

    c:\@Work\Perl>perl -wMstrict -le
    "my $id = '1QJ8:A|PDBID|CHAIN';
     my $str = 'xxx1QJ8:A|PDBID|CHAINxxx';
     ;;
     print qq{matched '$&'} if $str =~ /$id/;
    "
    matched '1QJ8:A'

    BTW: All the usual cautions against the use of $& in production code apply (and a cooty-shot too!)
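
For reference, a sketch of two common ways to get the matched text without $&, which on older perls slowed down every regex in the program once it appeared anywhere. The variable names are only illustrative, and the /p flag with ${^MATCH} needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $id  = '1QJ8:A|PDBID|CHAIN';
my $str = 'xxx1QJ8:A|PDBID|CHAINxxx';

# Option 1: capture the whole pattern explicitly.
my ($captured) = $str =~ /(\Q$id\E)/;

# Option 2: the /p flag makes ${^MATCH} available (Perl 5.10+).
my $matched;
$matched = ${^MATCH} if $str =~ /\Q$id\E/p;

print "captured '$captured'\n";
print "matched  '$matched'\n";
```

Both are metaquoted here, so each yields the full ID, not just the first alternative.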

Re: Question on REGEXP
by 7stud (Deacon) on Mar 11, 2014 at 18:16 UTC
    My preferred way to escape a special regex character, e.g. the alternation (|) character, is to use a character class:
    use strict;
    use warnings;
    use 5.014;

    my $pattern =<<'END_OF_PATTERN';
    1QJ8:A   #These characters, followed by...
    [|]      #a literal pipe, followed by...
    PDBID    #these characters, followed by...
    [|]      #a literal pipe, followed by...
    CHAIN    #this word
    END_OF_PATTERN

    my $str = '1QJ8:A...PDBID....CHAIN...1QJ8:A|PDBID|CHAIN';
    my @matches = $str =~ /$pattern/gxms;
    say "@matches";

    --output:--
    1QJ8:A|PDBID|CHAIN
