
Question on REGEXP

by Anonymous Monk
on Mar 10, 2014 at 14:06 UTC ( #1077687=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I wanted to ask a question regarding pattern matching. My problem is that I seem to be unable to match special characters, like | for example.
So, in my script I am trying to match some IDs, like 1QJ8:A|PDBID|CHAIN. If I write:

$id_to_match="1QJ8:A|PDBID|CHAIN";
while(<>) {
    if($_=~/$id_to_match/) { ... }
}

I get no matches (the ID exists though). Do I need to put something in the regexp?
Thank you!

Replies are listed 'Best First'.
Re: Question on REGEXP
by choroba (Chancellor) on Mar 10, 2014 at 14:10 UTC
    | has a special meaning in a regex: it separates alternatives. If you want the contents of your variable to be matched literally, with no character treated as special, you can "quote" it by using \Q:

    my $id_to_match = '1QJ8:A|PDBID|CHAIN';
    while (<>) {
        if (/\Q$id_to_match/) {
            # ...
        }
    }

    See Escape sequences in perlre for details.
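A minimal sketch of the difference, using the ID from the question. The helper names `matches_unquoted` and `matches_quoted` and the test line 'CHAIN only' are made up for illustration; `quotemeta` is the built-in function form of \Q.

```perl
use strict;
use warnings;

my $id_to_match = '1QJ8:A|PDBID|CHAIN';

# Without \Q the pipes split the pattern into three alternatives:
# '1QJ8:A', 'PDBID' or 'CHAIN'. Any one of them is enough for a match.
sub matches_unquoted {
    my ($line) = @_;
    return $line =~ /$id_to_match/ ? 1 : 0;
}

# With \Q every character of the ID is matched literally.
sub matches_quoted {
    my ($line) = @_;
    return $line =~ /\Q$id_to_match\E/ ? 1 : 0;
}

# quotemeta() returns the string with its non-word characters backslashed.
my $escaped = quotemeta $id_to_match;

print "escaped pattern: $escaped\n";
print "unquoted, 'CHAIN only': ", matches_unquoted('CHAIN only'), "\n";
print "quoted,   'CHAIN only': ", matches_quoted('CHAIN only'), "\n";
print "quoted,   full ID:      ", matches_quoted('>1QJ8:A|PDBID|CHAIN'), "\n";
```

The unquoted pattern accepts a line that contains only CHAIN, which is why it can appear to work while matching the wrong thing.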

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      So this escapes everything, right? I mean, if you want to treat any given character as a normal character, without any exceptions/special meanings.
        Yes. If you only want to escape some characters, you can do it the hard way: add \ in front of each of them in the string. If you want to escape only a substring of the pattern, you can end the effect of \Q with \E.
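A short sketch of both options. It assumes, purely for illustration, that the ID sits on a header line that starts with ">" and that the whole line should match; the variable names `$header_re` and `$by_hand_re` are hypothetical.

```perl
use strict;
use warnings;

my $id = '1QJ8:A|PDBID|CHAIN';

# \Q...\E quotes only the interpolated ID. The ^> in front of it and the
# \s*$ after it keep their usual regex meaning.
my $header_re = qr/^>\Q$id\E\s*$/;

# The hard way: backslash each pipe by hand in a literal pattern.
my $by_hand_re = qr/^>1QJ8:A\|PDBID\|CHAIN\s*$/;

for my $line ('>1QJ8:A|PDBID|CHAIN', '>1QJ8:B|PDBID|CHAIN', 'CHAIN') {
    my $quoted  = $line =~ $header_re  ? 'match' : 'no match';
    my $by_hand = $line =~ $by_hand_re ? 'match' : 'no match';
    print "$line => quoted: $quoted, by hand: $by_hand\n";
}
```

Escaping by hand only works when the ID is written directly in the pattern; for an ID held in a variable, \Q...\E is the less error-prone choice.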
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Question on REGEXP
by AnomalousMonk (Chancellor) on Mar 10, 2014 at 20:03 UTC
    I get no matches (the ID exists though).

    Others have addressed the basic problem, but the statement quoted from the OP suggests something else is going on that may merit another look at the actual code (Update: and/or data).

    The example below shows that a match should have occurred even without metaquoting if the string being matched against contained the 'id' string as a substring; however, the actual matched sequence would not be as expected due to regex alternation operators embedded in the 'id' string.

    c:\@Work\Perl>perl -wMstrict -le
    "my $id = '1QJ8:A|PDBID|CHAIN';
     my $str = 'xxx1QJ8:A|PDBID|CHAINxxx';
     ;;
     print qq{matched '$&'} if $str =~ /$id/;
    "
    matched '1QJ8:A'

    BTW: All the usual cautions against the use of  $& in production code apply — and a cooty-shot too!
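One common way to see what matched without touching $& is a capturing group. The sketch below reuses the strings from the one-liner above; the helper name `matched_text` is hypothetical.

```perl
use strict;
use warnings;

my $id  = '1QJ8:A|PDBID|CHAIN';
my $str = 'xxx1QJ8:A|PDBID|CHAINxxx';

# Return the text matched by $re in $text, or undef when nothing matches.
# The outer parentheses capture the whole match into $1, so $& is not needed.
sub matched_text {
    my ($text, $re) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

my $unquoted = matched_text($str, qr/$id/);      # leftmost alternative that matches
my $quoted   = matched_text($str, qr/\Q$id\E/);  # the whole ID, taken literally

print "unquoted matched '$unquoted'\n";   # matched '1QJ8:A'
print "quoted   matched '$quoted'\n";     # matched '1QJ8:A|PDBID|CHAIN'
```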

Re: Question on REGEXP
by 7stud (Deacon) on Mar 11, 2014 at 18:16 UTC
    My preferred way to escape a special regex character, e.g. the alternation (|) character, is to use a character class:

    use strict;
    use warnings;
    use 5.014;

    my $pattern =<<'END_OF_PATTERN';
    1QJ8:A   #These characters, followed by...
    [|]      #a literal pipe, followed by...
    PDBID    #these characters, followed by...
    [|]      #a literal pipe, followed by...
    CHAIN    #this word
    END_OF_PATTERN

    my $str = '1QJ8:A...PDBID....CHAIN...1QJ8:A|PDBID|CHAIN';
    my @matches = $str =~ /$pattern/gxms;
    say "@matches";

    --output:--
    1QJ8:A|PDBID|CHAIN

Node Type: perlquestion [id://1077687]
Front-paged by Corion