PerlMonks  

Pattern matching: Why no \1 in [ ]?

by Not_a_Number (Prior)
on Aug 07, 2003 at 19:37 UTC (#282006=perlquestion)

Not_a_Number has asked for the wisdom of the Perl Monks concerning the following question:

I thought, naively, that the easiest way of matching a string such as 'XOX' or 'TNT' but not 'XXX' or 'TTT' would be:

/(.)[^\1]\1/

...but this doesn't work:

$_ = 'BBB'; print /(.)[^\1]\1/ ? 'match' : 'no match';

This prints "match". Indeed, it seems to match whatever single character I put in place of the middle 'B' (including '1' and '\').

Even more strangely, to me, if I 'un-negate' the character class:

/(.)[\1]\1/

...I don't seem to match anything (BTW, I get no complaints with strictures and warnings).

Context: I'm trying to code a simple substitution cipher solver (e.g. where 'ABCABC' will match 'murmur' and 'tsetse' but not 'booboo'). I'm aware that there are other ways of doing it, notably merlyn's post here. However, he uses (unless I'm mistaken) negative look-behind magic, and I haven't got that far in Mastering Regular Expressions yet :-).

I'm not looking for a solution; I'm just curious as to what /[^\1]/ and /[\1]/ actually match. I've scoured perlre and done an index search of the (admittedly large :-) part of Mastering Regular Expressions that I still haven't read, but found nothing that appears relevant to my question (which is not to say that there is nothing relevant, just that I might not have understood its relevance...).

TIA for your enlightenment.

dave

Re: Pattern matching: Why no \1 in [ ]?
by japhy (Canon) on Aug 07, 2003 at 20:25 UTC
    As has been stated, a character class is determined at a regex's compile-time, because it's turned into a big bitwise array, more or less. Having a backref in there would make it dynamic, and dynamic char classes don't exist (yet?).

    If $1 only has one character in it, then sure, you can do /(.)(?!\1)(?s:.)\1/ as was suggested. But if it's got more than one character, I think the "easiest" way is with the dynamic-regex assertion: /(.)(??{ "[^\Q$1\E]" })\1/.
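Both of japhy's patterns can be checked side by side. The sketch below wraps each one in a small helper (aba_lookahead and aba_dynamic are hypothetical names chosen for illustration) and runs them against the strings from the question; both should accept 'XOX' and 'TNT' and reject 'XXX', 'TTT' and 'BBB'.

```perl
use strict;
use warnings;

# Look-ahead version: the middle character must not be whatever (.)
# captured; (?s:.) lets the middle character be a newline as well.
sub aba_lookahead { return $_[0] =~ /(.)(?!\1)(?s:.)\1/ ? 1 : 0 }

# Dynamic-regex version: (??{ ... }) builds a character class from $1
# while matching, so the class is compiled only after $1 is known.
# \Q...\E quotes $1 so characters like ']' or '^' stay literal.
sub aba_dynamic { return $_[0] =~ /(.)(??{ "[^\Q$1\E]" })\1/ ? 1 : 0 }

for my $s (qw(XOX TNT XXX TTT BBB)) {
    printf "%-4s lookahead=%d dynamic=%d\n",
        $s, aba_lookahead($s), aba_dynamic($s);
}
```

The look-ahead form is enough when the group is a single character; the dynamic form carries over unchanged to a group like (..), where the class then excludes every character the group captured.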

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Pattern matching: Why no \1 in [ ]?
by MarkM (Curate) on Aug 07, 2003 at 19:46 UTC

    \1 substitutes the literal text of a previously matched group into the regular expression. It doesn't give you the ability to recompile the regular expression. Character classes ([...]) are a compile-time construct; \1 is a run-time one. [\1] is interpreted as 'the character with value 1'.
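MarkM's point can be checked directly. A minimal sketch, run under strict and warnings (which, as the question notes, stay silent): inside a character class \1 is the octal escape for chr(1), so [\1] accepts only that one control character, and /(.)[\1]\1/ needs a chr(1) in the middle.

```perl
use strict;
use warnings;

# In a double-quoted string, "\1" is also an octal escape: chr(1).
my $ctrl_a = "\1";

# [\1] matches only chr(1) ...
my $plain_matches = ($ctrl_a =~ /^[\1]$/) ? 1 : 0;   # 1
my $b_matches     = ('B'     =~ /^[\1]$/) ? 1 : 0;   # 0

# ... so /(.)[\1]\1/ only succeeds with chr(1) as the middle character.
my $bab = ("B\1B" =~ /(.)[\1]\1/) ? 1 : 0;           # 1
my $bbb = ('BBB'  =~ /(.)[\1]\1/) ? 1 : 0;           # 0

print "$plain_matches $b_matches $bab $bbb\n";
```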

Re: Pattern matching: Why no \1 in [ ]?
by artist (Parson) on Aug 07, 2003 at 19:49 UTC
    Look at Backreferences

    Also, your problem can easily be solved without a character class:

    $_ = 'abcab'; if(/^(.)(?!\1)(.)(?:(?!\1|\2).)\1\2$/){ print "$&\n"; }

    artist

Re: Pattern matching: Why no \1 in [ ]?
by japhy (Canon) on Aug 07, 2003 at 20:31 UTC
    In fact, merlyn's code uses the approach Juerd showed you. It's just a bunch of negative look-aheads for the backreferences you've already matched. For instance, text matching "ABCADB" would be the regex /^(.)(?!\1)(.)(?!\1|\2)(.)\1(?!\1|\2|\3)(.)\2$/s.
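The hand-built regex above can be generated mechanically from a cipher shape. The sketch below is a hypothetical helper, shape_to_regex, written for illustration (it is not merlyn's code): for each letter it emits either a backreference (letter already seen) or a new (.) group preceded by a negative look-ahead against every earlier group. For 'ABCADB' it produces the same pattern text as japhy's example.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regex from a cipher "shape".
# A repeated letter becomes a backreference to its group; a new letter
# becomes (.) guarded by (?!\1|\2|...) against all earlier groups.
sub shape_to_regex {
    my ($shape) = @_;
    my %group;        # letter => capture-group number
    my $next = 0;     # number of groups opened so far
    my $pattern = '^';
    for my $letter (split //, $shape) {
        if (exists $group{$letter}) {
            $pattern .= '\\' . $group{$letter};
        } else {
            $pattern .= '(?!' . join('|', map { '\\' . $_ } 1 .. $next) . ')'
                if $next;
            $group{$letter} = ++$next;
            $pattern .= '(.)';
        }
    }
    $pattern .= '$';
    return qr/$pattern/s;   # /s so (.) may also match a newline
}

my $re = shape_to_regex('ABCABC');
for my $word (qw(murmur tsetse booboo)) {
    print "$word: ", ($word =~ $re ? 'match' : 'no match'), "\n";
}
```

With the shape 'ABCABC' from the question this accepts 'murmur' and 'tsetse' and rejects 'booboo', because the third letter of 'booboo' repeats the second.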


Re: Pattern matching: Why no \1 in [ ]?
by Juerd (Abbot) on Aug 07, 2003 at 20:09 UTC
Re: Pattern matching: Why no \1 in [ ]?
by errr (Initiate) on Aug 08, 2003 at 06:55 UTC
    To see what the regex is doing, you can use the re pragma; use re 'debug' turns on debugging output:
    perl -Mre=debug -le 'print /(.)[^\1]\1/ ? "match" : "no match"'

    Freeing REx: `","'
    Compiling REx `(.)[^\1]\1'
    size 19 Got 156 bytes for offset annotations.
    first at 3
       1: OPEN1(3)
       3: REG_ANY(4)
       4: CLOSE1(6)
       6: ANYOF[\0\2-\377{unicode_all}](17)
      17: REF1(19)
      19: END(0)
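    From this you can see that the \1 inside the class is being interpreted as the character with code point 1 (octal \1, i.e. chr(1)), not as a backreference: ANYOF[\0\2-\377{unicode_all}] is every character except \1. To check, you can run:
    perl -le '$_ = "B\1B"; print /(.)[^\1]\1/ ? "match" : "no match"'
    and you'll see that it prints "no match".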
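The same conclusion can be reached without reading debug output: try every byte value as the middle character of "B?B" and keep the ones for which the pattern fails. A short sketch, assuming plain 8-bit (non-UTF-8) strings; if [^\1] really means "anything except chr(1)", the only rejected code point should be 1.

```perl
use strict;
use warnings;

# Build "B" . chr($n) . "B" for every byte value and keep the values
# where /(.)[^\1]\1/ does NOT match.
my @rejected = grep { ("B" . chr($_) . "B") !~ /(.)[^\1]\1/ } 0 .. 255;

print "rejected code points: @rejected\n";
```

This mirrors the ANYOF[\0\2-\377...] line in the debug dump: every byte from \0 to \377 is accepted except \1.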

Node Type: perlquestion [id://282006]
Approved by Enlil
Front-paged by Enlil