PerlMonks  

Regexp Confuzzelemt

by AntsPants (Novice)
on Apr 27, 2007 at 14:11 UTC (#612381=perlquestion)
AntsPants has asked for the wisdom of the Perl Monks concerning the following question:

Bonjour Monkers,

I have a regex that just doesn't do what I think it should, no matter how loudly I read out in English what it should be doing ;)

I'm trying to capture ThisIsWhatIWant from blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc

#(.*[^\?]) doesn't work

Whereas

#([\w]+[^\?]) does work!!

For the one that fails, I've tried a host of variations, including

#(.*)\[^\?\]
#(.+)\[^\?\]
... and more

But I keep getting ThisIsWhatIWant?ButNotThisEtc matched

Any pointers would be terrrrrrrrrific.

Merci

-Ants

Re: Regexp Confuzzelemt
by jettero (Monsignor) on Apr 27, 2007 at 14:19 UTC

    I'm guessing, but I think [^\?] doesn't do what you think... It matches exactly one character that is not a '?'.

    I would choose something simple like this:

    if( $line =~ m/(?<=\#)(\w+)(?=\?\w+)/ ) { print "$1\n" }   # prints the word after '#'

    Of course, you probably want to make sure the word characters start with caps and things...
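    A minimal runnable sketch of the lookaround approach above, assuming the input sits in $line (the name from the snippet) and that the wanted word is always followed by a '?' and at least one word character. Because of the lookahead (?=\?\w+), nothing matches when the '?' part is missing; the second test string is made up to show that.

```perl
use strict;
use warnings;

# Sample input from the question.
my $line = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';

# (?<=\#)   : must be preceded by '#', but the '#' is not part of the match
# (\w+)     : capture one or more word characters
# (?=\?\w+) : must be followed by '?' and a word character (not consumed)
my $wanted;
if ( $line =~ m/(?<=\#)(\w+)(?=\?\w+)/ ) {
    $wanted = $1;
    print "$wanted\n";    # prints ThisIsWhatIWant
}

# Hypothetical input with no '?': the lookahead can never succeed.
my $no_query = 'blah/blah/blah#ThisIsWhatIWant';
my $matched_without_query = $no_query =~ m/(?<=\#)(\w+)(?=\?\w+)/ ? 1 : 0;
print "matched without '?': $matched_without_query\n";    # prints 0
```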

    -Paul

Re: Regexp Confuzzelemt
by suaveant (Parson) on Apr 27, 2007 at 14:20 UTC
    #.*[^\?] actually matches everything after the hash: .* takes everything up to the final c, and [^\?] then matches that c, because [^\?] matches a single character that is NOT a question mark...

    I think what you are looking for is

    #([^?]+) # don't need \? in character class []
    That will match 1 or more non question mark characters after a #
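    A small script comparing the original attempt with the suggested fix on the question's sample string (variable names are just for illustration):

```perl
use strict;
use warnings;

my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';

# Original attempt: .* runs to the end of the string, then backtracks one
# character so that [^\?] can match the final 'c'. The capture keeps the '?'.
my ($greedy) = $string =~ m/#(.*[^\?])/;

# Suggested fix: one or more characters that are not '?', so the capture
# stops just before the first '?'.
my ($negated) = $string =~ m/#([^?]+)/;

print "#(.*[^\\?]) captured: $greedy\n";    # ThisIsWhatIWant?ButNotThisEtc
print "#([^?]+)   captured: $negated\n";    # ThisIsWhatIWant
```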

                    - Ant

Re: Regexp Confuzzelemt
by bobf (Monsignor) on Apr 27, 2007 at 14:26 UTC

    This seems to do what you want:

    use warnings;
    use strict;
    my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';
    if( $string =~ m/#([^?]*)/ ) {
        print "matched -->$1<--\n";
    }
    Output:
    matched -->ThisIsWhatIWant<--
    This looks like a URL, though, and I wonder if something from CPAN would do the trick. Perhaps another monk knows of a module.

    Update: YAPE::Regex::Explain is a handy little module that will take a regex and produce a human-readable explanation of it. For example,

    use YAPE::Regex::Explain;
    my $re = qr/#([^?]*)/;
    my $p  = YAPE::Regex::Explain->new($re)->explain;
    print "$p\n";
    Output:
    The regular expression:

    (?-imsx:#([^?]*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      #                        '#'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^?]*                    any character except: '?' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    You could use this (as well as perlre) to help yourself understand why the regexen you initially tried weren't working as you expected. Trying things at random can be one way to learn how things work, but in the end Don't Program by Coincidence. :-)
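    As an illustration of "programming by coincidence": the #([\w]+[^\?]) pattern from the question only appears to work. \w+ first swallows the whole word, the next character is '?', so [^\?] fails and the engine backtracks \w+ by one character, letting [^\?] take the final 't'. When a non-word character follows the word, it ends up in the capture. The second input below is hypothetical, made up to show this.

```perl
use strict;
use warnings;

# The pattern from the question that "works".
my $re = qr/#([\w]+[^\?])/;

# The original input, plus a hypothetical one with a '-' after the first word.
my @inputs = (
    'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc',
    'blah/blah/blah#This-Is?ButNotThisEtc',
);

my @captured;
for my $in (@inputs) {
    my ($cap) = $in =~ $re;    # list context returns the capture
    push @captured, $cap;
    print "$in => $cap\n";
}
# First capture:  ThisIsWhatIWant (\w+ gave back the 't' to [^\?])
# Second capture: This-           ([^\?] happily matched the '-')
```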

Re: Regexp Confuzzelemt
by akho (Hermit) on Apr 27, 2007 at 21:41 UTC
    Regular expression quantifiers are greedy by default. That means in (.*[^\?]), .* first matches the whole rest of the input, then the engine backtracks one character so that [^\?] can match the final "c".

    You want #(.*?)\? (the '?' makes the .* non-greedy).

    #([^?]*) also works, of course. But it also matches "TTT" in "aaaaaaaa#TTT" (note that there is no question mark).
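    A short comparison of the three patterns discussed here, run on the question's string and on the "aaaaaaaa#TTT" example. A pattern that fails to match leaves the capture undefined, shown as (no match).

```perl
use strict;
use warnings;

my @strings  = ('blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc', 'aaaaaaaa#TTT');
my %patterns = (
    'greedy'     => qr/#(.*[^\?])/,    # runs to the end, then backtracks one char
    'non-greedy' => qr/#(.*?)\?/,      # shortest run that is followed by '?'
    'negated'    => qr/#([^?]*)/,      # everything up to the first '?', if any
);

my %result;
for my $name (sort keys %patterns) {
    for my $s (@strings) {
        my ($cap) = $s =~ $patterns{$name};
        $result{$name}{$s} = defined $cap ? $cap : '(no match)';
        print "$name on $s: $result{$name}{$s}\n";
    }
}
```

    On "aaaaaaaa#TTT" the non-greedy pattern reports (no match) because it requires a literal '?', while the other two capture "TTT".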

      But it also matches "TTT" in "aaaaaaaa#TTT"

      That's a difference you introduced. If you put the \? back in, it won't.
      /#(.*?)\?/
      is equivalent to
      /#([^?]*)\?/
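      A quick way to check the equivalence is to run both patterns over a few inputs and compare the captures. The inputs below are made up for illustration. One caveat worth knowing: without the /s modifier, . does not match a newline while [^?] does, so the two can differ when a newline sits between the '#' and the '?'; the last input shows that case.

```perl
use strict;
use warnings;

my $lazy    = qr/#(.*?)\?/;      # non-greedy version
my $negated = qr/#([^?]*)\?/;    # character-class version

# Hypothetical inputs; the last one has a newline between '#' and '?'.
my @inputs = ('a#b?c', 'x#?y', 'aaaaaaaa#TTT', "p#q\nr?s");

my @same;
for my $i (0 .. $#inputs) {
    my ($l) = $inputs[$i] =~ $lazy;
    my ($n) = $inputs[$i] =~ $negated;

    # Treat "no match" as its own value so it can be compared too.
    my $l_str = defined $l ? $l : '(no match)';
    my $n_str = defined $n ? $n : '(no match)';

    push @same, $l_str eq $n_str ? 1 : 0;
    print "input $i: ", ($same[-1] ? 'same' : 'differ'), "\n";
}
```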

      The latter is safer. It's easy to introduce problems when using non-greedy quantifiers; they're not as resilient. For example, you can safely embed the greedy regexp in a larger regexp, but you can't safely embed the non-greedy one.
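      A sketch of what can go wrong when embedding. Suppose, hypothetically, you want the text between '#' and the first '?', but only when that '?' is followed by "x". Appending \?x to both versions: the character-class version can never cross a '?', so it correctly fails on the made-up input below, while .*? keeps expanding past the first '?' until \?x can match.

```perl
use strict;
use warnings;

my $input = 'a#TTT?foo?x';    # hypothetical input: first '?' is followed by 'f'

# The same two sub-patterns, each followed by an extra \?x requirement.
my ($lazy)    = $input =~ /#(.*?)\?x/;      # .*? is free to swallow a '?'
my ($negated) = $input =~ /#([^?]*)\?x/;    # [^?]* can never include a '?'

print 'lazy:    ', (defined $lazy    ? $lazy    : 'no match'), "\n";    # TTT?foo
print 'negated: ', (defined $negated ? $negated : 'no match'), "\n";    # no match
```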
