
Regexp Confuzzelemt

by AntsPants (Novice)
on Apr 27, 2007 at 14:11 UTC (#612381=perlquestion)
AntsPants has asked for the wisdom of the Perl Monks concerning the following question:

Bonjour Monkers,

I have a regex that just doesn't do what I think it should, no matter how loudly I read out in English what it should be doing ;)

Trying to match blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc

#(.*[^\?]) doesn't work


#([\w]+[^\?]) does work!!

the one that fails, I've tried a host of ways including

#(.*)\[^\?\] #(.+)\[^\?\] .... and more

But I keep getting ThisIsWhatIWant?ButNotThisEtc matched

Any pointers would be terrrrrrrrrific.



Re: Regexp Confuzzelemt
by suaveant (Parson) on Apr 27, 2007 at 14:20 UTC
    #.*[^\?] is actually matching everything after the hash except the last character, then the final c, because [^\?] matches a single character that is NOT a question mark...

    I think what you are looking for is

    #([^?]+) # don't need \? in character class []
    That will match 1 or more non question mark characters after a #
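To make the difference concrete, here is a minimal sketch that runs the patterns from the question against the sample string. The variable names are only illustrative. It also suggests why the \w+ version appeared to work: \w cannot match '?', so \w+ gives back one character and [^\?] picks up the final "t".

```perl
use strict;
use warnings;

my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';

# Greedy .* runs to the end of the string, then gives back one
# character so that [^\?] can match the final "c".
my ($greedy) = $string =~ /#(.*[^\?])/;

# \w+ stops at '?', backs off one character, and [^\?] matches "t".
my ($word) = $string =~ /#([\w]+[^\?])/;

# A negated character class simply stops at the first question mark.
my ($negated) = $string =~ /#([^?]+)/;

print "greedy:  $greedy\n";    # greedy:  ThisIsWhatIWant?ButNotThisEtc
print "word:    $word\n";      # word:    ThisIsWhatIWant
print "negated: $negated\n";   # negated: ThisIsWhatIWant
```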

                    - Ant
                    - Some of my best work - (1 2 3)

Re: Regexp Confuzzelemt
by bobf (Monsignor) on Apr 27, 2007 at 14:26 UTC

    This seems to do what you want:

    use warnings;
    use strict;

    my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';
    if( $string =~ m/#([^?]*)/ ) {
        print "matched -->$1<--\n";
    }

    matched -->ThisIsWhatIWant<--
    This looks like a URL, though, and I wonder if something from CPAN would do the trick. Perhaps another monk knows of a module.

    Update: YAPE::Regex::Explain is a handy little module that will take a regex and produce a human-readable explanation of it. For example,

    use YAPE::Regex::Explain;

    my $re = qr/#([^?]*)/;
    my $p = YAPE::Regex::Explain->new($re)->explain;
    print "$p\n";

    The regular expression:

    (?-imsx:#([^?]*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      #                        '#'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^?]*                    any character except: '?' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    You could use this (as well as perlre) to help yourself understand why the regexen you initially tried weren't working as you expected. Trying things at random can be one way to learn how things work, but in the end Don't Program by Coincidence. :-)

Re: Regexp Confuzzelemt
by jettero (Monsignor) on Apr 27, 2007 at 14:19 UTC

    I'm guessing, but I think [^\?] doesn't do what you think... That means to match any one character that is not a '?'.

    I would choose something simple like this:

    if( $line =~ m/(?<=\#)(\w+)(?=\?\w+)/ ) {
        # $1 now holds the word characters between '#' and '?'
    }

    Of course, you probably want to make sure the word characters start with caps and things...
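A small sketch of how the lookaround version behaves, assuming the sample string from the question plus a second, hypothetical input with no query part. Because the lookahead insists on a '?' followed by word characters, the second input does not match at all.

```perl
use strict;
use warnings;

# Lookbehind requires a '#' before the word; lookahead requires
# '?' plus at least one word character after it.
my $re = qr/(?<=\#)(\w+)(?=\?\w+)/;

my ($with)    = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc' =~ $re;
my ($without) = 'blah/blah/blah#ThisIsWhatIWant' =~ $re;

print "with query:    ", (defined $with    ? $with    : '(no match)'), "\n";
print "without query: ", (defined $without ? $without : '(no match)'), "\n";
```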


Re: Regexp Confuzzelemt
by akho (Hermit) on Apr 27, 2007 at 21:41 UTC
    Quantifiers in regular expressions are greedy by default. That means (.*[^\?]) first matches the whole rest of the input for .*, then backtracks one character and matches "c" for [^\?].

    You want #(.*?)\? (the '?' makes the .* non-greedy).

    #([^?]*) also works, of course. But it also matches "TTT" in "aaaaaaaa#TTT" (note that there is no question mark).
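The two patterns can be compared side by side. This is only a sketch: the first input is a shortened version of the string from the question, the second is the "aaaaaaaa#TTT" example, and the %captured hash is an illustrative name.

```perl
use strict;
use warnings;

my %captured;
for my $input ('blah#ThisIsWhatIWant?ButNotThisEtc', 'aaaaaaaa#TTT') {
    my ($lazy)    = $input =~ /#(.*?)\?/;   # requires a literal '?'
    my ($negated) = $input =~ /#([^?]*)/;   # the '?' is optional here
    $captured{$input} = { lazy => $lazy, negated => $negated };

    printf "%-36s lazy=%-16s negated=%s\n",
        $input,
        defined $lazy ? $lazy : '(no match)',
        $negated;
}
```

With the first input both patterns capture ThisIsWhatIWant. With the second, only the negated character class matches, and it captures TTT.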

      But it also matches "TTT" in "aaaaaaaa#TTT"

      That's a difference you introduced. If you put the \? back in, it won't.

      #(.*?)\?

      is equivalent to

      #([^?]*)\?

      The latter is safer. It's easy to introduce problems when using non-greedy quantifiers. They're not as resilient. For example, you can safely embed the greedy regexp in a larger regexp, but you can't safely embed the non-greedy regexp in a larger regexp.
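One way to illustrate the embedding concern is to append something to both #(.*?)\? and #([^?]*)\?. The input string and the extra \d requirement below are hypothetical, chosen only to show the effect: once the larger pattern can fail after the '?', the non-greedy .*? is free to stretch across question marks, while the negated class is not.

```perl
use strict;
use warnings;

# Hypothetical input: two '#name?query' pairs, where only the
# second query starts with a digit.
my $input = 'a#b?x#c?1';

# Non-greedy: when \d fails after the first '?', .*? keeps growing
# past that '?' until the whole pattern can succeed.
my ($lazy) = $input =~ /#(.*?)\?\d/;

# Negated class: the capture can never contain a '?', so the match
# has to start at the second '#'.
my ($negated) = $input =~ /#([^?]*)\?\d/;

print "lazy:    $lazy\n";      # lazy:    b?x#c
print "negated: $negated\n";   # negated: c
```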

Node Type: perlquestion [id://612381]
Approved by jettero