PerlMonks  

Regexp Confuzzelemt

by AntsPants (Novice)
on Apr 27, 2007 at 14:11 UTC (node #612381)
AntsPants has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monkers,

I have a regex that just doesn't do what I think it should, no matter how loudly I read out in English what it should be doing ;)

I'm trying to extract ThisIsWhatIWant from blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc.

#(.*[^\?]) doesn't work

Whereas

#([\w]+[^\?]) does work!!

For the one that fails, I've tried a host of variations, including

#(.*)\[^\?\] and #(.+)\[^\?\], and more.

But I keep getting ThisIsWhatIWant?ButNotThisEtc as the match.

Any pointers would be terrrrrrrrrific.

Thanks

-Ants

Re: Regexp Confuzzelemt
by suaveant (Parson) on Apr 27, 2007 at 14:20 UTC
    #.*[^\?] is actually matching everything after the hash: .* grabs the rest of the string and then gives back only the c at the end, because [^\?] matches a single character that is NOT a question mark...
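A minimal sketch, using only core Perl and the sample string from the question, that prints what each of the two original patterns actually captures. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';

# #(.*[^\?]): greedy .* runs to the end of the string, then gives back one
# character so that [^\?] can match the final "c". The "?" stays inside .*
my ($greedy) = $string =~ /#(.*[^\?])/;

# #([\w]+[^\?]): \w+ cannot match "?", so it stops just before it; [^\?]
# then takes the last "t" back from \w+. The "?" is never reached.
my ($wordy) = $string =~ /#([\w]+[^\?])/;

print "greedy capture: $greedy\n";   # ThisIsWhatIWant?ButNotThisEtc
print "\\w+ capture:    $wordy\n";   # ThisIsWhatIWant
```

The second pattern only "works" because \w happens to exclude '?'. If the text after the hash contained any other non-word character (a hyphen, say), \w+ would stop there instead.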

    I think what you are looking for is

    #([^?]+) # don't need \? in character class []
    That will match one or more non-question-mark characters after a #.
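A small sketch of what the + quantifier means here, compared with the * quantifier used in bobf's reply. The second input string is a hypothetical edge case with nothing between the '#' and the '?'.

```perl
use strict;
use warnings;

# Format a capture for printing
sub show { defined $_[0] ? "'$_[0]'" : 'no match' }

my @inputs = (
    'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc',
    'blah/blah/blah#?OnlyAfterTheQuestionMark',    # hypothetical edge case
);

my ( %plus, %star );
for my $s (@inputs) {
    # [^?]+ needs at least one non-"?" character after the "#"
    $plus{$s} = $s =~ /#([^?]+)/ ? $1 : undef;

    # [^?]* is happy with zero characters, so it can capture ''
    $star{$s} = $s =~ /#([^?]*)/ ? $1 : undef;

    print "$s\n  +: ", show( $plus{$s} ), "  *: ", show( $star{$s} ), "\n";
}
```

For the sample string both give ThisIsWhatIWant; they only differ when the part after the hash is empty, where + fails to match and * captures an empty string.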

                    - Ant

Re: Regexp Confuzzelemt
by bobf (Monsignor) on Apr 27, 2007 at 14:26 UTC

    This seems to do what you want:

    use warnings;
    use strict;

    my $string = 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc';
    if ( $string =~ m/#([^?]*)/ ) {
        print "matched -->$1<--\n";
    }
    Output:
    matched -->ThisIsWhatIWant<--
    This looks like a URL, though, and I wonder if something from CPAN would do the trick. Perhaps another monk knows of a module.

    Update: YAPE::Regex::Explain is a handy little module that will take a regex and produce a human-readable explanation of it. For example,

    use YAPE::Regex::Explain;

    my $re = qr/#([^?]*)/;
    my $p  = YAPE::Regex::Explain->new($re)->explain;
    print "$p\n";
    Output:
    The regular expression:

    (?-imsx:#([^?]*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      #                        '#'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^?]*                    any character except: '?' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    You could use this (as well as perlre) to help yourself understand why the regexen you initially tried weren't working as you expected. Trying things at random can be one way to learn how things work, but in the end Don't Program by Coincidence. :-)

Re: Regexp Confuzzelemt
by jettero (Monsignor) on Apr 27, 2007 at 14:19 UTC

    I'm guessing, but I think [^\?] doesn't do what you think... It matches exactly one character that is not a '?'.

    I would choose something simple like this:

    # $1 holds the word between the "#" and the "?"
    if ( $line =~ m/(?<=\#)(\w+)(?=\?\w+)/ ) { print "$1\n" }

    Of course, you probably want to make sure the word characters start with caps and things...
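A minimal sketch of the lookaround pattern in action. The second string ("aaaaaaaa#TTT", which has no question mark) is borrowed from a later reply, and the capital-letter variant at the end is a hypothetical tightening, not part of the original suggestion.

```perl
use strict;
use warnings;

my %found;
for my $line ( 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc', 'aaaaaaaa#TTT' ) {
    # (?<=\#)    the match must be preceded by "#" (not included in $1)
    # (\w+)      capture the run of word characters
    # (?=\?\w+)  it must be followed by "?" and more word characters
    #            (checked, but not consumed)
    if ( $line =~ m/(?<=\#)(\w+)(?=\?\w+)/ ) {
        $found{$line} = $1;
        print "$line -> $1\n";
    }
    else {
        print "$line -> no match\n";
    }
}

# Hypothetical variant: also require the captured word to start with a capital
my ($capitalised) =
    'blah#ThisIsWhatIWant?ButNotThisEtc' =~ m/(?<=\#)([A-Z]\w*)(?=\?\w+)/;
print "capitalised -> $capitalised\n";
```

Because the lookahead insists on a '?' followed by word characters, this pattern rejects a fragment with no query-like tail at all, which may or may not be what you want.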

    -Paul

Re: Regexp Confuzzelemt
by akho (Hermit) on Apr 27, 2007 at 21:41 UTC
    Quantifiers are greedy by default. That means that in #(.*[^\?]), the .* first matches the whole rest of the input, then backtracks one character so that [^\?] can match the "c".

    You want #(.*?)\? (the '?' makes the .* non-greedy).

    #([^?]*) also works, of course. But it also matches "TTT" in "aaaaaaaa#TTT" (note that there is no question mark).
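A minimal sketch comparing the non-greedy #(.*?)\? with #([^?]*) (no trailing \?) on the sample string and on the "aaaaaaaa#TTT" example from this reply.

```perl
use strict;
use warnings;

my @inputs = ( 'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc', 'aaaaaaaa#TTT' );

my ( %lazy, %class );
for my $s (@inputs) {
    # #(.*?)\? : .*? grows one character at a time until a "?" can follow
    ( $lazy{$s} ) = $s =~ /#(.*?)\?/;

    # #([^?]*) : no trailing \?, so it succeeds even when no "?" follows
    ( $class{$s} ) = $s =~ /#([^?]*)/;

    printf "%-46s lazy: %-17s class: %s\n", $s,
        $lazy{$s}  // 'no match',
        $class{$s} // 'no match';
}
```

Both capture ThisIsWhatIWant from the sample string; only the second one also matches "TTT" when there is no question mark.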

      But it also matches "TTT" in "aaaaaaaa#TTT"

      That's a difference you introduced. If you put the \? back in, it won't.
      /#(.*?)\?/
      is equivalent to
      /#([^?]*)\?/
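A quick sketch that checks the claimed equivalence on a handful of hypothetical single-line strings. One caveat worth knowing: without the /s modifier, . does not match a newline while [^?] does, so the two patterns can differ on multi-line input; the strings below deliberately contain no newlines.

```perl
use strict;
use warnings;

# Hypothetical single-line test strings, including a few edge cases
my @inputs = (
    'blah/blah/blah#ThisIsWhatIWant?ButNotThisEtc',
    'aaaaaaaa#TTT',    # no "?" at all
    'a#?b',            # empty capture
    'a#X?Y?Z',         # more than one "?"
    'a#b#c?d',         # more than one "#"
);

my $mismatches = 0;
for my $s (@inputs) {
    my ($lazy)  = $s =~ /#(.*?)\?/;
    my ($class) = $s =~ /#([^?]*)\?/;

    # Same result means: both failed, or both captured the same text
    my $same = ( !defined $lazy && !defined $class )
        || ( defined $lazy && defined $class && $lazy eq $class );
    $mismatches++ unless $same;

    printf "%-46s %s\n", $s, $same ? 'same' : 'different';
}
print "mismatches: $mismatches\n";
```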

      The latter is safer. It's easy to introduce problems when using non-greedy quantifiers. They're not as resilient. For example, you can safely embed the greedy regexp in a larger regexp, but you can't safely embed the non-greedy regexp in a larger regexp.
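A minimal sketch of the embedding problem, using a hypothetical input with two question marks and a hypothetical larger regexp that also requires "?Etc" after the capture.

```perl
use strict;
use warnings;

# Hypothetical input: the text after "#" contains two "?" characters
my $s = 'a#X?Y?Etc';

# Embed each sub-pattern in a larger regexp that also requires "?Etc"
my ($lazy)  = $s =~ /#(.*?)\?Etc/;     # .*? keeps growing past the first "?"
my ($class) = $s =~ /#([^?]*)\?Etc/;   # [^?]* can never cross a "?"

print 'lazy:  ', defined $lazy  ? "'$lazy'"  : 'no match', "\n";   # 'X?Y'
print 'class: ', defined $class ? "'$class'" : 'no match', "\n";   # no match
```

On its own, (.*?)\? never captures a '?', but once something else has to match after it, the engine is free to extend .*? across a '?' to make the whole regexp succeed. The negated character class keeps its "no question marks" guarantee no matter what surrounds it.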
