http://www.perlmonks.org?node_id=326137


in reply to This looks like someone sneezed and hit the keyboard

I'm going to re-enter your RE as though it has the /x modifier so that it's easier to comment on what it's doing... here goes:

/
    .*                          # Match any quantity of any character except
                                # newline (or none at all)
    (                           # group together, and capture.
        [\$#\%>~]               # match any one of the following: $#%>~
      |                         # OR
        \@\w~\$                 # match literal @, a word character, ~, and $
      |                         # OR
        \\\[\\e\[0m\\\]\ \[0m   # match "\[\e[0m\] [0m" (the space is escaped
                                # here because /x ignores bare whitespace)
    )                           # end capturing / grouping.
    \s?                         # match a single optional whitespace
/x                              # End regexp.

So you put that all together and you get a regexp that will match a pretty weird looking string.

The following strings should match (and MANY others too):

"Hi, I'm Dave\[\e[0m\] [0m"

"121#@$14asdf$"

"@h~$"

Looks pretty peculiar to me.
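
For anyone who wants to check those strings, here is a minimal sketch. It assumes the original pattern is simply the one-line form of the commented regex (reconstructed here, not copied from the original node). The variable names and the 'hello world' counterexample are illustrative additions.

```perl
use strict;
use warnings;

# One-line form of the pattern (no /x), so the space before "[0m"
# is matched literally. Reconstructed from the commented version.
my $re = qr/.*([\$#\%>~]|\@\w~\$|\\\[\\e\[0m\\\] \[0m)\s?/;

# Single-quoted so that backslashes, $ and @ stay literal.
my @samples = (
    'Hi, I\'m Dave\[\e[0m\] [0m',
    '121#@$14asdf$',
    '@h~$',
);

# A string with none of the three alternatives in it (illustrative).
my $miss = 'hello world';

for my $s (@samples, $miss) {
    printf "%-28s %s\n", $s, ($s =~ $re ? 'matches' : 'no match');
}
```

The three sample strings report "matches" and the counterexample reports "no match".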


Dave

Re: Re: This looks like someone sneezed and hit the keyboard
by Theo (Priest) on Feb 03, 2004 at 15:41 UTC
    Okay, I guess this is a newbie question ... It looks to me like the /.* opening to the regex is greedy and would grab everything that was applied to it, leaving nothing for the rest of the regex to match. In other words, anything/everything would give a match.

    Other, wiser monks have not mentioned this, so I'm assuming I've missed something. Why isn't my assumption true?

    Update: Thanks to bunnyman, ysth and MCS for their gentle instruction.

    -Theo-
    (so many nodes and so little time ... )

      No, everything in the regex must match, not just the first part of it, and the part in the middle with the (one|two|three) must match too.

      The thing that you must remember is that regexes can backtrack -- if they get to the end of the string without having matched yet, they can go back a few letters and try again.

      So the .* part will first try to match the entire string, because it is greedy. Then the middle part (one|two|three) must match, but there is nothing left in the string, and we must backtrack and try again. First we try going one letter back, then two, and eventually we either find the match or we backtrack all the way to the start and then there is no match.
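
The backtracking described above can be seen in a small sketch. The test string and variable names are made up for illustration, and the non-greedy variant is included only for contrast.

```perl
use strict;
use warnings;

my $str = 'zero one two three four';

# Greedy: .* grabs the whole string, then backs off one character at a
# time until one of the alternatives matches at the current position.
my ($greedy) = $str =~ /.*(one|two|three)/;

# Non-greedy, for contrast: .*? starts empty and grows instead.
my ($lazy) = $str =~ /.*?(one|two|three)/;

# No alternative anywhere: .* backtracks all the way and the match fails.
my $fails = ('zero four' =~ /.*(one|two|three)/) ? 'match' : 'no match';

print "greedy: $greedy\n";   # greedy: three
print "lazy:   $lazy\n";     # lazy:   one
print "other:  $fails\n";    # other:  no match
```

Because the greedy version backs up from the end, the first alternative it finds is the last one in the string.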

      The reason most people say you shouldn't use .* is that it can match nothing (or everything), so matching just .* on its own is pointless: it always succeeds, even on an empty string. However, if you were looking for "hi", some amount of text, and then "there", you could use:

      $line =~ /hi.*there/;

      and it would match. Of course it's greedy and might not be exactly what you wanted but there are times when it is needed. However, it is overused a lot and usually something better can be used.
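
To show what "greedy and might not be exactly what you wanted" means, here is a rough sketch. The sample line is invented, and the capturing parentheses are added only so the matched text can be printed.

```perl
use strict;
use warnings;

# Hypothetical input with two "hi"s and two "there"s.
my $line = 'hi there, and hi again over there!';

# Greedy: .* runs to the end and backs up to the LAST "there".
my ($greedy) = $line =~ /(hi.*there)/;

# Non-greedy: .*? stops at the FIRST "there" after "hi".
my ($lazy) = $line =~ /(hi.*?there)/;

print "greedy: $greedy\n";   # greedy: hi there, and hi again over there
print "lazy:   $lazy\n";     # lazy:   hi there
```

Which of the two is "right" depends on what you are trying to pull out of the line.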

      To answer your question though, /.* doesn't grab everything because it has required stuff after it. If you try to match /.*some text/, it has to find "some text" or it will fail. However, if you try to match something like /.*\d?/, it can succeed on any string, even an empty one, since the \d is optional.
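
A quick sketch of that difference, using a made-up test string:

```perl
use strict;
use warnings;

# Hypothetical input: no digits, and no "some text" in it.
my $text = 'no digits and no magic phrase here';

# "some text" is required after .*, so this fails.
my $required = ($text =~ /.*some text/) ? 1 : 0;

# \d? is optional, so .* alone is enough for success.
my $optional = ($text =~ /.*\d?/) ? 1 : 0;

# Even the empty string matches when nothing is required.
my $empty = ('' =~ /.*\d?/) ? 1 : 0;

print "required: $required\n";   # required: 0
print "optional: $optional\n";   # optional: 1
print "empty:    $empty\n";      # empty:    1
```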

      Because for the match to succeed, one of the three (optA|optB|optC) options has to match. With the .* at the front, it will basically start at the end of the string and work backwards until it finds one of the alternates.

      The \s? at the end is useless though (unless $& is used).
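
A small sketch of why the trailing \s? only matters for $&. The prompt-like string is invented, and the pattern is cut down to the first alternative to keep it short.

```perl
use strict;
use warnings;

my $str = 'prompt$ ls';

my ($with_all, $with_cap, $without_all, $without_cap);

# With the trailing \s?: the optional whitespace ends up in $&.
if ($str =~ /.*([\$#%>~])\s?/) {
    ($with_all, $with_cap) = ($&, $1);
}

# Without it: same success, same capture, shorter $&.
if ($str =~ /.*([\$#%>~])/) {
    ($without_all, $without_cap) = ($&, $1);
}

printf "with \\s?:    whole=<%s> capture=<%s>\n", $with_all, $with_cap;
printf "without \\s?: whole=<%s> capture=<%s>\n", $without_all, $without_cap;
```

Both versions succeed and capture "$"; only the whole-match text differs, by one trailing space.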