Regex question - negatives

by ultranerds (Pilgrim)
on Jan 13, 2011 at 09:24 UTC (#882066=perlquestion)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've been reading the book "Mastering Regular Expressions", and I've come across something I need to use in a project - but I can't seem to get it to work.

Here is an example:

my $test = qq|asdfas dfas dfas df asfd[[bad tag]] [[table]] asdfa sf as [[asdf as f\|sfds]] as dfa sdf [[test new line]] foo|;
$test =~ s{\[\[(.+)[^\]\]]}{ print "FOO: $1 \n"; }ge;

...what I need to get returned is:

bad tag
table
asdf as f\|sfds
test new line

However, the code above seems to ignore the ^\]\] bit?

C:\Users\Andy\Documents>perl test.pl
FOO: bad tag]] [[table]] asdfa sf as [[asdf as f|sfds]] as dfa
FOO: test new line]] fo
Can someone suggest to me what I'm doing wrong? Been trying to get this working for ages, and as far as I can see, it looks fine :/

TIA!

Andy

Re: Regex question - negatives
by moritz (Cardinal) on Jan 13, 2011 at 09:43 UTC
    Can someone suggest to me what I'm doing wrong?

    The .+ looks wrong, since it can match [[ and ]] too. You should use [^\]]+ in the first place.
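A minimal sketch of why the original pattern over-matches, assuming a made-up sample string `$s` (not the one from the question). `[^\]\]]` is a character class, so it stands for exactly one character that is not `]`, and the greedy `.+` runs as far as it can before giving back a single character:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $s = "x [[one]] y [[two]] z";

# [^\]\]] lists "]" twice inside one class: it matches any single
# character except "]", not "anything other than the sequence ]]".
my $matches_a       = "a" =~ /^[^\]\]]$/ ? "yes" : "no";
my $matches_bracket = "]" =~ /^[^\]\]]$/ ? "yes" : "no";

# Greedy .+ swallows the rest of the line, then backtracks one
# character so that the class has something to match.
my ($greedy) = $s =~ /\[\[(.+)[^\]\]]/;

# Excluding "]" from the capture stops at the first closing bracket.
my ($fixed) = $s =~ /\[\[([^\]]+)\]\]/;

print "class matches 'a': $matches_a\n";
print "class matches ']': $matches_bracket\n";
print "greedy: <$greedy>\n";
print "fixed:  <$fixed>\n";
```

The angle brackets in the output make it easy to see exactly where each capture ends.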

    Update: Here's a working regex:

    \[\[([^\]]+)]]

    But it stops at a single ]. To prevent that, you need a negative look-ahead:

    \[\[((?:(?!\]\]).)+)]]

    Now that's quite unreadable, so here in detail:

    \[\[            # opening delimiter
    (               # capture...
      (?:           # a group
        (?!\]\])    # that does not start with ]]
        .           # and is a single character long
      )+            # and many of these groups, at least one.
    )
    ]]              # closing delimiter
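To compare the two patterns above side by side, here is a small sketch. The input string `$s` is an assumption made up for the example; its second tag contains a lone `]`, which is the case where the two patterns differ:

```perl
use strict;
use warnings;

# Hypothetical input: the second tag contains a single "]".
my $s = "[[plain]] and [[with ] inside]]";

# Character-class version: the capture cannot cross any "]",
# so the second tag is never matched.
my @by_class = $s =~ /\[\[([^\]]+)\]\]/g;

# Look-ahead version: each "." is consumed only at positions
# where "]]" does not start.
my @by_lookahead = $s =~ /\[\[((?:(?!\]\]).)+)\]\]/g;

print "class:      <$_>\n" for @by_class;
print "look-ahead: <$_>\n" for @by_lookahead;
```

If tags never contain a `]`, the simpler character-class pattern is enough.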
      Hi,

      Aaah ok - so I was calling it AFTER, which basically meant that it wasn't matching the string at the point I was expecting :) Thanks for that!

      This works fine now:
      $test =~ s{\[\[([^\]]+)\]}{ print "FOO: $1 \n"; }ge;
      Thanks again!

      Andy
      Hi,

      Thanks - just having a look into that negative look-ahead stuff, as I'm sure that will be good to know for other regexes.

      Your example doesn't seem to work?

      \[\[            # opening delimiter
      (               # capture...
        (?:           # a group
          (?!\]\])    # that does not start with ]]
          .           # and is a single character long
        )+            # and many of these groups, at least one.
      )
      ]]              # closing delimiter

      "Substitution replacement not terminated at line 24"

      any ideas?

      TIA

      Andy

        moritz's example is meant to be used with the x modifier which permits comments in a regex. For example:

        my $re = qr{
            \[\[            # opening delimiter
            (               # capture...
              (?:           # a group
                (?!\]\])    # that does not start with ]]
                .           # and is a single character long
              )+            # and many of these groups, at least one.
            )
            ]]              # closing delimiter
        }x;

        while ($test =~ m{$re}g) {
            print "FOO:<$1>\n";
        }
      To prevent that, you need a negative look-ahead:
      No, you don't.
      use 5.010;   # enables say

      $_ = "[[foo]] bar [[baz ] qux]] [quux]] fred [[waldo]]]";
      say $1 while /\[\[([^]]*(?:\][^]]+)*)\]\]/g;
      __END__
      foo
      baz ] qux
      waldo

      This uses some classical loop unrolling, and is, IIRC, similar to how Mastering Regular Expressions (first edition) matches C comments.
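For readers who find the one-liner dense, here is the same unrolled pattern spelled out with the x modifier. This is only a sketch: the comments are my reading of the pattern and the variable names are made up, while the test string is the one from the post above.

```perl
use strict;
use warnings;

my $unrolled = qr{
    \[\[            # opening delimiter
    (               # capture the tag body:
      [^\]]*        #   a run of characters that are not "]"
      (?:           #   then, as often as needed:
        \]          #     one lone "]"
        [^\]]+      #     followed by at least one non-"]" character
      )*
    )
    \]\]            # closing delimiter
}x;

my $s    = "[[foo]] bar [[baz ] qux]] [quux]] fred [[waldo]]]";
my @tags = $s =~ /$unrolled/g;

print "<$_>\n" for @tags;
```

Because every `]` inside the body must be followed by a non-`]` character, the body can never swallow the closing `]]`, and no look-ahead is needed.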
Re: Regex question - negatives
by ELISHEVA (Prior) on Jan 13, 2011 at 10:07 UTC

    A few additional points:

    • I notice that you want \| in your output. If you use qq|...| to quote the string you'll need to escape "\|" with "\\\|", not "\|". "\\" to escape the "\" and "\|" to escape the pipe.

      Alternatively, you could simplify matters and use a single quote syntax such as q{...} or any other delimiter that isn't in your string. \ will be taken literally, and the curly brace delimiters don't show up in your string so there is no need to escape the pipe or any other character.

    • If you know that bad tags end in double square brackets you can dispense with the negative lookahead: m{\[\[([^\[]+)\]\]}g. The main thing to get rid of is that (.+), because "." matches everything, including the end-of-tag character.

    • When testing output via print statements, it is a good idea to put the delimiters front and back, just in case there is invisible whitespace at the beginning or end of your match, i.e. print "FOO:<$1>\n".
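The quoting point in the first bullet can be checked with a short sketch. The strings and variable names here are made up for illustration:

```perl
use strict;
use warnings;

# With qq|...|, "\|" is only an escaped delimiter: the backslash is dropped.
my $one_backslash = qq|f\|s|;

# "\\" yields a literal backslash and "\|" a literal pipe.
my $three_backslashes = qq|f\\\|s|;

# With q{...} the pipe is not a delimiter, so "\|" stays as two characters.
my $single_quoted = q{f\|s};

print "qq, one backslash:    <$one_backslash>\n";
print "qq, three backslashes: <$three_backslashes>\n";
print "q{}, one backslash:   <$single_quoted>\n";
```

The last two variables hold the same four characters, which is why switching to q{...} is the less error-prone choice here.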

    Revised code incorporating above points:

    my $test = q{asdfas dfas dfas df asfd[[bad tag]] [[table]] asdfa sf as [[asdf as f\|sfds]] as dfa sdf [[test new line]] foo};

    while ($test =~ m{\[\[([^\[]+)\]\]}g) {
        print "FOO:<$1>\n";
    }

    #output - same as required in OP
    FOO:<bad tag>
    FOO:<table>
    FOO:<asdf as f\|sfds>
    FOO:<test new line>
      Hi,

      Thanks - the \| is only there cos I am using qq|| (so using | in the string, causes perl to break :))

      The actual code people will use, is:

      [[foo|bar]] , not [[foo\|bar]]

      Cheers

      Andy
Re: Regex question - negatives
by ikegami (Pope) on Jan 13, 2011 at 17:24 UTC
                      |
                      v
    $test =~ s{\[\[(.+?)[^\]\]]}{ print "FOO: $1\n"; }ge;

    I'm really not sure why you are replacing the tags with the result of print, though. Unless the print is just a placeholder until you get this working, the code should be

    while ($test =~ s{\[\[(.+?)[^\]\]]}g) { print "FOO: $1\n"; }
      The pattern is wrong, and you don't want s///.
      while ($test =~ m{\[\[(.+?)\]\]}g) { print "FOO: $1\n"; }
        ack, that's what I thought I was posting. Thanks.
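One caveat with the non-greedy version, sketched under the assumption that a tag may contain a newline (the input `$s` is made up): "." does not match a newline unless the s modifier is given, so such a tag is silently skipped without it.

```perl
use strict;
use warnings;

# Hypothetical input: the second tag spans two lines.
my $s = "[[one]] [[two\nlines]]";

# Without /s, "." stops at the newline and the second tag never matches.
my @without_s = $s =~ m{\[\[(.+?)\]\]}g;

# With /s, "." also matches the newline.
my @with_s = $s =~ m{\[\[(.+?)\]\]}gs;

print "without /s: ", scalar(@without_s), " tag(s)\n";
print "with /s:    ", scalar(@with_s), " tag(s)\n";
```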
Re: Regex question - negatives
by eff_i_g (Curate) on Jan 13, 2011 at 17:49 UTC
    These folks have been kind enough to use /x and add comments, but in other cases you'll want YAPE::Regex::Explain on deck as your adventures into regex deepen.
