PerlMonks  

Regex question - negatives

by ultranerds (Friar)
on Jan 13, 2011 at 09:24 UTC (#882066=perlquestion)
ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've been reading the book "Mastering Regular Expressions", and I've come across something I need to use in a project, but I can't seem to get it to work.

Here is an example:

my $test = qq|asdfas dfas dfas df asfd[[bad tag]] [[table]] asdfa sf as [[asdf as f\|sfds]] as dfa
sdf [[test new line]] foo|;

$test =~ s{\[\[(.+)[^\]\]]}{ print "FOO: $1 \n"; }ge;

What I need to get back is:

bad tag
table
asdf as f\|sfds
test new line
However, the code above seems to ignore the [^\]\]] bit:

C:\Users\Andy\Documents>perl test.pl
FOO: bad tag]] [[table]] asdfa sf as [[asdf as f|sfds]] as dfa
FOO: test new line]] fo

Can someone suggest to me what I'm doing wrong? I've been trying to get this working for ages, and as far as I can see, it looks fine :/

TIA!

Andy

Re: Regex question - negatives
by moritz (Cardinal) on Jan 13, 2011 at 09:43 UTC
    Can someone suggest to me what I'm doing wrong?

    The .+ looks wrong, since it can match [[ and ]] too. You should use [^\]]+ in the first place.
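A minimal sketch of why the original pattern misbehaves, assuming a shortened, hypothetical input string rather than the one in the question. `[^\]\]]` is a character class, so it matches exactly one character that is not `]`; it does not mean "not followed by `]]`". Meanwhile the greedy `.+` runs as far as it can.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened from the question.
my $text = 'x [[bad tag]] y [[table]] z';

# [^\]\]] is a character class: "any one character that is not ]".
# Listing \] twice changes nothing, so it behaves exactly like [^\]].
my ($greedy) = $text =~ m{\[\[(.+)[^\]\]]};

# A negated class inside the capture cannot run past the first ].
my ($bounded) = $text =~ m{\[\[([^\]]+)\]\]};

print "greedy:  <$greedy>\n";
print "bounded: <$bounded>\n";
```

Here the greedy version swallows everything up to the last character of the string, which is the same symptom as in the output shown in the question.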

    Update: Here's a working regex:

    \[\[([^\]]+)]]

    But it stops at a single ]. To prevent that, you need a negative look-ahead:

    \[\[((?:(?!\]\]).)+)]]

    Now that's quite unreadable, so here in detail:

    \[\[            # opening delimiter
    (               # capture...
      (?:           # a group
        (?!\]\])    # that does not start with ]]
        .           # and is a single character long
      )+            # and many of these groups, at least one.
    )
    ]]              # closing delimiter
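The difference between the two patterns above can be sketched as follows, assuming a hypothetical input in which one tag body contains a single `]`. The negated class refuses any `]` inside the body, while the look-ahead version only refuses the pair `]]`.

```perl
use strict;
use warnings;

# Hypothetical input: the second tag contains a single ] in its body.
my $text = '[[plain]] and [[odd ] tag]]';

# Negated class: the body may not contain any ] at all.
my @by_class = $text =~ m{\[\[([^\]]+)\]\]}g;

# Negative look-ahead: the body may contain ], just not the pair ]].
my @by_lookahead = $text =~ m{
    \[\[            # opening delimiter
    (               # capture...
      (?:           #   a group
        (?!\]\])    #   that does not start with the closing pair
        .           #   and is a single character long
      )+            #   one or more of these groups
    )
    \]\]            # closing delimiter
}gx;

print "class:      <$_>\n" for @by_class;
print "look-ahead: <$_>\n" for @by_lookahead;
```

With this input the negated class finds only the first tag, and the look-ahead version finds both.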
      Hi,

      Aaah ok - so I was calling it AFTER, which basically meant that it wasn't matching the string at the point I was expecting :) Thanks for that!

      This works fine now:
      $test =~ s{\[\[([^\]]+)\]}{ print "FOO: $1 \n"; }ge;
      Thanks again!

      Andy
      Hi,

      Thanks - just having a look into that negative look-ahead stuff, as I'm sure it will be good to know for other regexes.

      Your example doesn't seem to work?

      \[\[            # opening delimiter
      (               # capture...
        (?:           # a group
          (?!\]\])    # that does not start with ]]
          .           # and is a single character long
        )+            # and many of these groups, at least one.
      )
      ]]              # closing delimiter

      "Substitution replacement not terminated at line 24"

      any ideas?

      TIA

      Andy

        moritz's example is meant to be used with the x modifier which permits comments in a regex. For example:

        my $re = qr{\[\[      # opening delimiter
            (                 # capture...
              (?:             # a group
                (?!\]\])      # that does not start with ]]
                .             # and is a single character long
              )+              # and many of these groups, at least one.
            )
            ]]                # closing delimiter
        }x;

        while ($test =~ m{$re}g) {
            print "FOO:<$1>\n";
        }
      To prevent that, you need a negative look-ahead:
      No, you don't.
      use feature 'say';   # needed for say (or run with perl -E)

      $_ = "[[foo]] bar [[baz ] qux]] [quux]] fred [[waldo]]]";
      say $1 while /\[\[([^]]*(?:\][^]]+)*)\]\]/g;
      __END__
      foo
      baz ] qux
      waldo

      This uses some classical loop-unrolling, and is, IIRC, similar to how Mastering Regular Expressions (first edition) matches C comments.
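As a rough check of the unrolled pattern above, the sketch below runs it next to the look-ahead pattern from earlier in the thread on the same sample string. The variable names are illustrative only, and the class is written `[^\]]` instead of `[^]]` purely for readability.

```perl
use strict;
use warnings;

my $text = '[[foo]] bar [[baz ] qux]] [quux]] fred [[waldo]]]';

# Unrolled form: normal* (special normal+)*
#   normal  = [^\]]   any character except ]
#   special = \]      a ] that must be followed by a non-] character
my $unrolled = qr{\[\[([^\]]*(?:\][^\]]+)*)\]\]};

# Look-ahead form from earlier in the thread, for comparison.
my $lookahead = qr{\[\[((?:(?!\]\]).)+)\]\]};

my @from_unrolled  = $text =~ /$unrolled/g;
my @from_lookahead = $text =~ /$lookahead/g;

print "unrolled:   <$_>\n" for @from_unrolled;
print "look-ahead: <$_>\n" for @from_lookahead;
```

Both patterns return the same three tags for this string. They are not identical in general: the unrolled form also accepts an empty body such as `[[]]`, whereas the look-ahead form requires at least one character.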
Re: Regex question - negatives
by ELISHEVA (Prior) on Jan 13, 2011 at 10:07 UTC

    A few additional points:

    • I notice that you want \| in your output. If you use qq|...| to quote the string you'll need to escape "\|" with "\\\|", not "\|". "\\" to escape the "\" and "\|" to escape the pipe.

      Alternatively, you could simplify matters and use a single quote syntax such as q{...} or any other delimiter that isn't in your string. \ will be taken literally, and the curly brace delimiters don't show up in your string so there is no need to escape the pipe or any other character.

    • If you know that bad tags end in double square braces you can dispense with the negative lookahead:

      m{\[\[([^\[]+)\]\]}g

      The main thing to get rid of is that (.+) because "." matches everything including the end of tag character.

    • When testing output via print statements, it is a good idea to put the delimiters front and back, just in case there is invisible whitespace at the beginning or end of your match, i.e. print "FOO:<$1>\n".

    Revised code incorporating above points:

    my $test = q{asdfas dfas dfas df asfd[[bad tag]] [[table]] asdfa sf as [[asdf as f\|sfds]] as dfa
    sdf [[test new line]] foo};

    while ($test =~ m{\[\[([^\[]+)\]\]}g) {
        print "FOO:<$1>\n";
    }

    #output - same as required in OP
    FOO:<bad tag>
    FOO:<table>
    FOO:<asdf as f\|sfds>
    FOO:<test new line>
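The quoting point from the first bullet above can be checked directly. This is a small sketch with made-up strings; the variable names are illustrative only.

```perl
use strict;
use warnings;

# In qq|...| a backslash before the delimiter only escapes the delimiter,
# so \| yields a bare pipe.
my $one = qq|f\|s|;

# \\ yields one backslash and \| yields the pipe.
my $three = qq|f\\\|s|;

# In q{...} a backslash stays literal unless it precedes a backslash
# or a delimiter, so \| is kept as two characters.
my $single = q{f\|s};

print "qq with one backslash:    $one\n";
print "qq with three backslashes: $three\n";
print "q{} with one backslash:    $single\n";
```

Only the last two strings contain a backslash, which matches the `f|sfds` seen in the output at the top of the thread.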
      Hi,

      Thanks - the \| is only there cos I am using qq|| (so using | in the string, causes perl to break :))

      The actual code people will use, is:

      [[foo|bar]] , not [[foo\|bar]]

      Cheers

      Andy
Re: Regex question - negatives
by ikegami (Pope) on Jan 13, 2011 at 17:24 UTC
                      |
                      v
    $test =~ s{\[\[(.+?)[^\]\]]}{ print "FOO: $1\n"; }ge;

    I'm really not sure why you are replacing the tags with the result of print, though. Unless the print is just a placeholder until you get this working, the code should be

    while ($test =~ s{\[\[(.+?)[^\]\]]}g) { print "FOO: $1\n"; }
      The pattern is wrong, and you don't want s///.
      while ($test =~ m{\[\[(.+?)\]\]}g) { print "FOO: $1\n"; }
        ack, that's what I thought I was posting. Thanks.
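If the goal is only to collect the tags, the corrected match can also be used in list context, where m//g returns every capture at once and nothing in the string is replaced. A minimal sketch, assuming a shortened, hypothetical version of the test string:

```perl
use strict;
use warnings;

# Hypothetical, shortened test string using the unescaped pipe form.
my $test = q{asdfa [[bad tag]] [[table]] as [[asdf as f|sfds]] dfa [[test new line]] foo};

# In list context m//g returns every $1 at once; $test is left untouched.
my @tags = $test =~ m{\[\[(.+?)\]\]}g;

print "FOO:<$_>\n" for @tags;
```

This avoids s///e altogether, so the tags are not overwritten with the return value of print.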
Re: Regex question - negatives
by eff_i_g (Curate) on Jan 13, 2011 at 17:49 UTC
    These folks have been kind enough to use /x and add comments, but in other cases you'll want YAPE::Regex::Explain on deck as your adventures into regex deepen.
