PerlMonks  

lookbehind regexp

by cmic (Acolyte)
on Jun 10, 2019 at 14:35 UTC ( #11101201=perlquestion )

cmic has asked for the wisdom of the Perl Monks concerning the following question:

I was trying to print only lines *not* beginning with TAGS, but this snippet doesn't work, and I don't know why. Thanks for any explanation.
#
# Negative lookbehind , huh ??
#
while (<DATA>) {
    if (/(?<!TAGS)(.*?)(?=TAG2)/xgi ) {
        print "Negative lookbehind: ";
        print $1, "\n";
    }
}
#
__DATA__
TAG1 text one TAG2
TAG1 text two TAG2
TAG1 text three TAG2
TAGS text four TAG2
TAG1 text five TAGT
TAG1 text six TAG2

This code prints:
Negative lookbehind: TAG1 text one
Negative lookbehind: TAG1 text two
Negative lookbehind: TAG1 text three
Negative lookbehind: TAGS text four
Negative lookbehind: TAG1 text six

-- cmic. Life helps. Perl Too.

Replies are listed 'Best First'.
Re: lookbehind regexp
by tybalt89 (Parson) on Jun 10, 2019 at 14:50 UTC
    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=11101201
    use strict;
    use warnings;
    #
    # Negative lookbehind , huh ??
    #
    while (<DATA>) {
        if (/^(?!TAGS)(.*?)(?=TAG2)/xgi ) {
            print "Non-Negative lookbehind: ";
            print $1, "\n";
        }
    }
    #
    __DATA__
    TAG1 text one TAG2
    TAG1 text two TAG2
    TAG1 text three TAG2
    TAGS text four TAG2
    TAG1 text five TAGT
    TAG1 text six TAG2
Re: lookbehind regexp
by hippo (Canon) on Jun 10, 2019 at 14:55 UTC

    This fails because your lookbehind isn't anchored. You can either anchor it where you want or if it's really always at the start, replace the lookbehind with a lookahead like so:

    while (<DATA>) {
        if (/^(?!TAGS)(.*?)(?=TAG2)/xgi ) {
            print "Negative lookahead: ";
            print $1, "\n";
        }
    }
    #
    __DATA__
    TAG1 text one TAG2
    TAG1 text two TAG2
    TAG1 text three TAG2
    TAGS text four TAG2
    TAG1 text five TAGT
    TAG1 text six TAG2
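For anyone who still wants a negative lookbehind here, one way to "anchor it where you want" might look like the sketch below. It assumes that every line starts with a tag of exactly four characters (that holds for the sample data, but it is an assumption, not something the original post states). The names `@lines` and `@kept` are made up for the illustration, and the capture here is the text after the leading tag rather than the whole prefix.

```perl
use strict;
use warnings;

# Assumption: each line begins with a four-character tag.
my @lines = (
    'TAG1 text one TAG2',
    'TAGS text four TAG2',
    'TAG1 text five TAGT',
    'TAG1 text six TAG2',
);

my @kept;
for my $line (@lines) {
    # ^.{4} consumes the leading tag, so the lookbehind is always tested
    # at offset 4 and cannot be satisfied by starting the match elsewhere.
    if ($line =~ /^.{4}(?<!TAGS)\s*(.*?)\s*(?=TAG2)/) {
        push @kept, $1;
    }
}

print "Anchored lookbehind: $_\n" for @kept;
```

With the tag consumed before the assertion, the "TAGS" line is rejected because the four characters just before offset 4 are exactly "TAGS", and the "TAGT" line is rejected because the lookahead never finds "TAG2".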
Re: lookbehind regexp
by Athanasius (Bishop) on Jun 10, 2019 at 15:01 UTC

    Hello cmic,

    Did you notice that for the line beginning with "TAGS", the capture includes that word? Even though it is non-greedy, (.*?) starts looking at the beginning of the string and eventually settles on the capture string "TAGS text four   ", which satisfies both the lookahead (because it is followed by "TAG2") AND the negative lookbehind, because the string (which includes "TAGS") is not preceded by "TAGS"!
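A small sketch that may make this visible: it reports where the unanchored match actually starts, then forces a start right after "TAGS" to show the lookbehind failing there. The variable names `$at_start` and `$after_tag` are only illustrative; `$-[0]`, `pos()` and `\G` are standard Perl.

```perl
use strict;
use warnings;

my $line = 'TAGS text four TAG2';

# Unanchored: the engine tries offset 0 first. Nothing precedes offset 0,
# so (?<!TAGS) is true there and the match succeeds immediately.
my $at_start = $line =~ /(?<!TAGS)(.*?)(?=TAG2)/ ? $-[0] : undef;

# Force the attempt to begin at offset 4, right after "TAGS".
# \G pins the match to pos(), so the lookbehind now sees "TAGS" and fails.
pos($line) = 4;
my $after_tag = $line =~ /\G(?<!TAGS)(.*?)(?=TAG2)/gc ? 1 : 0;

print "unanchored match starts at offset $at_start\n";
print "lookbehind at offset 4 ", ($after_tag ? "passes" : "fails"), "\n";
```

So the assertion itself works as documented; it is simply being evaluated at a position where it cannot fail.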

    In general, you can’t combine a negative lookbehind assertion with a match-any-character(s) capture. Do this instead:

    use strict;
    use warnings;
    while (<DATA>) {
        print "$1\n" if !/^TAGS/ && /(.*?)(?=TAG2)/;
    }

    Update: Actually, in this case you don’t need a lookahead assertion either. This does just as well:

    print "$1\n" if !/^TAGS/ && /(.*?)TAG2/;
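As a quick check of that update, the sketch below runs both variants over the sample lines from the question and collects the captures side by side. The arrays `@with_lookahead` and `@without` are hypothetical names for this illustration. The only difference between the two patterns is whether "TAG2" is consumed by the overall match, which does not affect `$1`.

```perl
use strict;
use warnings;

my @lines = (
    'TAG1 text one TAG2',
    'TAG1 text two TAG2',
    'TAG1 text three TAG2',
    'TAGS text four TAG2',
    'TAG1 text five TAGT',
    'TAG1 text six TAG2',
);

my (@with_lookahead, @without);
for (@lines) {
    next if /^TAGS/;                            # reject unwanted lines first
    push @with_lookahead, $1 if /(.*?)(?=TAG2)/;  # TAG2 asserted, not consumed
    push @without,        $1 if /(.*?)TAG2/;      # TAG2 consumed by the match
}

print "same captures\n" if "@with_lookahead" eq "@without";
print "$_\n" for @without;
```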

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      OK. I get it. But I was just trying to use the "negative lookbehind". I already used the solution with positive lookahead or your solution (which is more elegant, for sure). THX for your help

Node Type: perlquestion [id://11101201]
Approved by haukex
Front-paged by haukex