PerlMonks  

Substitute for variable-length look-behind?

by diotalevi (Canon)
on Jan 26, 2004 at 17:39 UTC (#324190=perlquestion)

diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

use Term::ANSIColor ':constants';

# Note that there is not normally a space between the </ and code>. I
# added that so that Perlmonks.org wouldn't parse the post incorrectly.
$_ = "normal <code> green <!-- yellow [red] normal --> normal </ code> normal";

$GREEN  = GREEN;
$YELLOW = YELLOW;
$RED    = RED;
$RESET  = RESET;

s(<code>(.+?)</ code>)($GREEN$1$RESET)g;
s((?<!\x1b)\[(.+?)\])($RED$1$RESET)g;
s((<!--.+?-->))($YELLOW$1$RESET)g;

The preceding program produces the following string. Overlapping is not detected and each markup element terminates the enclosing element prematurely.

"normal " . GREEN . " green " . YELLOW . "<!-- yellow " . RED . "red" . RESET . " normal -->" . RESET . " normal" . RESET . " normal"

I'd like it to produce the string below instead. If perl had variable-length look-behind I'd write this as (?<=(?:^|$RESET)[^\x1b]*). Any ideas for how to write this without using perl's experimental regexp features ( (?{ code }), (??{ rx }), and (?(expr)true|false))?

"normal " . GREEN . " green <!-- yellow [red] normal --> normal" . RESET . " normal"

Replies are listed 'Best First'.
Re: Substitute for variable-length look-behind?
by particle (Vicar) on Jan 26, 2004 at 18:15 UTC

    reverse the string and use a variable-width look-ahead assertion.

    ~Particle *accelerates*

      aka sexeger ... my thoughts exactly.

      --
      I'm not belgian but I play one on TV.
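A minimal sketch of the reversal trick suggested above, not code from the thread. It assumes the task is to find "bar" only when it is preceded by "fo+", which would need the variable-length look-behind (?<=fo+). Reversing both the string and the pattern turns that into an ordinary look-ahead. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'bar bar obar foooooobar bar';

# "bar" preceded by "fo+" becomes "rab" followed by "o+f" once the
# string is reversed, and look-ahead may be any length.
my $reversed = reverse $text;

my @starts;
while ($reversed =~ /rab(?=o+f)/g) {
    # $+[0] is the end offset of the match in the reversed string;
    # mirroring it gives the start offset in the original string.
    push @starts, length($text) - $+[0];
}

print "match at offset $_: ", substr($text, $_, 3), "\n" for @starts;
```

The cost is that every pattern and every captured value has to be written or read backwards, so this tends to suit short, simple patterns.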

Re: Substitute for variable-length look-behind?
by Roy Johnson (Monsignor) on Jan 26, 2004 at 19:35 UTC
    Here's another solution, somewhat more elegant than my previous one.
    s{<code>(.+?)</ code>|\[(.+?)\]|(<!--.+?-->)}
     {defined $1 ? "$GREEN$1$RESET" :
      defined $2 ? "$RED$2$RESET" :
      defined $3 ? "$YELLOW$3$RESET" :
      warn "Broken with $+\n"
     }ge;

    The PerlMonk tr/// Advocate
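A self-contained sketch of the same single-pass idea, under a few assumptions: the escape sequences are written out literally instead of importing Term::ANSIColor, the input uses a real closing tag where the thread writes "</ code>", and the names $text and $colored are illustrative. Because all three alternatives are tried at each position in one pass, the leftmost element wins and anything nested inside it is consumed as part of its body, so no look-behind is needed.

```perl
use strict;
use warnings;

# Standard ANSI escape sequences (the values Term::ANSIColor's GREEN,
# YELLOW, RED and RESET constants produce).
my ($GREEN, $YELLOW, $RED, $RESET) = ("\e[32m", "\e[33m", "\e[31m", "\e[0m");

my $text = 'normal <code> green <!-- yellow [red] normal --> normal </code> normal';

my $colored = $text;
$colored =~ s{
      <code>(.+?)</code>    # group 1: body of a code element
    | \[(.+?)\]             # group 2: text inside square brackets
    | (<!--.+?-->)          # group 3: a whole comment, delimiters included
}{
      defined $1 ? $GREEN  . $1 . $RESET
    : defined $2 ? $RED    . $2 . $RESET
    :              $YELLOW . $3 . $RESET
}gex;

print "$colored\n";
```

Only the outer code element is coloured here; the comment and the bracketed text inside it are left untouched, which is the output the question asks for.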
Re: Substitute for variable-length look-behind?
by diotalevi (Canon) on Jan 26, 2004 at 21:13 UTC

    I ran with a combination of Roy, tye, and Zaxo. s///e to get multiple lvalues into a string (one lvalue per line in the string), then Roy's idea to walk pos() in that line followed up with some modifications to substr() lvalues.

    s(^(.+?\Q$RESET | \E)(.+)){
        my $header  = $1;
        my $comment = $2;
        $comment =~ s((?: <code> (.*?) </ code>
                        | \[ (.*?) \]
                        | (<!-- .*? -->) )){
            (    ( defined( $1 ) && ( GREEN  . $1 ) )
              || ( defined( $2 ) && ( RED    . $2 ) )
              || ( defined( $3 ) && ( YELLOW . $3 ) ) )
            . RESET
        }gex;
        "$header$comment";
    }meg;
Re: Substitute for variable-length look-behind?
by bart (Canon) on Jan 26, 2004 at 20:30 UTC
    You must take care that your lookbehind, if incorporated into the same regexp, doesn't change what you match. If it's too greedy, the starting point of the former pattern might shift backwards.

    So, IMO, the safest way is to use two patterns. I'm not sure if you can integrate it into one pattern, I doubt it, so that you can still simply make the whole match fail if the lookbehind fails. Two independent matches won't do that. Anyway, enough blahblah, here's my coarse idea:

    my $success;
    while(/PATTERN/g) {
        if(substr($_, 0, $-[0]) =~ /LOOKBEHIND\z/) {
            # got a match!
            $success = 1;
            last;
        }
    }
    For example:
    $_ = 'bar bar obar foooooobar bar';
    my($success, $start) = 0;
    while(/bar/g) {
        if(substr($_, 0, $start = $-[0]) =~ /fo+\z/) {
            # got a match!
            $success = 1;
            last;
        }
    }
    print "$success: $start\n";
    printing:
      1: 20
    
    Using $start, the start position of the outer match, you can try again if you want, to get the captured values and @- and @+. For some odd reason, capturing @- and @+ in the loop made it loop forever. *shrug*
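The two-pattern approach above could be wrapped in a small helper along these lines. This is a sketch, not bart's code: the name find_with_lookbehind and its argument order are hypothetical. It saves $-[0] before running the second match, so the inner match cannot disturb the offset being tested.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offset of the first match of
# $pattern whose preceding text ends with a match of $lookbehind, or
# undef if there is none.
sub find_with_lookbehind {
    my ($string, $pattern, $lookbehind) = @_;
    while ($string =~ /$pattern/g) {
        my $start = $-[0];    # save before the next match overwrites @-
        # Check the look-behind separately against everything before the match.
        return $start if substr($string, 0, $start) =~ /(?:$lookbehind)\z/;
    }
    return;
}

my $offset = find_with_lookbehind('bar bar obar foooooobar bar', qr/bar/, qr/fo+/);
print defined $offset ? "found at $offset\n" : "not found\n";
```

Since the look-behind is an ordinary pattern anchored with \z, it can be of any length. The trade-off is that the prefix is rescanned for every candidate match.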
Re: Substitute for variable-length look-behind?
by Roy Johnson (Monsignor) on Jan 26, 2004 at 18:25 UTC
    Clunky, but may be instructional.
    while (m(<code>|\[|<!--)g) {
        if ($& eq '<code>') {
            s{\G(.+?)</ code>}{$GREEN$1$RESET};
            pos($_) += length("$GREEN$1$RESET");
        }
        elsif ($& eq '[') {
            s{\G(.+?)\]}{$RED$1$RESET};
            pos($_) += length("$RED$1$RESET");
        }
        elsif ($& eq '<!--') {
            s{\G(.+?)-->}{$YELLOW<!--$1-->$RESET};
            pos($_) += length("$YELLOW<!--$1-->$RESET");
        }
    }

    The PerlMonk tr/// Advocate
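One way to make the walk-along-with-pos() idea explicit is sketched below. It is not Roy Johnson's code: it swaps the \G substitutions for index() plus four-argument substr(), sets pos() directly instead of incrementing it, writes the escape sequences literally, and uses a real closing tag where the thread writes "</ code>". All variable names are illustrative.

```perl
use strict;
use warnings;

# Standard ANSI escape sequences (the values Term::ANSIColor's GREEN,
# YELLOW, RED and RESET constants produce).
my ($GREEN, $YELLOW, $RED, $RESET) = ("\e[32m", "\e[33m", "\e[31m", "\e[0m");

my $text = 'normal <code> green <!-- yellow [red] normal --> normal </code> normal';

while ($text =~ /(<code>|\[|<!--)/g) {
    my ($open, $start, $body_start) = ($1, $-[0], $+[0]);
    my ($close, $color, $keep) =
          $open eq '<code>' ? ('</code>', $GREEN,  0)
        : $open eq '['      ? (']',       $RED,    0)
        :                     ('-->',     $YELLOW, 1);   # comments keep their delimiters

    my $end = index($text, $close, $body_start);
    next if $end < 0;    # unterminated opener: leave it alone

    my $inner       = substr($text, $body_start, $end - $body_start);
    my $replacement = $color . ($keep ? "$open$inner$close" : $inner) . $RESET;

    # Splice the coloured text in, then move pos() past it so neither the
    # escape codes nor the element's contents are scanned again.
    substr($text, $start, $end + length($close) - $start, $replacement);
    pos($text) = $start + length($replacement);
}

print "$text\n";
```

Because the scan resumes after each replacement, the "[" inside an escape sequence is never seen by the pattern, which is what the look-behind in the question was trying to guard against.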
Re: Substitute for variable-length look-behind?
by sleepingsquirrel (Hermit) on Jan 26, 2004 at 19:18 UTC
    This snippet might work for you. I'm assuming you want no nesting of tags and the first (left most) one wins.
    use Term::ANSIColor ':constants';

    # Note that there is not normally an 'x' between the </ and code>. I
    # added that so that Perlmonks.org wouldn't parse the post incorrectly.
    $_ = "normal <code> green <!-- yellow [red] normal --> normal </xcode> normal";

    $delimit{"code"} = GREEN;
    $delimit{"--"}   = YELLOW;
    $delimit{"]"}    = RED;
    $RESET = RESET;

    s{ (?:<code>(.+?)</x(code)>)
     | (?:(?<!\x1b)\[(.+?)(\]))
     | (?:(<!--.+?(--)>))
     }{$delimit{$2|$4|$6}.($1|$3|$5).$RESET}egx;

    print "$_\n";
Re: Substitute for variable-length look-behind?
by Abigail-II (Bishop) on Jan 26, 2004 at 17:48 UTC
    Any ideas for how to write this without using perl's experimental regexp features
    Use a parser.

    Abigail
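What "use a parser" might look like at its smallest is sketched here, as an assumption about the intent and not something Abigail posted. It is a hand-rolled left-to-right scanner built on \G and the /gc flags. The function name colorize is hypothetical, the escape sequences are written out literally, and a real closing tag is used where the thread writes "</ code>".

```perl
use strict;
use warnings;

# Standard ANSI escape sequences (the values Term::ANSIColor's GREEN,
# YELLOW, RED and RESET constants produce).
my ($GREEN, $YELLOW, $RED, $RESET) = ("\e[32m", "\e[33m", "\e[31m", "\e[0m");

# Hypothetical helper: copy plain text through unchanged and colour each
# outermost markup element exactly once.
sub colorize {
    my ($text) = @_;
    my $out = '';
    pos($text) = 0;
    while (pos($text) < length $text) {
        # /c keeps pos() where it is when an alternative fails, so the
        # next branch is tried at the same position.
        if    ($text =~ m{\G<code>(.+?)</code>}gc) { $out .= "$GREEN$1$RESET" }
        elsif ($text =~ m{\G\[(.+?)\]}gc)          { $out .= "$RED$1$RESET" }
        elsif ($text =~ m{\G(<!--.+?-->)}gc)       { $out .= "$YELLOW$1$RESET" }
        elsif ($text =~ m{\G(.)}gcs)               { $out .= $1 }   # any other character
    }
    return $out;
}

print colorize('normal <code> green <!-- yellow [red] normal --> normal </code> normal'), "\n";
```

Output is built in a separate string, so the inserted escape codes are never fed back into the scanner. Nesting rules or new element types can then be added as further branches.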
