
Re: [OT] Thoughts on Ruby's new absent operator?

by haukex (Archbishop)
on Mar 24, 2017 at 12:43 UTC ([id://1185776])


in reply to [OT] Thoughts on Ruby's new absent operator?

It would probably be better to link to the article itself. Anyway, I get the idea that (?~abc) would be analogous to [^x], except that the former is for multi-character sequences. I assume that similar effects can be achieved in Perl with Lookaround Assertions. What I'm missing at the moment are some more examples of practical applications - the only one mentioned in the article and the Ruby docs is not matching invalid C comments (the final example below).

An excerpt from the Ruby docs:

(?~subexp) absent operator (experimental)

Matches any string which doesn't contain any string which matches subexp. Similar to (?:(?!subexp).)*, but easy to write.

Unlike (?:(?!abc).)*c, (?~abc)c matches "abc", because (?~abc) matches "ab".

A sandbox for finding equivalent expressions:

    use warnings;
    use strict;
    use Test::More;

    # in Ruby: (?~abc)
    my $re1 = qr{ \A (?: (?!abc) . )* \z }x;
    like '', $re1;
    like 'ab', $re1;
    like 'aab', $re1;
    like 'ccdd', $re1;
    unlike 'abc', $re1;
    unlike 'aabc', $re1;
    unlike 'ccccabc', $re1;
    unlike 'ccabcdd', $re1;

    # in Ruby: (?~abc)c
    # this example fails in Perl
    like 'abc', qr{ \A (?: (?!abc) . )* c \z }x;

    # in Ruby: \A\/\*(?~\*\/)\*\/\z
    my $re2 = qr{ \A /\* ( (?!\*/). )*? \*/ \z }x;
    like '/**/', $re2;
    like '/* foobar */', $re2;
    unlike '/**/ */', $re2;

    done_testing;
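The failing `(?~abc)c` case comes down to the lookahead peeking past the text the group actually consumes. One possible way to get closer to the Ruby semantics in Perl is to capture a span first and then reject it only if the span itself contains "abc". The sketch below is an assumption about how that could be emulated, not something taken from the Ruby docs or the article; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Lookahead version from the sandbox: (?!abc) also sees the trailing "c".
my $lookahead = qr{ \A (?: (?!abc) . )* c \z }xs;

# Assumed emulation of Ruby's (?~abc)c: take a span (longest first), then
# fail and backtrack whenever the captured span itself contains "abc".
# $^N holds the text of the most recently closed capture group.
my $emulated = qr{ \A (.*) (?(?{ index($^N, 'abc') >= 0 })(*FAIL)) c \z }xs;

for my $str ('abc', 'ab', 'xxc', 'abcc') {
    printf "%-7s lookahead=%-5s emulated=%s\n", "'$str'",
        ($str =~ $lookahead ? 'match' : 'no'),
        ($str =~ $emulated  ? 'match' : 'no');
}
```

If the emulation behaves as intended, 'abc' should be accepted only by the second pattern, since the span "ab" contains no "abc" and the final "c" is matched separately. The conditional with a code block makes this considerably heavier than a plain lookahead, so it is meant as a sandbox aid rather than a recommendation.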

Replies are listed 'Best First'.
Re^2: [OT] Thoughts on Ruby's new absent operator?
by LanX (Saint) on Mar 24, 2017 at 16:59 UTC
    > Anyway, I get the idea that (?~abc) would be analogous to [^x], except that the former is for multi-character sequences.

    Thanks, now I get the idea and I have to admit that I missed this feature in the past, though I can't recall when exactly.

    But all useful use-cases which come to my mind involve strict boundaries, like parsing a grammar and explicitly excluding certain commands.

    Like parsing HTML but not wanting "a" and "img" tags while allowing "anchor".

    This would mean to use something like

     /<\s*\b(?~a|img)\b(.*)>/

    and this should be achievable with

        use strict;
        use warnings;

        undef $/;
        $_ = <DATA>;    # slurp

        print "1: $1 - $2\n" while /<\s*(?!a\b|img\b)(\w+)\b(.*?)\s*>/g;

        print "2: $1 - $2\n" while /<\s*(?!a|img)(\w+)(.*?)\s*>/g;

        my %absent = ( a=>1, img =>1 );

        while (/<\s*(\w+)(.*?)\s*>/g) {
            print "3: $1 - $2\n" unless $absent{$1};
        }

        while (/<\s*(\w+)\b(??{ $absent{$1} ? '^' : '' })(.*?)\s*>/g) {
            print "4: $1 - $2\n";
        }

        __DATA__
        <a href='bla'>
        <img href='bla'>
        <table style="">
        <anchor >
        <tr width=100>

        1: table - style=""
        1: anchor -
        1: tr - width=100
        2: table - style=""
        2: tr - width=100
        3: table - style=""
        3: anchor -
        3: tr - width=100
        4: table - style=""
        4: anchor -
        4: tr - width=100

    Please note how important it is to repeat the delimiting \b in case 1: without it, case 2 also rejects "anchor" because it starts with "a". This might justify the use of (?~...).
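The effect of the boundary can be isolated in a few lines. This is only a sketch: the sample tags are made up, and the second pattern factors the \b out of the alternation as `(?!(?:a|img)\b)`, which is a variation on case 1 rather than the exact pattern used there.

```perl
use strict;
use warnings;

my $no_boundary   = qr/<\s*(?!a|img)(\w+)/;        # as in case 2
my $with_boundary = qr/<\s*(?!(?:a|img)\b)(\w+)/;  # \b written once for both names

for my $tag ('<a href="x">', '<img src="x">', '<anchor>', '<table>') {
    my ($loose)  = $tag =~ $no_boundary;    # undef when the tag is rejected
    my ($strict) = $tag =~ $with_boundary;
    printf "%-14s no \\b: %-7s with \\b: %s\n",
        $tag, $loose // '-', $strict // '-';
}
```

Without the boundary, any tag name that merely starts with "a" or "img" is rejected, which is why "anchor" disappears from the case 2 output.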

    Though to be sure I'd need to test the Ruby implementation, and I'm not willing to install it yet.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    PS: Choosing HTML as data grammar was - as always - unfortunate. You are free to propose something different.

Re^2: [OT] Thoughts on Ruby's new absent operator?
by Anonymous Monk on Mar 24, 2017 at 15:19 UTC
    The C-style comments pattern isn't really a practical example. They can be matched more easily as qr{/\*.*?\*/}, but the problem is stated in an unnatural way to exclude that.
      But non-greedy matching is known to be slow. Maybe I'll need to run some benchmarks.
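A rough starting point for such a benchmark, using the core Benchmark module. The sample text is invented, the /s modifier is added here so that both patterns could span lines (an assumption, it is not in the pattern quoted above), and results will depend on the Perl version and the input, so no numbers are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample input: one long comment body followed by trailing code.
my $text = '/* ' . ('foo * bar / baz ' x 200) . ' */ int x;';

my $lazy      = qr{/\*.*?\*/}s;            # non-greedy version
my $lookahead = qr{/\*(?:(?!\*/).)*\*/}s;  # tempered lookahead version

# Sanity check: both patterns should pick out the same comment.
my ($m1) = $text =~ /($lazy)/;
my ($m2) = $text =~ /($lookahead)/;
die "patterns disagree" unless defined $m1 && defined $m2 && $m1 eq $m2;

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds.
cmpthese(-1, {
    lazy      => sub { my $n = () = $text =~ /$lazy/g },
    lookahead => sub { my $n = () = $text =~ /$lookahead/g },
});
```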
