PerlMonks  

[OT] Thoughts on Ruby's new absent operator?

by perlancar (Scribe)
on Mar 24, 2017 at 11:22 UTC ( #1185762=perlmeditation )

Dear monks,

Any thoughts on Ruby's new absent operator in its regular expressions (Hacker News thread)? It looks nifty to me and lets one write some patterns more succinctly, although what it does can be done by other means.


Replies are listed 'Best First'.
Re: [OT] Thoughts on Ruby's new absent operator?
by haukex (Prior) on Mar 24, 2017 at 12:43 UTC

    It would probably be better to link to the article itself. Anyway, I get the idea that (?~abc) would be analogous to [^x], except that the former is for multi-character sequences. I assume that similar effects can be achieved in Perl with Lookaround Assertions. What I'm missing at the moment are more examples of practical applications: the only one mentioned in the article and the Ruby docs is not matching invalid C comments (the final example below).

    An excerpt from the Ruby docs:

    (?~subexp) absent operator (experimental)

    Matches any string which doesn't contain any string which matches subexp. Similar to (?:(?!subexp).)*, but easy to write.

    Unlike (?:(?!abc).)*c, (?~abc)c matches "abc", because (?~abc) matches "ab".

    A sandbox for finding equivalent expressions:

    use warnings;
    use strict;
    use Test::More;

    # in Ruby: (?~abc)
    my $re1 = qr{ \A (?: (?!abc) . )* \z }x;
    like '', $re1;
    like 'ab', $re1;
    like 'aab', $re1;
    like 'ccdd', $re1;
    unlike 'abc', $re1;
    unlike 'aabc', $re1;
    unlike 'ccccabc', $re1;
    unlike 'ccabcdd', $re1;

    # in Ruby: (?~abc)c
    # this example fails in Perl
    like 'abc', qr{ \A (?: (?!abc) . )* c \z }x;

    # in Ruby: \A\/\*(?~\*\/)\*\/\z
    my $re2 = qr{ \A /\* ( (?!\*/). )*? \*/ \z }x;
    like '/**/', $re2;
    like '/* foobar */', $re2;
    unlike '/**/ */', $re2;

    done_testing;
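The failing case in the sandbox can be illustrated without the absent operator. What follows is only a minimal sketch, assuming the semantics quoted from the Ruby docs: the absence check applies to the text consumed by (?~abc), not to what comes after it. The helper name matches_absent_then_c is made up for illustration and covers only the anchored pattern \A(?~abc)c\z.

```perl
use strict;
use warnings;

# Lookahead-based version from the sandbox: (?!abc) inspects text beyond
# what the group consumes, so it refuses to consume the 'a' of 'abc'.
my $lookahead = qr{ \A (?: (?!abc) . )* c \z }x;

# Hypothetical helper emulating \A(?~abc)c\z: the string must end in 'c',
# and only the part before that final 'c' must be free of 'abc'.
sub matches_absent_then_c {
    my ($s) = @_;
    return 0 unless $s =~ /\A(.*)c\z/s;
    return index($1, 'abc') < 0 ? 1 : 0;
}

for my $s ('abc', 'c', 'abcc', 'xyzc') {
    printf "'%s': lookahead=%d emulated=%d\n",
        $s, ($s =~ $lookahead ? 1 : 0), matches_absent_then_c($s);
}
```

The two columns differ only for 'abc', which is the case the Ruby docs single out.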
      > Anyway, I get the idea that (?~abc) would be analogous to [^x], except that the former is for multi-character sequences.

      Thanks, now I get the idea and I have to admit that I missed this feature in the past, though I can't recall when exactly.

      But all the useful use cases that come to mind involve strict boundaries, like parsing a grammar and explicitly excluding certain commands.

      Like parsing HTML but not wanting "a" and "img" tags while allowing "anchor".

      This would mean to use something like

       /<\s*\b(?~a|img)\b(.*)>/

      and this should be achievable with

      use strict;
      use warnings;

      undef $/;
      $_ = <DATA>;    # slurp

      print "1: $1 - $2\n" while /<\s*(?!a\b|img\b)(\w+)\b(.*?)\s*>/g;

      print "2: $1 - $2\n" while /<\s*(?!a|img)(\w+)(.*?)\s*>/g;

      my %absent = ( a=>1, img =>1 );

      while (/<\s*(\w+)(.*?)\s*>/g) {
          print "3: $1 - $2\n" unless $absent{$1};
      }

      while (/<\s*(\w+)\b(??{ $absent{$1} ? '^' : '' })(.*?)\s*>/g) {
          print "4: $1 - $2\n";
      }

      __DATA__
      <a href='bla'>
      <img href='bla'>
      <table style="">
      <anchor >
      <tr width=100>

      1: table - style=""
      1: anchor -
      1: tr - width=100
      2: table - style=""
      2: tr - width=100
      3: table - style=""
      3: anchor -
      3: tr - width=100
      4: table - style=""
      4: anchor -
      4: tr - width=100

      Please note how important it is to repeat the delimiting \b in case 1: case 2 omits it and wrongly drops "anchor", which might justify the use of (?~...).

      Though to be sure I'd need to test the Ruby implementation, and I'm not willing to install it yet.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      PS: Choosing HTML as data grammar was - as always - unfortunate. You are free to propose something different.

      The C-style comment pattern isn't really a practical example. Such comments can be matched more easily with qr{/\*.*?\*/}, but the problem is stated in an unnatural way to exclude that.
        But non-greedy matching is known to be slow. Maybe I'll need to run some benchmarks.
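A small sketch comparing the two ways of matching C-style comments discussed here: the non-greedy pattern and a lookahead-based pattern like the one in the sandbox above. The sample string and variable names are made up for illustration, and the /s modifier is an addition so that . also matches newlines inside multi-line comments.

```perl
use strict;
use warnings;

# Made-up sample input with a one-line and a multi-line comment.
my $src = "int a; /* one */ int b; /* two\n   lines */ int c;";

# Non-greedy version: stop at the first closing */ after each /*.
my @lazy = $src =~ m{(/\*.*?\*/)}gs;

# Lookahead version: consume a character only while no */ starts there.
my @tempered = $src =~ m{(/\*(?:(?!\*/).)*\*/)}gs;

print scalar(@lazy), " comments found by the non-greedy pattern\n";
print "both patterns agree\n" if "@lazy" eq "@tempered";
```

To compare speed, the two patterns could be timed with cmpthese from the core Benchmark module; no timings are claimed here.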
Re: [OT] Thoughts on Ruby's new absent operator?
by Eily (Parson) on Mar 24, 2017 at 13:22 UTC

    Although it's fairly simple to translate the operator into perl, I don't see a way that does not involve repeating the contained expression: /A(?~BCD)E/ would match the same thing as /A(?: (?!BCD).* | (?:BCD).+ )E/x (ie: BCD is allowed to match between A and E if the string is not strictly BCD)

    This looks like an easy way to exclude some values from a match precisely and explicitly. One example I can think of is if you have emails with the pattern firstnamesurname@company.pl and want to match last names in @lastnames, but not first names in @firstnames, you could do:
    $" = '|';
    /^(?~@firstnames)(?:@lastnames)[@]company[.]pl/;
    without having to care if one of the matching names starts like one of the ignored ones (eg: benjamin and ben). I don't see how you'd end up searching for something like that though :P
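One way to get a similar effect in plain Perl today is sketched below. It assumes the intent is to reject addresses whose first-name part is exactly one of @firstnames: the part before the last name is captured and checked against a hash instead of being excluded inside the regex. The sample names and the helper wanted_address are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data, for illustration only.
my @firstnames = qw(ben);
my @lastnames  = qw(smith jones);

my %excluded_first = map { $_ => 1 } @firstnames;
my $last_alt       = join '|', map { quotemeta($_) } @lastnames;

# Hypothetical helper: true if the address ends in a listed last name and
# the part before it is not exactly one of the excluded first names.
sub wanted_address {
    my ($addr) = @_;
    return 0 unless $addr =~ /^(.*?)(?:$last_alt)\@company[.]pl$/;
    return $excluded_first{$1} ? 0 : 1;
}

for my $addr (qw(bensmith@company.pl benjaminsmith@company.pl alicejones@company.pl)) {
    print "$addr: ", (wanted_address($addr) ? "wanted" : "skipped"), "\n";
}
```

Because the captured prefix is compared as a whole string, "benjamin" is not rejected just because it starts with "ben".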

    Edit: even (?: (?!REGEX).* | (?:REGEX).+ ) can't be expected to always give the same result as (?~REGEX).
    Eg: "ABC" =~ /A(?~BC)BC/ would be true (because the empty string is not BC), but "ABC" =~ /A(?: (?!BC).* | (?:BC).+ )BC/x; would be false because both branches fail. (?: (?!REGEX).+ | (?:REGEX).+ )? would work I think.

    Edit2: even the latter would not be correct with (?~REGEX|) (ie, if the empty string is not allowed). So there actually is no way to turn (?~REGEX) into a perl equivalent without understanding what REGEX matches (Edit: well, I can't think of one right now at least).
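The claim in the first edit about the two pure-Perl translations can be checked directly. This is just a sketch using the literal BC from the example; the variable names are made up, and it says nothing about how Ruby's (?~BC) itself behaves.

```perl
use strict;
use warnings;

# First attempt: on 'ABC', (?!BC) fails right after 'A', and (?:BC).+
# needs at least one character after 'BC', so the group cannot match.
my $first = qr/A(?: (?!BC).* | (?:BC).+ )BC/x;

# Revised attempt: the whole group is optional, so it may match nothing.
my $second = qr/A(?: (?!BC).+ | (?:BC).+ )?BC/x;

for my $pair ([first => $first], [second => $second]) {
    my ($name, $re) = @$pair;
    print "$name: ", ('ABC' =~ $re ? "match" : "no match"), "\n";
}
```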

Re: [OT] Thoughts on Ruby's new absent operator?
by BrowserUk (Pope) on Mar 24, 2017 at 12:14 UTC
    Any thoughts on Ruby's new absent operator in its regular expressions

    Verbosity doesn't equal clarity.

    E.g., contrast:

    1. 3.1415926535897932384626433832795
    2. three point one four one five nine two six five three five eight nine seven nine three two three eight four six two six four three three eight three two seven nine five

    Just as the symbols of Chinese or Japanese seem opaque and mysterious to most Westerners, yet are as clear as day to those born in the East, so the regex nomenclature is only mysterious to those who are not familiar with it. Once you make the effort to become familiar with it, its terse economy is far more easily written and read than the nested function/method calls of this kind of alternative.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.
      What if the alternative doesn't involve nested function/method calls?
Re: [OT] Thoughts on Ruby's new absent operator?
by LanX (Chancellor) on Mar 24, 2017 at 11:50 UTC
    Well, here are "my thoughts":

    How do I post a question effectively?

    my problems:

    • First of all this should be a meditation
    • Then you are linking to another discussion instead of to a definition.
    • After digging out the article I'm confronted with a lot of rubyisms I don't understand.

    Why don't you try to explain to us the advantages of this construct over negative look(ahead|behind)s in (pseudo) Perl code?

    That's the way to initiate a decent discussion here.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      I admit spending too little time on the post, while waiting to go into a class and on a rather fickle wifi connection, so I guess I kind of deserve your reply. That said: 1) I did consider Meditations, but picked Seekers of Perl Wisdom instead; 2) I assume at least some of the monks here are fans of regular expressions and have read about the absent operator, so I need not present much introductory materials.

      I have no strong opinion on the feature yet, and will perhaps only form one after finding some practical use case for it and seeing its performance characteristics.

        Sorry but the sources I found are frustratingly obscure, and installing the latest ruby to understand a badly documented experimental feature is not an option.

        Cheers Rolf
        (addicted to the Perl Programming Language and ☆☆☆☆ :)
        Je suis Charlie!

      Was it really that hard to follow?

      "The negative look-behind essentially looks to see if the specified expression is present and then fails if so. The absence operator, however, ensures that anything that isn't the specified expression will match."

Re: [OT] Thoughts on Ruby's new absent operator?
by sundialsvc4 (Abbot) on May 17, 2017 at 15:51 UTC

    While I am generally nervous about adding to the venerable regular-expression syntax, why not wait and see if this feature gains traction in Ruby-land?   If people actually find it useful, they will vote with their fingers as they write new source code that uses it.   If that actually happens, it can be added to Perl 5/6 easily enough.

      "venerable regular-expression syntax" and "vote with their fingers" ... you are trying very hard to sound clever.
