Re^3: Best practice or cargo cult?

by demerphq (Chancellor)
on Jun 21, 2006 at 12:51 UTC

in reply to Re^2: Best practice or cargo cult?
in thread Best practice or cargo cult?

\N could easily be added to blead. I'll check into it.


Re^4: Best practice or cargo cult?
by diotalevi (Canon) on Jun 22, 2006 at 04:17 UTC

    \N{...} is already a recognized pattern in perl5 regexp language. Pick something else and add it to your current perl using the instructions at Extending Regular Expression Syntax. It's a presentation I gave to last year. Or... co-opt \N for your own use. It isn't as if \N is so common that you'd miss it if you stole it away from perl. In my demo I redefined \w and \b to something more appropriate for my own set of common tasks.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Actually, I've been mulling adding support for some special regnodes for a while. For instance, more than once I've been asked for a proper fail regop. And I made that comment offhand; if I had remembered that \N{} is already used I would have suggested something else. Any recommendations?


Re^4: Best practice or cargo cult?
by diotalevi (Canon) on Jun 22, 2006 at 05:25 UTC

    In fact, here's the implementation. This works in perl5 going back to uh... early? Have your cake today. Not that I tested it. It's simple enough I just penned this and didn't bother running it.

    use Regexp::SlashN;
    "A B C" =~ /(\N+)/;
    $1 eq "A B C" or die;


    package Regexp::SlashN;
    use strict;
    use warnings;
    use overload;

    sub import { overload::constant qr => \&convert }

    # A simple table of definitions
    my %syntax = (
        '\\' => '\\\\',    # an escaped backslash stays an escaped backslash
        N    => '[^\n]',   # \N: any character except newline
    );

    sub convert {
        my ( $re ) = @_;
        $re =~ s/\\([\\N])/$syntax{$1}/g;
        return $re;
    }

    1;

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
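The rewrite step can be tried on plain strings, without installing any overload. This is a minimal sketch rather than the module itself: it assumes the same two-entry table idea and maps an escaped backslash back to two backslashes, so a literal backslash followed by N is not mistaken for \N.

```perl
use strict;
use warnings;

# Illustrative table: '\\' is one backslash, '\\\\' is two (single-quoted strings).
my %syntax = (
    '\\' => '\\\\',    # escaped backslash is emitted unchanged
    'N'  => '[^\n]',   # \N becomes "any character except newline"
);

sub convert {
    my ($re) = @_;
    # Consume one backslash plus the character after it, left to right,
    # so the second half of an escaped backslash is never re-read as an escape.
    $re =~ s/\\([\\N])/$syntax{$1}/g;
    return $re;
}

my $plain   = convert('(\N+)');    # \N is rewritten to a character class
my $escaped = convert('\\\\N');    # escaped backslash followed by a literal N
print "$plain\n$escaped\n";
```

The converted text is an ordinary pattern, so it can be interpolated into a match to check the behaviour on a multi-line string.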

      Well, there are two reasons why I wouldn't do it this way. The first is that doing this, afaict, adds a high cost to compiling regexes in the scope where the overload takes effect. The second is that special metasequences like the ones we are discussing can be handled much more efficiently by the regex engine. So, for instance, a NEOL regop would be a lot more efficient, both in terms of storage and execution, than the ANYOF regop that [^\n] is converted to.

      The ANYOF is implemented by a bitmap lookup with flags, meaning it requires more than 32 bytes to represent, and each character inspected requires some bit shifting to do the correct bitmap test. Whereas an NEOL regop would be much faster, as it would essentially be a straight character inequality test. Also, an NEOL regop would be just 4 bytes iirc.
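The difference between the two tests can be mimicked in plain Perl. This is only an illustration of the idea, not the engine's C code: the 256-bit bitmap, the helper names in_class_bitmap and not_newline, and the byte-only input are all assumptions made for the sketch.

```perl
use strict;
use warnings;

# ANYOF-style: one bit per byte value, 256 bits = 32 bytes of storage.
my $bitmap = "\xFF" x 32;
vec($bitmap, ord("\n"), 1) = 0;    # clear the bit for newline, giving [^\n]

sub in_class_bitmap {
    my ($char) = @_;
    # vec() locates the byte and masks out the bit for this character.
    return vec($bitmap, ord($char), 1);
}

# NEOL-style: no table at all, just one comparison.
sub not_newline {
    my ($char) = @_;
    return $char ne "\n" ? 1 : 0;
}

printf "bitmap storage: %d bytes\n", length $bitmap;
```

Both helpers give the same answer for every byte value; the second needs no per-class storage, which is the point being made about a dedicated regop.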


        This is nothing a little conditional can't cure. From a syntax standpoint, \N is the right symbol to use since \n means "newline" and we have the practice of saying \w|\W and \s|\S. I would think you'd either want to shuffle off the unicode name or just not do the work.

        sub import {
            if ( $] >= 5.012 ) {
                # Thanks to demerphq, this is native and the overloading isn't needed.
                # (A bare \N first shipped in perl 5.12.)
            }
            else {
                overload::constant qr => \&convert;
            }
        }

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
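An alternative to comparing $] is to probe the behaviour directly. The helper below is hypothetical and not part of the module: it keeps \N in a string so that the pattern is compiled at run time inside eval, and reports true only if a bare \N matches an ordinary character and refuses a newline. On a perl without native support the probe should come back false, whether the pattern dies or is read as something else.

```perl
use strict;
use warnings;

# Hypothetical helper: does this perl already treat a bare \N as "not newline"?
sub native_slash_n {
    my $pattern = '\N';    # kept in a string, so it is compiled at run time
    my $ok = eval {
        "a" =~ /^$pattern\z/ && "\n" !~ /^$pattern\z/;
    };
    return $ok ? 1 : 0;
}

print native_slash_n() ? "native \\N available\n" : "fallback needed\n";
```

An import routine could then install the overload only when native_slash_n() returns false, which avoids hard-coding a version number.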
