http://www.perlmonks.org?node_id=1002698


in reply to Re^3: Parse ISO 8601 date/times (never \d)
in thread Parse ISO 8601 date/times

The following pragma will "fix" \d. However, re::engine::Plugin does not currently support s/// or split //, just matching. (And it doesn't support named captures either.) Still, it may be helpful for some.

use 5.010;
use strict;
use utf8::all;

BEGIN {
    package re::engine::SaneDigits;
    no thanks;
    use constant TAINT => ${^TAINT};
    use re::engine::Plugin ();
    use Carp;

    sub import {
        re::engine::Plugin->import(
            comp => \&comp,
            exec => \&exec,
        );
    }

    *unimport = \&re::engine::Plugin::unimport;

    sub comp {
        my ($rx) = @_;
        my $real = $rx->pattern;
        $real =~ s{\\d}{[0-9]}g;
        $real =~ s{\\D}{[^0-9]}g;
        my %mods = my %mod = $rx->mod;
        my $mods = join q(), keys %mods;
        $real =~ s{/}{\/}g;
        $real = eval qq{ qr/$real/$mods };
        $rx->stash({ real => $real });
        $rx->num_captures(
            FETCH => sub {
                my ($rx, $paren) = @_;
                croak sprintf(
                    "%s variable not supported with %s",
                    { 0 => q($&), -1 => q($'), -2 => q($`) }->{$paren},
                    __PACKAGE__,
                ) if $paren < 1;
                my $rv = $rx->stash->{last}[$paren];
                return $rv unless TAINT;
                $rv =~ /(.*)/;
                return $1;
            },
        );
    }

    sub exec {
        my ($rx, $str) = @_;
        my @results = ($str =~ $rx->stash->{real});
        unshift @results, scalar pos;
        $rx->stash->{last} = \@results;
        return not defined $results[0];
    }
};

my $str = "foo23 bar5 bar42";
say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

use re::engine::SaneDigits;
say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

Update: Meh... come to think of it, a re::engine is overkill. Constant overloading does the trick much more easily...

use 5.010;
use strict;
use utf8::all;

BEGIN {
    package re::SaneDigits;
    no thanks;
    use overload ();

    my %_const_handlers  = (qr => \&_qr);
    my %_remove_handlers = map { $_ => undef } %_const_handlers;

    sub import   { overload::constant %_const_handlers }
    sub unimport { overload::remove_constant %_remove_handlers }

    sub _qr {
        for (@_) {
            s/\\d/[0-9]/g;
            s/\\D/[^0-9]/g;
            return $_;
        }
    }
};

my $str = "foo23 bar5 bar42";
say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

use re::SaneDigits;
say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

Another CPAN candidate I think.

Update II: Looks like PerlMonks might be breaking my UTF8 again. The "5" character which appears in $str should not be a normal ASCII 5, but a fullwidth 5 (U+FF15), which is a character used to include an Arabic numeral 5 within CJK text.
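
For anyone who wants to see the difference without depending on how the page renders that character, here is a minimal sketch that writes the fullwidth 5 as the escape \x{FF15} and compares \d with [0-9]. The helper names are made up for illustration and are not part of either module above.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub matches_backslash_d { my ($char) = @_; return $char =~ /\A\d\z/     ? 1 : 0 }
sub matches_ascii_class { my ($char) = @_; return $char =~ /\A[0-9]\z/ ? 1 : 0 }

my $fullwidth_five = "\x{FF15}";    # FULLWIDTH DIGIT FIVE

# prints: ASCII 5:  \d=1  [0-9]=1
printf "ASCII 5:  \\d=%d  [0-9]=%d\n",
    matches_backslash_d('5'), matches_ascii_class('5');

# prints: U+FF15:   \d=1  [0-9]=0
printf "U+FF15:   \\d=%d  [0-9]=%d\n",
    matches_backslash_d($fullwidth_five), matches_ascii_class($fullwidth_five);
```

With the fullwidth character restored in $str, the two say lines in the examples above would be expected to differ: plain \d captures the fullwidth 5 after the first "bar", while the [0-9] rewrite skips it and captures 42.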

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^5: Parse ISO 8601 date/times (still never \d)
by tye (Sage) on Nov 07, 2012 at 18:08 UTC

    Wow, that's a lot of machinery to avoid running :s/\\d/[0-9]/gc in your editor. I think I'll always choose to avoid the recurring cost.

    And I'm mostly not talking about CPU cost (but I suspect that is non-trivial), but the cost of things like mentally having to track new rules about how changing m// to s/// or split() requires extra attention, having to track that \d means different things in different places, having to search for pragmas each time I see \d inside m// if I care which meaning it has, the risk of having to debug the chain of code required to support this, etc.

    The risk of just wasting time because of a bug in the added pile of code required to support this is my biggest concern (after having repeatedly been burned by such things), especially when I consider the risk of this idea of pretending \d isn't \d getting in the way of some other tricky module's reasonable-sounding assumptions.

    Just because something is possible doesn't mean it is a good idea. :)

    - tye        

      The second version, which uses overloading, should work with s/// and split //.
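
A rough sketch of why that should hold: the qr constant handler rewrites the source text of a pattern before it is compiled, so it is not tied to m// specifically. The package name Local::AsciiDigits and the three helper subs below are hypothetical stand-ins so the file runs on its own; the handler mirrors the naive \d rewrite from the second version and shares its limits (for example, it would also rewrite the "\d" in an escaped "\\d").

```perl
use strict;
use warnings;

# Hypothetical stand-in for re::SaneDigits, inlined so this file runs alone.
package Local::AsciiDigits;
use overload ();

sub import { overload::constant('qr' => \&_ascii_digits) }

# Receives the source text of each constant piece of a pattern and
# returns the text that should be compiled in its place.
sub _ascii_digits {
    my ($source) = @_;
    $source =~ s/\\d/[0-9]/g;
    $source =~ s/\\D/[^0-9]/g;
    return $source;
}

package main;

# Same effect as "use Local::AsciiDigits;" without needing a file on disk.
BEGIN { Local::AsciiDigits->import }

# Everything below is compiled with the handler in effect.

sub first_ascii_number {              # m//
    my ($text) = @_;
    return $text =~ /bar(\d+)/i ? $1 : undef;
}

sub mask_ascii_digits {               # s///
    my ($text) = @_;
    (my $copy = $text) =~ s/\d/#/g;
    return $copy;
}

sub split_on_ascii_digits {           # split //
    my ($text) = @_;
    return split /\d+/, $text;
}

binmode STDOUT, ':encoding(UTF-8)';
my $str = "foo23 bar\x{FF15} bar42";  # \x{FF15} is the fullwidth 5
print first_ascii_number($str), "\n";
print mask_ascii_digits($str), "\n";
print join('|', split_on_ascii_digits($str)), "\n";
```

In all three helpers the fullwidth 5 is expected to be treated as a non-digit, since each pattern is compiled as if [0-9] had been written by hand.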

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'