Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^3: Parse ISO 8601 date/times (never \d)

by tye (Cardinal)
on Nov 07, 2012 at 14:46 UTC ( #1002681=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Parse ISO 8601 date/times
in thread Parse ISO 8601 date/times

Yeah, I just don't ever use \d except for one-liners any more. \d now means something that I just never want: numerals of any kind, from any writing system. This despite Perl only knowing how to treat one of the two dozenish types of numerals as numeric. I think drastically changing the definition of \d when Unicode came along was a mistake (a separate way of saying "any numeral" should have been used).

Luckily, the somewhat longer [0-9] has some visual advantages. So the worst problem is all of the old scripts that are now broken in ways that will often not matter (but that I can see even causing security problems in rare cases).

- tye        


Comment on Re^3: Parse ISO 8601 date/times (never \d)
Download Code
Replies are listed 'Best First'.
Re^4: Parse ISO 8601 date/times (never \d)
by tobyink (Abbot) on Nov 07, 2012 at 17:46 UTC

    The following pragma will "fix" \d. However, re::engine::Plugin does not currently support s/// or split //, just matching. (And it doesn't support named captures either.) Still, it may be helpful for some.

    use 5.010; use strict; use utf8::all; BEGIN { package re::engine::SaneDigits; no thanks; use constant TAINT => ${^TAINT}; use re::engine::Plugin (); use Carp; sub import { re::engine::Plugin->import( comp => \&comp, exec => \&exec, ); } *unimport = \&re::engine::Plugin::unimport; sub comp { my ($rx) = @_; my $real = $rx->pattern; $real =~ s{\\d}{[0-9]}g; $real =~ s{\\D}{[^0-9]}g; my %mods = my %mod = $rx->mod; my $mods = join q(), keys %mods; $real =~ s{/}{\/}g; $real = eval qq{ qr/$real/$mods }; $rx->stash({ real => $real }); $rx->num_captures( FETCH => sub { my ($rx, $paren) = @_; croak sprintf( "%s variable not supported with %s", { 0 => q($&), -1 => q($'), -2 => q($`) }->{$paren} +, __PACKAGE__, ) if $paren < 1; my $rv = $rx->stash->{last}[$paren]; return $rv unless TAINT; $rv =~ /(.*)/; return $1; }, ); } sub exec { my ($rx, $str) = @_; my @results = ($str =~ $rx->stash->{real}); unshift @results, scalar pos; $rx->stash->{last} = \@results; return not defined $results[0]; } }; my $str = "foo23 bar5 bar42"; say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH"; use re::engine::SaneDigits; say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

    Update: Meh... come to think of it, a re::engine is overkill. Constant overloading does the trick much easier...

    use 5.010; use strict; use utf8::all; BEGIN { package re::SaneDigits; no thanks; use overload (); my %_const_handlers = (qr => \&_qr); my %_remove_handlers = map { $_ => undef } %_const_handlers; sub import { overload::constant %_const_handlers } sub unimport { overload::remove_constant %_remove_handlers } sub _qr { for (@_) { s/\\d/[0-9]/g; s/\\D/[^0-9]/g; return $_; } } }; my $str = "foo23 bar5 bar42"; say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH"; use re::SaneDigits; say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

    Another CPAN candidate I think.

    Update II: Looks like PerlMonks might be breaking my UTF8 again. The "5" character which appears in $str should not be a normal ASCII 5, but a fullwidth 5 (U+U+FF15), which is a character used to include an Arabic numeral 5 within CJK text.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      Wow, that's a lot of machinery to avoid running :s/\\d/[0-9]/gc in your editor. I think I'll always choose to avoid the recurring cost.

      And I'm mostly not talking about CPU cost (but I suspect that is non-trivial), but the cost of things like mentally having to track new rules about how changing m// to s/// or split() requires extra attention, having to track that \d means different things in different places, having to search for pragmas each time I see \d inside m// if I care which meaning it has, the risk of having to debug the chain of code required to support this, etc.

      The risk of just wasting time because of a bug in the added pile of code required to support this is my biggest concern (after having repeatedly been burned by such things), especially when I consider the risk of this idea of pretending \d isn't \d getting in the way of some other tricky module's reasonable-sounding assumptions.

      Just because something is possible doesn't mean it is a good idea. :)

      - tye        

        The second version, which uses overloading, should work with s/// and split //.

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1002681]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (14)
As of 2015-07-07 20:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (93 votes), past polls