Re: Parse ISO 8601 date/times

by grizzley (Chaplain)
on Nov 07, 2012 at 09:41 UTC (#1002648)


in reply to Parse ISO 8601 date/times

\d is equivalent to [0-9].

Update: Stupid UTF or other sh*t in choroba's reply below ruined my brilliant hint. Must search for my XP points somewhere else :P


Re^2: Parse ISO 8601 date/times
by roboticus (Chancellor) on Nov 07, 2012 at 10:41 UTC

    grizzley:

    I'm guessing he used [0-9] for visual symmetry with [0-2], [0-3], et al. I was going to suggest that \d would be easier to read, but when I converted a little bit from this:

    && $part !~ m{ # Time or partial time (or period):
        ^(?:|P)T [012][0-9]
        (?:| :?[0-5][0-9] (?:| :?[0-5][0-9] ) )
        (?:| [.,][0-9]+ )$
    }x

    to this:

    && $part !~ m{ # Time or partial time (or period):
        ^(?:|P)T [012]\d
        (?:| :?[0-5]\d (?:| :?[0-5]\d ) )
        (?:| [.,]\d+ )$
    }x

    I found that the better 'visual balance' of [0-9] was counterbalanced by the square brackets, which are a little too similar to vertical bars for my eyes. After looking at them both, I don't really have a preference. Perhaps if I had a better font...

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
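For anyone who wants to convince themselves that the two spellings above behave the same on plain ASCII input, here is a minimal sketch. The variable names and the sample strings are made up for illustration, and the pattern is used on its own here rather than inside the original `&& $part !~ ...` condition.

```perl
use strict;
use warnings;

# The [0-9] spelling of the time pattern quoted above.
my $with_class = qr{
    ^(?:|P)T [012][0-9]
    (?:| :?[0-5][0-9] (?:| :?[0-5][0-9] ) )
    (?:| [.,][0-9]+ )$
}x;

# The same pattern spelled with \d.
my $with_d = qr{
    ^(?:|P)T [012]\d
    (?:| :?[0-5]\d (?:| :?[0-5]\d ) )
    (?:| [.,]\d+ )$
}x;

# Hypothetical ASCII-only samples: four that should match, two that should not.
my @samples = ('T12', 'T12:30', 'T123045', 'PT09:15:00,5', 'T12:60', '12:30');

for my $s (@samples) {
    printf "%-14s %-6s %-6s\n", $s,
        ($s =~ $with_class ? 'match' : 'no'),
        ($s =~ $with_d     ? 'match' : 'no');
}
```

On ASCII-only strings like these the two columns agree, so the choice between them is purely about readability. That stops being true once non-ASCII digits show up in the input.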

      I agree. [0-9] is visually better in this case.
Re^2: Parse ISO 8601 date/times
by choroba (Canon) on Nov 07, 2012 at 11:59 UTC
    Oh really?
    $d = chr(2413); print $d =~ $_, "\n" for qr/\d/, qr/[0-9]/;
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Yeah, I just don't ever use \d except for one-liners any more. \d now means something that I just never want: numerals of any kind, from any writing system. This despite Perl only knowing how to treat one of the two dozenish types of numerals as numeric. I think drastically changing the definition of \d when Unicode came along was a mistake (a separate way of saying "any numeral" should have been used).

      Luckily, the somewhat longer [0-9] has some visual advantages. So the worst problem is all of the old scripts that are now broken in ways that will often not matter (but that I can see even causing security problems in rare cases).

      - tye        
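To make the difference concrete, here is a minimal sketch that expands choroba's one-liner. It assumes Perl 5.14 or later, because it also tries the `/a` regex modifier, which is not mentioned anywhere in this thread; the `matches` helper is a made-up name.

```perl
use 5.014;      # the /a modifier needs Perl 5.14 or later
use warnings;

# chr(2413) is U+096D, DEVANAGARI DIGIT SEVEN, the character from choroba's example.
my $d = chr(2413);

# Hypothetical helper: 1 if $str matches $re, 0 otherwise.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

my $unicode_d = matches($d, qr/\d/);      # Unicode-aware \d
my $ascii_set = matches($d, qr/[0-9]/);   # explicit ASCII class
my $ascii_d   = matches($d, qr/\d/a);     # \d restricted to ASCII by /a

# Numifying the same character does not give 7; Perl only parses ASCII digits.
my $as_number = do { no warnings 'numeric'; $d + 0 };

say "\\d       matches: $unicode_d";
say "[0-9]    matches: $ascii_set";
say "\\d with /a matches: $ascii_d";
say "numeric value: $as_number";
```

So the character passes a `\d` check and then silently becomes 0 in arithmetic, which is the sort of mismatch tye is worried about. Whether `/a` or `[0-9]` is the nicer fix is a matter of taste and of which Perl versions you need to support.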

        The following pragma will "fix" \d. However, re::engine::Plugin does not currently support s/// or split //, just matching. (And it doesn't support named captures either.) Still, it may be helpful for some.

        use 5.010;
        use strict;
        use utf8::all;

        BEGIN {
            package re::engine::SaneDigits;
            no thanks;
            use constant TAINT => ${^TAINT};
            use re::engine::Plugin ();
            use Carp;

            sub import {
                re::engine::Plugin->import(
                    comp => \&comp,
                    exec => \&exec,
                );
            }

            *unimport = \&re::engine::Plugin::unimport;

            sub comp {
                my ($rx) = @_;
                my $real = $rx->pattern;
                $real =~ s{\\d}{[0-9]}g;
                $real =~ s{\\D}{[^0-9]}g;
                my %mods = my %mod = $rx->mod;
                my $mods = join q(), keys %mods;
                $real =~ s{/}{\/}g;
                $real = eval qq{ qr/$real/$mods };
                $rx->stash({ real => $real });
                $rx->num_captures(
                    FETCH => sub {
                        my ($rx, $paren) = @_;
                        croak sprintf(
                            "%s variable not supported with %s",
                            { 0 => q($&), -1 => q($'), -2 => q($`) }->{$paren},
                            __PACKAGE__,
                        ) if $paren < 1;
                        my $rv = $rx->stash->{last}[$paren];
                        return $rv unless TAINT;
                        $rv =~ /(.*)/;
                        return $1;
                    },
                );
            }

            sub exec {
                my ($rx, $str) = @_;
                my @results = ($str =~ $rx->stash->{real});
                unshift @results, scalar pos;
                $rx->stash->{last} = \@results;
                return not defined $results[0];
            }
        };

        my $str = "foo23 bar5 bar42";
        say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

        use re::engine::SaneDigits;
        say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

        Update: Meh... come to think of it, a re::engine is overkill. Constant overloading does the trick much easier...

        use 5.010;
        use strict;
        use utf8::all;

        BEGIN {
            package re::SaneDigits;
            no thanks;
            use overload ();

            my %_const_handlers  = (qr => \&_qr);
            my %_remove_handlers = map { $_ => undef } %_const_handlers;

            sub import   { overload::constant %_const_handlers }
            sub unimport { overload::remove_constant %_remove_handlers }

            sub _qr {
                for (@_) {
                    s/\\d/[0-9]/g;
                    s/\\D/[^0-9]/g;
                    return $_;
                }
            }
        };

        my $str = "foo23 bar5 bar42";
        say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

        use re::SaneDigits;
        say $str =~ m/bar(\d+)/i ? "GOT $1" : "NO MATCH";

        Another CPAN candidate I think.

        Update II: Looks like PerlMonks might be breaking my UTF8 again. The "5" character which appears in $str should not be a normal ASCII 5, but a fullwidth 5 (U+FF15), which is a character used to include an Arabic numeral 5 within CJK text.

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
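Since the fullwidth 5 did not survive posting, here is a minimal sketch that builds the same test string with an escape, so no source encoding is involved. It needs neither pragma above and simply compares `\d+` with `[0-9]+` directly; the `first_bar_number` helper is a made-up name.

```perl
use strict;
use warnings;

# "\x{FF15}" is FULLWIDTH DIGIT FIVE, standing in for the "5" that got mangled.
my $str = "foo23 bar\x{FF15} bar42";

# Hypothetical helper: returns the first capture if $text matches $re, else undef.
sub first_bar_number {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

my $unicode = first_bar_number($str, qr/bar(\d+)/i);      # stops at the fullwidth 5
my $ascii   = first_bar_number($str, qr/bar([0-9]+)/i);   # skips ahead to "bar42"

# Print the code point instead of the character to avoid wide-character output.
printf "\\d+    captured U+%04X\n", ord $unicode;
print  "[0-9]+ captured $ascii\n";
```

With plain `\d+` the first match is the fullwidth digit. With `[0-9]+`, which is what both pragmas rewrite `\d` into, the match moves on to `bar42`.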
