http://www.perlmonks.org?node_id=990243

agaved has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I want to check whether a date is in the form '26-04' or '26-04-12' (both will do).

The divider can also be '.' or '_'.

I used as regexp /\d{2}\-._\d{2}(\-._\d{2})*/ but it doesn't match ... I cannot understand why.

I guess it will be pretty obvious in hindsight, but I have been banging my head on the wall for hours and couldn't get through.

Any help much appreciated.

Replies are listed 'Best First'.
Re: Regexp problem
by choroba (Cardinal) on Aug 28, 2012 at 13:29 UTC
    Please enclose the regex in <code> ... </code> tags so that it is readable.
    Also, please specify what strings the regex does not match but should - it works for me:
    perl -E ' $R = qr/\d{2}[-._]\d{2}([-._]\d{2})*/; say "$_ ", /$R/ ? "Y" : "N" for qw/1-2 1-20 11.20-1 11.20 12_30.99 000_00_000/'
    Maybe you are missing the anchors? Put ^ at the beginning and $ at the end of the regex. Also, using ? instead of * might be desirable so that strings like 12-12-12-12-12-12-12 do not match.
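
    To make the effect of the anchors and of ? versus * concrete, here is a minimal sketch. The names $loose, $strict, matches_loose and matches_strict are made up for illustration; the patterns are the one tested above and an anchored variant along the lines suggested.

```perl
use strict;
use warnings;

# Hypothetical names; the patterns follow the advice in this reply.
my $loose  = qr/\d{2}[-._]\d{2}([-._]\d{2})*/;    # unanchored, any number of extra groups
my $strict = qr/^\d{2}[-._]\d{2}([-._]\d{2})?$/;  # anchored, at most one extra group

sub matches_loose  { return $_[0] =~ $loose  ? 1 : 0 }
sub matches_strict { return $_[0] =~ $strict ? 1 : 0 }

# Compare both patterns on valid dates and on strings that only contain one.
for my $s (qw(26-04 26-04-12 12-12-12-12 x26-04y 000_00_000)) {
    printf "%-12s loose=%d strict=%d\n", $s, matches_loose($s), matches_strict($s);
}
```

    The unanchored pattern accepts all five strings because it only needs to find a date somewhere inside them; the anchored pattern with ? accepts only 26-04 and 26-04-12.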
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Regexp problem
by philiprbrenan (Monk) on Aug 28, 2012 at 13:29 UTC

    Please use [] to group the separator choices into a character class. Please consider using \A and \Z to test the entire input string. I think you meant ? rather than * for the final optional section? '?' means zero or one (optional), while * means zero or more.

    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Data::Dump qw(dump);

    my @d = qw(26-04 26-04-12 26.04 26.04.12 30/8 30/08 30.08/12 1 12 aa aa/bb aa.c help);

    say (/\A\d{2}[.-]\d{2}([.-]\d{2})?\Z/ ? "Matches for $_" : "FAILS for $_") for @d;

    Produces:

    Matches for 26-04
    Matches for 26-04-12
    Matches for 26.04
    Matches for 26.04.12
    FAILS for 30/8
    FAILS for 30/08
    FAILS for 30.08/12
    FAILS for 1
    FAILS for 12
    FAILS for aa
    FAILS for aa/bb
    FAILS for aa.c
    FAILS for help
    
      "Please consider using \A and \Z to test the entire input string."

      I agree with this in principle; however, there's a subtle difference between \Z (uppercase) and \z (lowercase).

      • /\A ... \z - matches the entire input string.
      • /\A ... \Z - matches the entire input string except for a terminal newline, if it exists.

      Here's a couple of one-liners to demonstrate this:

      $ perl -E 'my $x = qq{qwerty\n}; $re = qr{\Aqwerty\Z}; say +($x =~ /$re/) ? 1 : 0;'
      1
      $ perl -E 'my $x = qq{qwerty\n}; $re = qr{\Aqwerty\z}; say +($x =~ /$re/) ? 1 : 0;'
      0

      See Assertions under perlre - Regular Expressions which has:

      "To match the actual end of the string and not ignore an optional trailing newline, use \z."
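
      Applied to the date check from this thread, the difference might look like the sketch below. The helper names ok_Z and ok_z are hypothetical, and the pattern (with all three separators and a non-capturing optional group) is an assumption based on the original question rather than code posted above.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same date pattern, differing only in the end anchor.
sub ok_Z { return $_[0] =~ /\A\d{2}[-._]\d{2}(?:[-._]\d{2})?\Z/ ? 1 : 0 }
sub ok_z { return $_[0] =~ /\A\d{2}[-._]\d{2}(?:[-._]\d{2})?\z/ ? 1 : 0 }

for my $s ("26-04", "26-04\n", "26-04\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-10s \\Z=%d \\z=%d\n", $shown, ok_Z($s), ok_z($s);
}
```

      Only the single trailing newline is treated differently: \Z accepts "26-04\n" while \z rejects it. Both accept the bare string and both reject the string with two trailing newlines.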

      -- Ken

Re: Regexp problem
by jethro (Monsignor) on Aug 28, 2012 at 13:35 UTC
    perl -e 'if ("26-04"=~/\d{2}[\-._]\d{2}([\-._]\d{2})*/) { print "yes\n" }' #prints yes

    Seems to work. There is no need to escape the '-', as it is not treated as a range character when it is the first character in a character class.
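
    A small sketch of why the position of '-' matters. The helper names are invented for illustration, and the third class, [.-_], is not from the question; it is included only to show what happens when the hyphen sits between two other characters.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing hyphen placement inside a character class.
sub sep_first  { return $_[0] =~ /\A[-._]\z/  ? 1 : 0 }  # '-' first: literal hyphen
sub sep_escape { return $_[0] =~ /\A[.\-_]\z/ ? 1 : 0 }  # '-' escaped: literal hyphen
sub sep_range  { return $_[0] =~ /\A[.-_]\z/  ? 1 : 0 }  # '.' through '_': a range

for my $c ('-', '.', '_', '/', '5', 'A') {
    printf "%s first=%d escaped=%d range=%d\n",
        $c, sep_first($c), sep_escape($c), sep_range($c);
}
```

    The first two classes accept exactly '-', '.' and '_'. The range version rejects '-' itself (it sorts just before '.') and accepts '/', digits and uppercase letters, which is almost certainly not what a separator class should do.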

    PS: Use <c>-tags around your regex so that it gets displayed correctly in the browser

Re: Regexp problem
by agaved (Novice) on Aug 28, 2012 at 20:15 UTC
    Thanks for the quick answers

    I was watching this on the debugger and I think I misinterpreted it.

    So now my question is: why, in the debugger, do I get

    x "24-06" =~ /\d{2}-\d{2}/ as 1,

    but x "24-06" =~ /\d{2}-\d{2}(p)*/ as undef?

      As far as I understand the debugger, both expressions are evaluated in list context.

      The first expression simply returns 1 for success and an empty list for failure; so it's a single boolean result in list context.

      The second expression has capturing parentheses, which (in list context) produce a list of captured results. As your (p)* could not be matched, there is no captured value for (p), so it returns a list with one element: undef.

      That's my attempt to explain it. I am sure there are others who can explain it in more detail and more accurately, and even show some insight into the internals ...

      DB<4> x "24-06" =~ /\d+-\d+/
      0  1
      DB<5> x "24-06" =~ /\d+-\d+failure/
        empty array
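
      The same three cases can be reproduced outside the debugger by assigning the match to an array, which also forces list context. This is only a sketch; the variable names and the describe helper are made up for illustration.

```perl
use strict;
use warnings;

# Assigning to an array puts each match in list context, like the debugger's x command.
my @no_groups  = "24-06" =~ /\d{2}-\d{2}/;         # success, no capture groups
my @with_group = "24-06" =~ /\d{2}-\d{2}(p)*/;     # success, but (p) never matched
my @failed     = "24-06" =~ /\d{2}-\d{2}failure/;  # no match at all

# Hypothetical helper: render a list with undef made visible.
sub describe {
    my @list = @_;
    return '()' unless @list;
    return '(' . join(', ', map { defined $_ ? $_ : 'undef' } @list) . ')';
}

print "no groups:  ", describe(@no_groups),  "\n";
print "with group: ", describe(@with_group), "\n";
print "failed:     ", describe(@failed),     "\n";
```

      Note that the one-element list (undef) still counts as a successful match; testing the match in scalar or boolean context, for example with if, avoids the confusion.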

      edit:

      • fixed assumption upon failed regex match, added code example, some rephrasing