PerlMonks  

Regexp problem

by agaved (Novice)
on Aug 28, 2012 at 13:18 UTC (#990243=perlquestion)
agaved has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I want to check whether a date is in the form '26-04' or '26-04-12' (either will do).

The divider can be '.' and '_', too.

I used the regexp /\d{2}\-._\d{2}(\-._\d{2})*/ but it doesn't match, and I cannot understand why.

I guess it will be pretty obvious in hindsight, but I have been banging my head on the wall for hours and couldn't get through.

Any help much appreciated.

Re: Regexp problem
by choroba (Abbot) on Aug 28, 2012 at 13:29 UTC
    Please enclose the regex in <code> ... </code> tags so that it is readable.
    Also, please specify which strings the regex does not match but should. It works for me:
    perl -E ' $R = qr/\d{2}[-._]\d{2}([-._]\d{2})*/; say "$_ ", /$R/ ? "Y" : "N" for qw/1-2 1-20 11.20-1 11.20 12_30.99 000_00_000/'
    Maybe you are missing the anchors? Put ^ at the beginning and $ at the end of the regex. Using ? instead of * might also be desirable, so that strings like 12-12-12-12-12-12-12 do not match.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
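The two suggestions above (anchors, and ? instead of *) can be compared side by side. This is only a minimal sketch: the names $loose, $strict and check are made up for illustration, and the test strings are arbitrary.

```perl
use strict;
use warnings;

# Unanchored, with * on the trailing group (the pattern from the one-liner above).
my $loose  = qr/\d{2}[-._]\d{2}([-._]\d{2})*/;
# Anchored with ^ and $, and ? so that at most one trailing group is allowed.
my $strict = qr/^\d{2}[-._]\d{2}([-._]\d{2})?$/;

# Hypothetical helper: returns "Y"/"N" for the loose pattern, then for the strict one.
sub check {
    my ($s) = @_;
    return join ' ', ($s =~ $loose ? 'Y' : 'N'), ($s =~ $strict ? 'Y' : 'N');
}

for my $s (qw(26-04 26-04-12 126-041 12-12-12-12)) {
    printf "%-12s loose/strict: %s\n", $s, check($s);
}
```

The unanchored pattern accepts 126-041 because it finds 26-04 inside the string, and the * version accepts any number of trailing groups; the anchored ? version rejects both.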
Re: Regexp problem
by philiprbrenan (Monk) on Aug 28, 2012 at 13:29 UTC

    Please use [] to group the separator choices into a character class. Please consider using \A and \Z to test the entire input string. I think you meant ? rather than * for the final optional section: ? means zero or one occurrence, while * means zero or more.

    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Data::Dump qw(dump);

    my @d = qw(26-04 26-04-12 26.04 26.04.12 30/8 30/08 30.08/12 1 12 aa aa/bb aa.c help);

    say (/\A\d{2}[.-]\d{2}([.-]\d{2})?\Z/ ? "Matches for $_" : "FAILS for $_" ) for @d;

    Produces:

    Matches for 26-04
    Matches for 26-04-12
    Matches for 26.04
    Matches for 26.04.12
    FAILS for 30/8
    FAILS for 30/08
    FAILS for 30.08/12
    FAILS for 1
    FAILS for 12
    FAILS for aa
    FAILS for aa/bb
    FAILS for aa.c
    FAILS for help
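To illustrate why the brackets matter, here is a small sketch that takes the regex exactly as it is displayed in the question and compares it with a character-class version. The variable and sub names are invented for this example, and the patterns are shortened to the two-part date for brevity.

```perl
use strict;
use warnings;

# Read literally: '-' followed by any one character followed by '_' between the digit pairs.
my $as_displayed = qr/\A\d{2}\-._\d{2}\z/;
# With a character class: exactly one of '-', '.', '_' between the digit pairs.
my $with_class   = qr/\A\d{2}[-._]\d{2}\z/;

# Hypothetical helper: 1 if the string matches the pattern, 0 otherwise.
sub matches {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('26-04', '26.04', '26_04', '26-x_04') {
    printf "%-8s as displayed: %d  with class: %d\n",
        $s, matches($as_displayed, $s), matches($with_class, $s);
}
```

Without the brackets, only an odd string such as 26-x_04 matches, while the intended dates do not.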
    
      "Please consider using \A and \Z to test the entire input string."

      I agree with this in principle; however, there's a subtle difference between \Z (uppercase) and \z (lowercase).

      • /\A ... \z/ - matches the entire input string.
      • /\A ... \Z/ - matches the entire input string except for a terminal newline, if it exists.

      Here are a couple of one-liners to demonstrate this:

      $ perl -E 'my $x = qq{qwerty\n}; $re = qr{\Aqwerty\Z}; say +($x =~ /$re/) ? 1 : 0;'
      1
      $ perl -E 'my $x = qq{qwerty\n}; $re = qr{\Aqwerty\z}; say +($x =~ /$re/) ? 1 : 0;'
      0

      See Assertions under perlre - Regular Expressions which has:

      "To match the actual end of the string and not ignore an optional trailing newline, use \z."

      -- Ken
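The same difference can be shown with the date pattern from this thread. This is a sketch only: the names $upper_Z, $lower_z and accepts are made up here, and the non-capturing group (?: ... ) is my own substitution for the capturing group used earlier.

```perl
use strict;
use warnings;

my $upper_Z = qr/\A\d{2}[-._]\d{2}(?:[-._]\d{2})?\Z/;  # tolerates one trailing newline
my $lower_z = qr/\A\d{2}[-._]\d{2}(?:[-._]\d{2})?\z/;  # must end exactly at end of string

# Hypothetical helper: 1 if the string matches the pattern, 0 otherwise.
sub accepts {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ("26-04", "26-04\n", "26-04\n\n") {
    (my $shown = $s) =~ s/\n/\\n/g;   # make the newlines visible in the output
    printf "%-10s \\Z: %d  \\z: %d\n", $shown, accepts($upper_Z, $s), accepts($lower_z, $s);
}
```

This may matter when the date comes from a line of input that has not been chomped.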

Re: Regexp problem
by jethro (Monsignor) on Aug 28, 2012 at 13:35 UTC
    perl -e 'if ("26-04"=~/\d{2}[\-._]\d{2}([\-._]\d{2})*/) { print "yes\n" }' #prints yes

    Seems to work. There is no need to escape the '-': it is not treated as a range operator when it is the first character in a character class.

    PS: Use <c>-tags around your regex so that it gets displayed correctly in the browser
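As a rough illustration of the point about '-' inside a character class, the sketch below tests single characters against four variants. The hash %class and the sub in_class are names invented for this example.

```perl
use strict;
use warnings;

my %class = (
    first   => qr/\A[-._]\z/,   # '-' first: literal
    last    => qr/\A[._-]\z/,   # '-' last: literal
    escaped => qr/\A[.\-_]\z/,  # '-' escaped: literal in any position
    range   => qr/\A[.-_]\z/,   # '-' between two characters: the range '.' through '_'
);

# Hypothetical helper: 1 if the single character belongs to the named class, 0 otherwise.
sub in_class {
    my ($name, $ch) = @_;
    return $ch =~ $class{$name} ? 1 : 0;
}

my @probe = ('-', '.', '_', '/', '7');
print "         @probe\n";
for my $name (qw(first last escaped range)) {
    my @flags = map { in_class($name, $_) } @probe;
    printf "%-8s @flags\n", $name;
}
```

In the range variant, '/' and digits are accepted (they lie between '.' and '_' in ASCII) while '-' itself is not, so an unescaped '-' in the middle of the class is the case to avoid.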

Re: Regexp problem
by agaved (Novice) on Aug 28, 2012 at 20:15 UTC
    Thanks for the quick answers

    I was watching this on the debugger and I think I misinterpreted it.

    So now my question is: why, in the debugger, do I get

    x "24-06" =~ /\d{2}-\d{2}/   as 1,

    but x "24-06" =~ /\d{2}-\d{2}(p)*/   as undef?

      As far as I understand the debugger, both expressions are evaluated in list context.

      The first expression has no capturing parentheses, so it simply returns 1 for success and an empty list for failure; it's a single boolean result in list context.

      The second expression has capturing parentheses, which (in list context) produce a list of captured results. As your (p)* matched zero times, there is no captured value for (p), so it returns a list with one element: undef.

      That's my attempt to explain it. I am sure there are others who can explain it in more detail and more accurately, and even show some insight into the internals ...

      DB<4> x "24-06" =~ /\d+-\d+/
      0  1
      DB<5> x "24-06" =~ /\d+-\d+failure/
        empty array

      edit:

      • fixed assumption upon failed regex match, added code example, some rephrasing
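The behaviour described above can be reproduced outside the debugger. This is a minimal sketch using the expressions from this thread; the variable names and the describe sub are invented for the example.

```perl
use strict;
use warnings;

my $s = "24-06";

# List context, no capture groups: (1) on success.
my @no_group = ($s =~ /\d{2}-\d{2}/);
# List context, one capture group that matched zero times: (undef).
my @unused   = ($s =~ /\d{2}-\d{2}(p)*/);
# List context, failed match: the empty list.
my @failed   = ($s =~ /\d{2}-\d{2}failure/);
# Scalar (boolean) context: true for the same expression that returned (undef) above.
my $scalar   = ($s =~ /\d{2}-\d{2}(p)*/) ? 1 : 0;

# Hypothetical helper: render a list with undef made visible.
sub describe {
    return '(' . join(', ', map { defined $_ ? $_ : 'undef' } @_) . ')';
}

print "no group: ", describe(@no_group), "\n";
print "unused:   ", describe(@unused),   "\n";
print "failed:   ", describe(@failed),   "\n";
print "scalar:   $scalar\n";
```

So the undef shown by x does not mean the match failed; testing the match in boolean context, as in an if, gives the expected result.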
