http://www.perlmonks.org?node_id=153702

mAsterdam has asked for the wisdom of the Perl Monks concerning the following question:

Honourable Monks, This challenge came from two colleagues who know I am trying to learn the power of regular expressions by applying them anywhere I can. They had to validate a date in a string with only a regular expression (and one only), due to environmental/tool constraints. They would be happy even if it did not check for long months, let alone leap days, but any sanity check would be welcome. You can't do calculations with regexes, or can you?

Maybe this has been done before, or is even folklore, but to me it was completely new. When I remembered that one can lexically establish divisibility by 4, 100 and 400, I started expanding the regex. Now it works. Unfortunately the 'tool' my colleagues have to use (teamsite) does not support the x-modifier (or they could not find it), so I had to make one very long line-noise-like string. That works too, so they are happy as well.
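To show the lexical divisibility idea on its own, separate from the full script, here is a minimal sketch. It assumes four-digit years and the usual Gregorian rule; the name `$leap_year` is made up for this illustration and does not appear in validdate.pl. The loop compares the pattern with the arithmetic rule for every year from 0000 to 9999.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of validdate.pl): matches a four-digit
# leap year by looking at the digits only.
my $leap_year = qr/^(?:
      \d\d (?: 0[48] | [2468][048] | [13579][26] )   # last two digits divisible by 4, not 00
    | (?: [02468][048] | [13579][26] ) 00            # first two digits divisible by 4, then 00
)$/x;

# Cross-check the pattern against the arithmetic rule.
my $mismatches = 0;
for my $y (0 .. 9999) {
    my $yyyy  = sprintf '%04d', $y;
    my $arith = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $regex = $yyyy =~ $leap_year;
    $mismatches++ if ($arith xor $regex);
}
print "mismatches: $mismatches\n";
```

The trick is that divisibility by 4 depends only on the last two digits, and for years ending in 00 divisibility by 400 depends only on the first two, so no calculation is needed.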

Please have a look at this code and tell me if I am on a good track. I was quite happy with the result, and this is my first message here, so please don't judge me too harshly.

#!/usr/bin/perl -w
#============validdate.pl===================================
# Checks date dd/mm/yyyy
# 4/3/2002 danny@vrijdag.xs4all.nl

print "Check dates of the form dd/mm/yyyy\nq quits.\n-:";
&test;

# just a wrapper:
sub test {
    while (<>) {
        last if m/(?:^q|bye|end|exit|quit|stop)/i;
        chomp;                          # strip the newline only
        my $val = &validate($_);
        print " '$_': $val\n-:";
    }
    print $_."\n";
}

# The core:
sub validate {
    my ($in) = @_;
    # match against the argument, not the global $_
    return "No dd/mm/yyyy" unless $in =~ m!
      ^(?:                              # the start
        (?:                             # A: not leapsensitive or not leap:
          (?:0?[1-9]|1\d|2[0-8])        # A1: dd (0)1 - 28
          /                             # separator (or [\.- ])
          (?:0?[1-9]|1[0-2])            # mm (0)1 - 12 (any month)
        |                               # or A2:
          (?:29|30)                     # dd 29 or 30
          /                             # separator
          (?:0?[13-9]|1[0-2])           # mm (0)1,3-12: any month but 2
        |                               # or A3:
          31                            # 31
          /
          (?:0?[13578]|1[02])           # a long mm: 1,3,5,7,8,10,12
        )                               # end dd/mm non-leap
        / \d{4}                         # 4 digits: any yyyy goes in A.
      |                                 # B: leapday in a leapyear:
        29/0?2/                         # that is 29/(0)2 february
        (?:                             # B1:
          \d\d                          # any century and
          (?:                           # divisible
            0[48]                       # by 4 but not by 100
          |                             # (so 04, 08 but not 00)
            [2468][048]                 # even tens:20,24,28,40..
          |
            [13579][26]                 # odd tens: 12, 16, 32, 36, 52..
          )
        |                               # B2:
          (?:                           # yyyy divisible by 400
            [02468][048]                # even mill: 00xx, 04xx, 08xx..
          |
            [13579][26]                 # odd mill: 12xx, 16xx, 32xx..
          )00                           # so divisible by 400
        )                               # end leapday
      )                                 # end day
      $!x;                              # nothing else, nothing more
    return "Ok";
}

Any improving comment is appreciated. Structure, working, style, readability, maybe there is some nice "any digit but 2"-syntax I am unaware of - you name it. I know nobody who is into regular expressions. TIA, Danny

(Ovid) Re: Valid date in regex only
by Ovid (Cardinal) on Mar 23, 2002 at 02:33 UTC

    mAsterdam wrote:

    They had to validate a date in a string with only a regular expression (and one only), due to environmental/tool constraints.

    With all due respect, it's difficult to know how to answer this. What is their tool? Perl regular expressions tend to be much more expressive than most other languages' regexen, so a direct conversion from a Perl regex to another language's regex may not be possible. If you just want comments on the Perl aspect, well, what you are doing is impressive, but it's the wrong tool for the job. If you want to know how you can use this for the other language, we can't answer that. Heck, for all I know, the other tool could use a DFA engine instead of an NFA engine, thus making a direct comparison very problematic.

    Incidentally, judging from your email address and your username, I couldn't help but wonder if you have any association with the Hippies from Hell. If so, tell Willem Hengveld that "Curtis" says "hi" :) (I used to work with him)

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      Ovid wrote:

      What is their tool?

      The tool is called "teamsite". According to my colleagues, the manual says little about the exact regex implementation beyond that it uses "perl-like regular expressions" (no Perl version given).

      And:

      ...direct conversion from a Perl regex to another language's regex may not be possible

      Changing the outer delimiters and removing comments, spaces, '?:' and the x-modifier did the job for teamsite, leaving (as said) an unreadable string of hooks, parens, hyphens and digits. Something like this (I don't have the final string here):

      validation-regex="^(((0?[1-9]|1\d|2[0-8])/(0?[1-9]|1[0-2])|(29|30)/(0?[13-9]|1[0-2])|31/(0?[13578]|1[02]))/\d{4}|29/0?2/(\d\d(0[48]|[2468][048]|[13579][26])|([02468][048]|[13579][26])00))$"

      But it worked :-)
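      A flattened string like this is easy to get wrong, so it may help to run it against a handful of known dates. The sketch below is only an assumption about what the final string looked like: it takes the three dd/mm alternatives to be separated by `|` as in the original script, and treats any ` +` sequences as line-wrap markers rather than part of the pattern. The names `$flat` and `@cases` are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed reconstruction of the flattened pattern: no comments, no spaces,
# no (?:...) groups and no x modifier.
my $flat = qr{^(((0?[1-9]|1\d|2[0-8])/(0?[1-9]|1[0-2])|(29|30)/(0?[13-9]|1[0-2])|31/(0?[13578]|1[02]))/\d{4}|29/0?2/(\d\d(0[48]|[2468][048]|[13579][26])|([02468][048]|[13579][26])00))$};

# Hand-picked sample dates (dd/mm/yyyy) with the verdict we expect.
my @cases = (
    [ '28/02/2001' => 1 ],
    [ '29/02/2000' => 1 ],   # divisible by 400
    [ '29/02/2004' => 1 ],   # divisible by 4, not by 100
    [ '29/02/1900' => 0 ],   # divisible by 100 but not by 400
    [ '29/02/2001' => 0 ],
    [ '31/04/2002' => 0 ],   # April has 30 days
    [ '31/12/1999' => 1 ],
    [ '4/3/2002'   => 1 ],   # leading zeros are optional
    [ '00/01/2002' => 0 ],
    [ '12/13/2002' => 0 ],   # there is no month 13
);

my $failures = 0;
for my $case (@cases) {
    my ($date, $expected) = @$case;
    my $got = $date =~ $flat ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-12s %s\n", $date, $got ? 'Ok' : 'No dd/mm/yyyy';
}
print "failures: $failures\n";
```

      Whether teamsite's "perl-like" engine treats the string the same way as Perl does is not something this check can show; it only confirms the pattern itself under Perl.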

      Thanks for the prompt response.
      I don't know Willem (or the Hippies you mentioned) but if I ever meet him I'll say hi from Curtis.

      Regards,

      Danny

•Re: Valid date in regex only
by merlyn (Sage) on Mar 23, 2002 at 02:25 UTC
    This gets today's "using a screwdriver for a hammer" award from me.

    Please don't do this in public. It's embarrassing. Wrong tool. Really wrong tool.

    -- Randal L. Schwartz, Perl hacker

      Well, you are absolutely right. It is the wrong tool for the job. I was aware of that. I knew that it is the wrong tool and that is why I described the circumstances. The difference you make by restating this without acknowledging my awareness of that fact is that I now do feel embarrassed. I have used gold-plated screws as ordinary nails. There were no nails available. But I should have kept it a secret? Oh, well. I still like your other writings.

      regards,

      Danny