http://www.perlmonks.org?node_id=295306

booter has asked for the wisdom of the Perl Monks concerning the following question:

I received such excellent feedback the last time I posted here that I thought some of you gurus and 'budding' experts might be able to lend a hand again.

Here is the problem. I need a routine that validates a number as a date, with some flexibility in the format, i.e. yyyymmdd, yyyyddmm, mmddyyyy, etc. For example, 19971103 would return true, as would 11031997, but 19983201 would be false (there is no 32nd day or 32nd month), as would 32011998. So, basically, the routine must be flexible enough to accept a range of date formats (numeric only) and figure out whether the input is a valid calendar date. It seems simple enough, but implementing my own routine has resulted in a few headaches.

Does anyone know of a routine that would accomplish this? I've looked at a few on cpan.org, but most work with string formats. I'm dealing with numbers only, as in PIN codes entered on a website for birthdates, so you might see where I need to go with this.

Thanks to anyone that can help!

Re: Numeric Date Validation
by jdtoronto (Prior) on Sep 30, 2003 at 15:26 UTC
    My all time favourite has to be Date::Manip. It will essentially validate ANY date format and if you only want numeric input, then, hey, we have regexes!

    Much of my code is used OUTSIDE the US, so I have to handle dates and times in all sorts of formats. So far Date::Manip is the only thing I have found that will keep me sane.

    jdtoronto

Re: Numeric Date Validation
by dbwiz (Curate) on Sep 30, 2003 at 17:09 UTC

    Date::Manip can do quite a lot, but not everything. Check this test script.

    #!/usr/bin/perl -w
    use strict;
    use Date::Manip;

    while (<DATA>) {
        chomp;
        my $date = ParseDate($_);
        if ($date) {
            print "$_ \t=> ", UnixDate($date,"%b %e, %Y."),"\n";
        }
        else {
            print "$_ \t=> not a valid date\n";
        }
    }
    __DATA__
    19971103
    11031997
    11.03.1997
    19983201
    20031110
    20031011
    1993jan02
    2002dec08
    yesterday
    today
    tomorrow
    Aug12
    12Aug2003
    Aug122003
    122003Aug

    output:

    19971103    => Nov 3, 1997.
    11031997    => not a valid date
    11.03.1997  => Nov 3, 1997.
    19983201    => not a valid date
    20031110    => Nov 10, 2003.
    20031011    => Oct 11, 2003.
    1993jan02   => Jan 2, 1993.
    2002dec08   => Dec 8, 2002.
    yesterday   => Sep 29, 2003.
    today       => Sep 30, 2003.
    tomorrow    => Oct 1, 2003.
    Aug12       => Aug 12, 2003.
    12Aug2003   => Aug 12, 2003.
    Aug122003   => Aug 12, 2003.
    122003Aug   => Aug 12, 2003.

    Your 11031997 is not recognized as it is, but it is parsed correctly if you add dots. Check the documentation to see which formats are recognized. You see from the example that it is quite flexible.

    Be aware, though, that Date::Manip is very slow compared to direct regexp manipulation. Check the docs for this issue as well.
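
    As a rough illustration of the "direct regexp" route, here is a minimal sketch that checks one fixed layout (yyyymmdd) using only core Perl. The sub names valid_ymd and valid_yyyymmdd are made up for this example, and the 1900..2003 year window is an assumption borrowed from the range discussed elsewhere in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Days per month in a non-leap year (index 0 is unused).
my @days_in_month = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Hypothetical helper: true if ($y, $m, $d) is a real Gregorian calendar date.
sub valid_ymd {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $max  = $days_in_month[$m] + (($m == 2 && $leap) ? 1 : 0);
    return ($d >= 1 && $d <= $max) ? 1 : 0;
}

# Hypothetical helper: accepts exactly eight ASCII digits laid out as yyyymmdd.
sub valid_yyyymmdd {
    my ($input) = @_;
    my ($y, $m, $d) = $input =~ /\A([0-9]{4})([0-9]{2})([0-9]{2})\z/
        or return 0;
    return 0 if $y < 1900 || $y > 2003;    # assumed range, not a Perl limit
    return valid_ymd($y, $m, $d);
}

for my $n (qw(19971103 19983201 20000229 19000229)) {
    printf "%s => %s\n", $n, valid_yyyymmdd($n) ? 'valid' : 'not valid';
}
```

    This only works because the layout is fixed in advance; it says nothing about inputs such as 11031997 whose layout is unknown.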

    HTH

      Most interesting, dbwiz! I will be keeping a copy of this for myself, as I had always understood that 11031997 would return 11th March 1997. I thought it defaulted to ddmmyyyy, whereas the behaviour your example shows seems to indicate some ambiguity.

      More testing required here!

      jdtoronto

Re: Numeric Date Validation
by Not_a_Number (Prior) on Sep 30, 2003 at 19:04 UTC

    booter, forget it :-) Even treating the date as a string, you'd have problems.

    Quite apart from trying to parse something like '310919':

    1. Could it be ddmmyy? No (31 > days in Sept)

    2. Could it be mmddyy? No (31 > months in year)

    3. Could it be yyddmm? No (19 > months in year)

    4. Could it be yymmdd? Yes! (1931:Sept:19)

    treating user input as a:

    "numeric [which] cannot start with 0"

    raises so many issues that IMO it would be better to rethink your 'pin number' concept.
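
    The four-way check on '310919' above can be sketched mechanically. This is only an illustration: the layout table, the sub names, and the choice to read every two-digit year as 19yy are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Days per month in a non-leap year (index 0 is unused).
my @days_in_month = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Hypothetical helper: true if ($y, $m, $d) is a real Gregorian calendar date.
sub valid_ymd {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $max  = $days_in_month[$m] + (($m == 2 && $leap) ? 1 : 0);
    return ($d >= 1 && $d <= $max) ? 1 : 0;
}

# Each layout names which couplet (0, 1 or 2) holds the year, month and day.
my @layouts = (
    [ ddmmyy => 2, 1, 0 ],
    [ mmddyy => 2, 0, 1 ],
    [ yyddmm => 0, 2, 1 ],
    [ yymmdd => 0, 1, 2 ],
);

# Returns the names of the layouts under which six digits form a real date.
# Assumption: two-digit years are always read as 19yy.
sub plausible_layouts {
    my ($input) = @_;
    my @pair = $input =~ /\A([0-9]{2})([0-9]{2})([0-9]{2})\z/ or return;
    my @hits;
    for my $layout (@layouts) {
        my ($name, $yi, $mi, $di) = @$layout;
        push @hits, $name
            if valid_ymd(1900 + $pair[$yi], $pair[$mi], $pair[$di]);
    }
    return @hits;
}

print "310919: ", join(', ', plausible_layouts('310919')), "\n";
print "010203: ", join(', ', plausible_layouts('010203')), "\n";
```

    For '310919' only yymmdd survives, but for something like '010203' every layout passes, which is the real problem.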

    How, for example, are you going to prompt for user input?:

    Enter your birthday in any format:
      yyyymmdd
      yyyyddmm
      yymmdd   (unless your birthday is between 1900 and 1909, or since 2000)
      yyddmm   (unless your birthday is between 1900 and 1909, or since 2000)
      ddmmyyyy (unless your birthday is before the 10th of the month)
      mmddyyyy (unless your birthday is between January and September)
      ddmmyy   (unless your birthday is before the 10th of the month)
      mmddyy   (unless your birthday is between January and September)

    How to explain that if the user's birthday is '030303', they have no choice but to enter '19030303' (if they're very old) or '20030303' (if they're very young)??
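
    To make the leading-zero problem concrete, here is a small sketch of what happens once the input is stored as a number rather than a string. The variable names are arbitrary; only the '030303' example comes from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $as_string = '030303';
my $as_number = $as_string + 0;    # numeric context drops the leading zero

print length($as_string), "\n";    # six characters
print "$as_number\n";              # the number 30303
print length($as_number), "\n";    # five characters: no longer a six-digit date

# Padding restores the zero only if the intended width is already known.
my $padded = sprintf '%06d', $as_number;
print "$padded\n";
```

    Keeping the value as a string all the way from the form to the validator sidesteps this particular issue, though not the ambiguity between layouts.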

    There must be a better way to do it!

    dave

Re: Numeric Date Validation
by svsingh (Priest) on Sep 30, 2003 at 15:18 UTC
    Here are some initial questions I had:
    • Will you be able to know the date format before parsing it? (Get the format from the user or a config file?) Or do you want to determine the format from the number?
    • Is there an expected range of valid dates? If it can be anything, then how do you want to handle cases where a single eight-digit number can be parsed into more than one valid date?
      Hi,

      The date format can be one of the following

      yyyymmdd
      yyyyddmm
      yymmdd
      yyddmm
      ddmmyyyy
      mmddyyyy
      ddmmyy
      mmddyy


      In terms of range, the date can fall anywhere between the present date (i.e. 20030930, or is that 030930... see the difficulty?) and the earliest date accepted, which is Jan 1st, 1900, so 19000101. Since a numeric value cannot start with 0, this date cannot be represented by dropping the century indicator (i.e. 19xx), so 000101 is not accepted.

      This is a bit difficult to implement with a numeric value. Let me know if you have any ideas. Thanks for your feedback.

        20030102 -- January 2nd, or February 1st? Does it matter for your application?
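
        A rough way to answer that question for any eight-digit input is to collect every distinct date it could stand for. The sketch below assumes the four eight-digit layouts listed above and the 1900..2003 range; the sub names are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Days per month in a non-leap year (index 0 is unused).
my @days_in_month = (0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Hypothetical helper: true if ($y, $m, $d) is a real Gregorian calendar date.
sub valid_ymd {
    my ($y, $m, $d) = @_;
    return 0 if $m < 1 || $m > 12;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $max  = $days_in_month[$m] + (($m == 2 && $leap) ? 1 : 0);
    return ($d >= 1 && $d <= $max) ? 1 : 0;
}

# Returns the distinct dates (as 'yyyy-mm-dd') an eight-digit string could
# mean under yyyymmdd, yyyyddmm, mmddyyyy and ddmmyyyy.
sub readings {
    my ($input) = @_;
    return unless $input =~ /\A[0-9]{8}\z/;
    my ($year_first, $p, $q) = $input =~ /\A(....)(..)(..)\z/;
    my ($r, $s, $year_last)  = $input =~ /\A(..)(..)(....)\z/;
    my %seen;
    for my $try ([$year_first, $p, $q], [$year_first, $q, $p],
                 [$year_last,  $r, $s], [$year_last,  $s, $r]) {
        my ($y, $m, $d) = @$try;
        next if $y < 1900 || $y > 2003;    # assumed range
        next unless valid_ymd($y, $m, $d);
        $seen{ sprintf '%04d-%02d-%02d', $y, $m, $d } = 1;
    }
    return sort keys %seen;
}

my @dates = readings('20030102');
print scalar(@dates), " reading(s): @dates\n";
```

        An input is only safe to accept without asking for the format when this list has exactly one entry.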
Re: Numeric Date Validation
by qq (Hermit) on Sep 30, 2003 at 23:22 UTC

    I agree that the requirements seem slightly misguided, but here is an attempt anyway. The approach is to make two subs: one that validates a year, month and day, and one that splits a number into all possible year, month and day combinations and passes them to the first.

    It's late and I'm tired, so this could definitely be improved, but it's a start.

    #!/usr/bin/perl -w
    use strict;

    # Accepted layouts:
    #   yyyymmdd yyyyddmm ddmmyyyy mmddyyyy
    #   yymmdd   yyddmm   ddmmyy   mmddyy

    # Days per month in a non-leap year
    my %months = (
        1 => 31, 2 => 28, 3 => 31, 4  => 30, 5  => 31, 6  => 30,
        7 => 31, 8 => 31, 9 => 30, 10 => 31, 11 => 30, 12 => 31,
    );

    for ( <DATA> ) {
        chomp;
        print "$_ ";
        print validate( $_ ) ? 'yep' : 'nope';
        print "\n";
    }

    # True if ($y, $m, $d) is a real date between 1900 and 2003
    sub validate_ymd {
        my ($y,$m,$d) = @_;

        # last minute addition / hack to make year 4 digit
        if ( length( $y ) == 2 ) {
            if ( $y > 3 ) {
                $y += 1900;
            }
            else {
                # 00..03 could be 19xx or 20xx; the two differ only
                # for Feb 29 (2000 was a leap year, 1900 was not)
                return 1 if validate_ymd( 1900 + $y, $m, $d );
                return 1 if validate_ymd( 2000 + $y, $m, $d );
                return 0;
            }
        }

        $y = int($y); $m = int($m); $d = int($d);

        return 0 if $y > 2003;
        return 0 if $y < 1900;

        # Gregorian rule: every 4th year, except centuries not divisible by 400
        my $leap = 0;
        if ( $m == 2 and ( ( !($y % 4) and $y % 100 ) or !($y % 400) ) ) {
            $leap = 1;
        }

        return 0 unless exists $months{$m};
        return 0 unless $d and $d <= $months{$m} + $leap;

        print "(y $y m $m d $d) ";
        return 1;
    }

    # Try every accepted layout; true if any of them yields a valid date
    sub validate {
        my $date = shift;

        return 0 if $date =~ /\D/;
        my $length = length( $date );
        return 0 unless $length == 6 or $length == 8;

        my @attempts;
        if ( $length == 6 ) {
            $date =~ /(..)(..)(..)/;
            push @attempts, [$1,$2,$3], [$1,$3,$2], [$3,$1,$2], [$3,$2,$1];
        }
        elsif ( $length == 8 ) {
            $date =~ /(....)(..)(..)/;
            push @attempts, [$1,$2,$3], [$1,$3,$2];
            $date =~ /(..)(..)(....)/;
            push @attempts, [$3,$1,$2], [$3,$2,$1];
        }

        foreach ( @attempts ) {
            return 1 if validate_ymd( @$_ );
        }
        return 0;
    }
    __DATA__
    20000229
    19970229
    29020300
    291254
    002902
    19990223
    101172

    gives:

    20000229 (y 2000 m 2 d 29) yep
    19970229 nope
    29020300 nope
    291254 (y 1954 m 12 d 29) yep
    002902 (y 2000 m 2 d 29) yep
    19990223 (y 1999 m 2 d 23) yep
    101172 (y 1972 m 10 d 11) yep
Re: Numeric Date Validation
by Anonymous Monk on Sep 30, 2003 at 21:53 UTC
    Almost certain DateTime will handle this - it's the dog's danglies!

    http://datetime.perl.org/modules.html

Re: Numeric Date Validation
by johndageek (Hermit) on Oct 02, 2003 at 21:46 UTC
    Interesting problem that has hit a great many of us over time. The question is: is the data unique enough to eliminate "bad data"?

    valid formats per your post

    yyyymmdd
    yyyyddmm
    yymmdd
    yyddmm
    ddmmyyyy
    mmddyyyy
    ddmmyy
    mmddyy

    Date range: Jan 01 1900 through Jan 01 2005 (assuming this will be used for a while). What data can we eliminate as invalid?

    Take the data in couplets (pairs of digits).
    If all couplets are between 01 and 12, there is no way to tell which is the month, day or year (obviously a 6-digit date).
    If a couplet is between 32 and 99, or is 00, it is the year; if the date is 8 digits long, the preceding 2 digits are the century.
    Once the year couplet is identified, any other couplet containing 13-31 is the day.
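
    As a sketch of the couplet rules just listed, the following labels each pair of digits in a six-digit input with the roles it could still play. The sub name and the wording of the labels are my own invention, and treating 13-31 as "day or year" until a year has been found is one reading of the rules, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: which roles can one two-digit couplet play?
sub couplet_roles {
    my ($pair) = @_;
    return 'year'        if $pair == 0 || $pair >= 32;    # 00 or 32..99
    return 'day or year' if $pair >= 13;                  # 13..31
    return 'month, day or year';                          # 01..12
}

for my $input (qw(310919 291254 030303)) {
    my @pairs = $input =~ /([0-9]{2})/g;
    print "$input: ",
        join(' | ', map { "$_ => " . couplet_roles($_) } @pairs), "\n";
}
```

    An input like 291254 narrows down nicely (54 must be the year, so 29 is the day), while 030303 leaves every couplet open.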

    So, having wandered about a bit, my conclusion (IMHO) is that this type of "edit" adds no value a large percentage of the time: even when a format is found that violates none of the rules, we still have no idea what date was intended.

    Valid and usable data are two different animals: just because the data passes all the tests we can think of doesn't mean it is fit for the use we want to put it to.

    Good luck!
    dageek