http://www.perlmonks.org?node_id=961234

vagabonding electron has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have to calculate the time sums of some prolonged procedure “b” in the different departments. The start time of the procedure is the timestamp “start_b”, the end time is “end_b”. The admission timestamp in the dept is “start_a”, the departure timestamp is “end_a”. Sometimes the procedure starts before the admission in the certain dept (i.e. in another dept) and/or ends after the departure from this dept (in the next dept). As the task is to calculate the process periods per department it is declared that the start of the procedure per dept is either the start of the procedure itself (if it is later than the admission timestamp) or the admission timestamp (if it is later than the start of the procedure). The same way the end of the procedure per dept is either the end of the procedure itself (if earlier as the departure timestamp) or the departure timestamp (if earlier as the end of the procedure).
In other words I have to measure the overlapping periods of time. The measure unit of the time is hour.
My today’s solution uses three loops with several breakouts through “next” and “last”. In the real life task with > 10K rows and >100K calculated hours it took proud 15 min (compared with 50 min before using the loop breaks). However I am still not sure that I do not use microscop as a hammer... I would be very glad if you could give me a hint how this could be optimized.
Besides I am thinking about saving the time spans per department in a variable in case to re-use it in the further queries. What kind of a data structure would you suggest here?
Thanks in advance.
VE
Update: Typos corrected.
The code prints:
unit6;449 unit2;1014
Here is the code:
#!/usr/bin/perl use strict; use warnings; $|=1; print "Start: ".localtime()."\n"; use Date::Calc qw( Date_to_Time ); my %hash; my $header = <DATA>; STRING: while (<DATA>) { my ($id, $wds, $start_b_str, $end_b_str, $start_a_str, $end_a_str +) = split /;/, $_, -1; my @start_b = parse_datetime($start_b_str); my @end_b = parse_datetime($end_b_str); my @start_a = parse_datetime($start_a_str); my @end_a = parse_datetime($end_a_str); my $start_b = Date_to_Time(@start_b); my $end_b = Date_to_Time(@end_b); my $start_a = Date_to_Time(@start_a); my $end_a = Date_to_Time(@end_a); next STRING if $start_b > $end_a; next STRING if $end_b < $start_a; my @arr_b = grep { $_ % 3600 == 0 } ($start_b .. $end_b); my @arr_a = grep { $_ % 3600 == 0 } ($start_a .. $end_a); my $i=0; my $actuell=0; OUTER: for my $resp (@arr_b) + { next OUTER if $resp < $start_a; + last OUTER if $resp > $end_a; + for my $stat (@arr_a) + { next if $stat < $actuell; if ($resp == $stat) { $i++; $actuell = $resp; next OUTER; } } } $hash{$wds} += $i; } while ( my($wds, $sum_time) = each %hash) { print "$wds;$sum_time\n"; } print "End: ".localtime()."\n"; sub parse_datetime { my $string = shift; my ($date, $time) = split / /, $string, 2; my ($day, $month, $year) = split /\./, $date, 3; my ($hour, $min, $sec) = split /:/, $time, 3; my @arr = ($year, $month, $day, $hour, $min, $sec ); return @arr; }; __DATA__ ID;WDS;start_b;end_b;start_a;end_a 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;19.11.2008 09:30:00; +21.11.2008 12:47:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;07.10.2008 19:23:00; +07.10.2008 19:24:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;07.10.2008 19:24:00; +22.10.2008 10:03:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;04.10.2008 11:07:00; +04.10.2008 11:20:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;04.10.2008 11:20:00; +07.10.2008 19:23:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;22.10.2008 10:03:00; +16.11.2008 10:41:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;16.11.2008 10:41:00; +19.11.2008 09:30:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:47:00; +21.11.2008 12:48:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:48:00; +21.11.2008 12:57:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 12:07:00; +15.12.2008 16:02:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 11:16:00; +15.12.2008 12:07:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;19.11.2008 09:30:00; +21.11.2008 12:47:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;07.10.2008 19:23:00; +07.10.2008 19:24:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;07.10.2008 19:24:00; +22.10.2008 10:03:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;04.10.2008 11:07:00; +04.10.2008 11:20:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;04.10.2008 11:20:00; +07.10.2008 19:23:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;22.10.2008 10:03:00; +16.11.2008 10:41:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;16.11.2008 10:41:00; +19.11.2008 09:30:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:47:00; +21.11.2008 12:48:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:48:00; +21.11.2008 12:57:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 12:07:00; +15.12.2008 16:02:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 11:16:00; +15.12.2008 12:07:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;16.12.2008 15:18:00; +06.01.2009 13:51:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:57:00; +15.12.2008 11:16:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 16:02:00; +16.12.2008 15:18:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;16.12.2008 15:18:00; +06.01.2009 13:51:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:57:00; +15.12.2008 11:16:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 16:02:00; +16.12.2008 15:18:00 156;unit3;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 11:29:00; +29.10.2008 12:57:00 156;unit3;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 11:29:00; +29.10.2008 12:57:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;10.11.2008 10:50:00; +22.11.2008 14:13:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;26.11.2008 13:54:00; +06.12.2008 16:01:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;21.01.2009 11:40:00; +29.01.2009 15:30:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;10.11.2008 10:50:00; +22.11.2008 14:13:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;26.11.2008 13:54:00; +06.12.2008 16:01:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;21.01.2009 11:40:00; +29.01.2009 15:30:00 156;unit5;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 14:13:00; +22.11.2008 14:17:00 156;unit5;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 14:13:00; +22.11.2008 14:17:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 12:57:00; +29.10.2008 16:25:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 16:25:00; +09.11.2008 16:05:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;09.11.2008 16:05:00; +10.11.2008 10:50:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;06.12.2008 16:01:00; +11.12.2008 00:21:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 14:17:00; +22.11.2008 17:14:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 17:14:00; +26.11.2008 13:54:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;11.12.2008 00:21:00; +21.01.2009 11:40:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 12:57:00; +29.10.2008 16:25:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 16:25:00; +09.11.2008 16:05:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;09.11.2008 16:05:00; +10.11.2008 10:50:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;06.12.2008 16:01:00; +11.12.2008 00:21:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 14:17:00; +22.11.2008 17:14:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 17:14:00; +26.11.2008 13:54:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;11.12.2008 00:21:00; +21.01.2009 11:40:00

Replies are listed 'Best First'.
Re: Calculation of the time overlapping
by Anonymous Monk on Mar 25, 2012 at 14:05 UTC

    Your problem starts here

    my @arr_b = grep { $_ % 3600 == 0 } ($start_b .. $end_b); my @arr_a = grep { $_ % 3600 == 0 } ($start_a .. $end_a);

    Sure, I hate to do math too, esp the fuzzy date math, but DateTime::Span already does the math for us :) no need to generate ranges like some ruby programmer :)

    With my modification I get 5 second speedup

    $ perl -s timeoverlap.961234.pl Start: Sun Mar 25 06:53:28 2012 Start: 0 unit6;449 unit2;1014 End: Sun Mar 25 06:53:34 2012 End: 6 $ perl -s timeoverlap.961234.pl -withDateTime Start: Sun Mar 25 06:53:37 2012 Start: 1 unit6;447 unit2;1013 End: Sun Mar 25 06:53:37 2012 End: 1

    Sure my numbers don't exactly match yours but they're close enough :) maybe even more correct; here is my modification, add it to your program after split

    use vars qw/ $withDateTime /; if( $withDateTime ){ use if $withDateTime,qw/ DateTime::Span /; use if $withDateTime,qw/ DateTime::Format::Strptime /; #~ use DateTime::Span; #~ use DateTime::Format::Strptime; my $strp = DateTime::Format::Strptime->new( #~ 29.10.2008 13:00:00 #~ %d.%m.%Y %T pattern => '%d.%m.%Y %T', ); my $span_a = DateTime::Span->from_datetimes( start => $strp->parse_datetime($start_a_str), end => $strp->parse_datetime($end_a_str), ); my $span_b = DateTime::Span->from_datetimes( start => $strp->parse_datetime($start_b_str), end => $strp->parse_datetime($end_b_str), ); my $set = $span_a->intersection( $span_b ); my $dur = $set->duration; my $thours = 0; $thours += $dur ->years *365*24; $thours += $dur ->months *30*24; $thours += $dur ->weeks *7*24; $thours += $dur ->days *24; $thours += $dur ->hours ; $thours += $dur ->minutes / 60 ; $thours += $dur ->seconds / 60 / 60 ; # irrelevant $thours = sprintf '%.f', $thours; # round $hash{$wds} += $thours if $thours; next STRING; }

    And sure, DateTime::Duration is annoying and could have used some kind of  total_as_hours but it sure beats ranges :)

      Thank you very much, this sounds encouraging !
      Unfortunately my ActiveState installation bites me again.
      There is no DateTime::Span module there available, only DateTime::Span::Common. If I start with DateTime::Span::Common (just a use declaration to test) it dies "Can't locate DateTime/Span.pm in @INC ..."
      Trying to install with cpan from the command line brings an error message
      "Warning: Cannot install DataTime::span, don't know what it is. Try the command i /DataTime::span/ to find objects with matching identifiers.
      There is not DateTime::Duration module, only Datetime::Duration::Fuzzy.
      :-( Update: Sorry! There is DataTime::Duration there, sorry, I have used the false script.