vagabonding electron has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
I have to calculate the time sums of some prolonged procedure “b” in the different departments. The start time of the procedure is the timestamp “start_b”, the end time is “end_b”. The admission timestamp in the dept is “start_a”, the departure timestamp is “end_a”. Sometimes the procedure starts before the admission in the certain dept (i.e. in another dept) and/or ends after the departure from this dept (in the next dept). As the task is to calculate the process periods per department it is declared that the start of the procedure per dept is either the start of the procedure itself (if it is later than the admission timestamp) or the admission timestamp (if it is later than the start of the procedure). The same way the end of the procedure per dept is either the end of the procedure itself (if earlier as the departure timestamp) or the departure timestamp (if earlier as the end of the procedure).
In other words I have to measure the overlapping periods of time. The measure unit of the time is hour.
My today’s solution uses three loops with several breakouts through “next” and “last”. In the real life task with > 10K rows and >100K calculated hours it took proud 15 min (compared with 50 min before using the loop breaks). However I am still not sure that I do not use microscop as a hammer... I would be very glad if you could give me a hint how this could be optimized.
Besides I am thinking about saving the time spans per department in a variable in case to re-use it in the further queries. What kind of a data structure would you suggest here?
Thanks in advance.
VE
Update: Typos corrected.
The code prints:
I have to calculate the time sums of some prolonged procedure “b” in the different departments. The start time of the procedure is the timestamp “start_b”, the end time is “end_b”. The admission timestamp in the dept is “start_a”, the departure timestamp is “end_a”. Sometimes the procedure starts before the admission in the certain dept (i.e. in another dept) and/or ends after the departure from this dept (in the next dept). As the task is to calculate the process periods per department it is declared that the start of the procedure per dept is either the start of the procedure itself (if it is later than the admission timestamp) or the admission timestamp (if it is later than the start of the procedure). The same way the end of the procedure per dept is either the end of the procedure itself (if earlier as the departure timestamp) or the departure timestamp (if earlier as the end of the procedure).
In other words I have to measure the overlapping periods of time. The measure unit of the time is hour.
My today’s solution uses three loops with several breakouts through “next” and “last”. In the real life task with > 10K rows and >100K calculated hours it took proud 15 min (compared with 50 min before using the loop breaks). However I am still not sure that I do not use microscop as a hammer... I would be very glad if you could give me a hint how this could be optimized.
Besides I am thinking about saving the time spans per department in a variable in case to re-use it in the further queries. What kind of a data structure would you suggest here?
Thanks in advance.
VE
Update: Typos corrected.
The code prints:
Here is the code:unit6;449 unit2;1014
#!/usr/bin/perl use strict; use warnings; $|=1; print "Start: ".localtime()."\n"; use Date::Calc qw( Date_to_Time ); my %hash; my $header = <DATA>; STRING: while (<DATA>) { my ($id, $wds, $start_b_str, $end_b_str, $start_a_str, $end_a_str +) = split /;/, $_, -1; my @start_b = parse_datetime($start_b_str); my @end_b = parse_datetime($end_b_str); my @start_a = parse_datetime($start_a_str); my @end_a = parse_datetime($end_a_str); my $start_b = Date_to_Time(@start_b); my $end_b = Date_to_Time(@end_b); my $start_a = Date_to_Time(@start_a); my $end_a = Date_to_Time(@end_a); next STRING if $start_b > $end_a; next STRING if $end_b < $start_a; my @arr_b = grep { $_ % 3600 == 0 } ($start_b .. $end_b); my @arr_a = grep { $_ % 3600 == 0 } ($start_a .. $end_a); my $i=0; my $actuell=0; OUTER: for my $resp (@arr_b) + { next OUTER if $resp < $start_a; + last OUTER if $resp > $end_a; + for my $stat (@arr_a) + { next if $stat < $actuell; if ($resp == $stat) { $i++; $actuell = $resp; next OUTER; } } } $hash{$wds} += $i; } while ( my($wds, $sum_time) = each %hash) { print "$wds;$sum_time\n"; } print "End: ".localtime()."\n"; sub parse_datetime { my $string = shift; my ($date, $time) = split / /, $string, 2; my ($day, $month, $year) = split /\./, $date, 3; my ($hour, $min, $sec) = split /:/, $time, 3; my @arr = ($year, $month, $day, $hour, $min, $sec ); return @arr; }; __DATA__ ID;WDS;start_b;end_b;start_a;end_a 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;19.11.2008 09:30:00; +21.11.2008 12:47:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;07.10.2008 19:23:00; +07.10.2008 19:24:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;07.10.2008 19:24:00; +22.10.2008 10:03:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;04.10.2008 11:07:00; +04.10.2008 11:20:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;04.10.2008 11:20:00; +07.10.2008 19:23:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;22.10.2008 10:03:00; +16.11.2008 10:41:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;16.11.2008 10:41:00; +19.11.2008 09:30:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:47:00; +21.11.2008 12:48:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:48:00; +21.11.2008 12:57:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 12:07:00; +15.12.2008 16:02:00 926;unit1;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 11:16:00; +15.12.2008 12:07:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;19.11.2008 09:30:00; +21.11.2008 12:47:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;07.10.2008 19:23:00; +07.10.2008 19:24:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;07.10.2008 19:24:00; +22.10.2008 10:03:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;04.10.2008 11:07:00; +04.10.2008 11:20:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;04.10.2008 11:20:00; +07.10.2008 19:23:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;22.10.2008 10:03:00; +16.11.2008 10:41:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;16.11.2008 10:41:00; +19.11.2008 09:30:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:47:00; +21.11.2008 12:48:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:48:00; +21.11.2008 12:57:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 12:07:00; +15.12.2008 16:02:00 926;unit1;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 11:16:00; +15.12.2008 12:07:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;16.12.2008 15:18:00; +06.01.2009 13:51:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;21.11.2008 12:57:00; +15.12.2008 11:16:00 926;unit2;21.11.2008 20:37:00;12.12.2008 22:00:00;15.12.2008 16:02:00; +16.12.2008 15:18:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;16.12.2008 15:18:00; +06.01.2009 13:51:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;21.11.2008 12:57:00; +15.12.2008 11:16:00 926;unit2;16.12.2008 10:00:00;06.01.2009 14:30:00;15.12.2008 16:02:00; +16.12.2008 15:18:00 156;unit3;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 11:29:00; +29.10.2008 12:57:00 156;unit3;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 11:29:00; +29.10.2008 12:57:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;10.11.2008 10:50:00; +22.11.2008 14:13:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;26.11.2008 13:54:00; +06.12.2008 16:01:00 156;unit4;08.12.2008 17:00:00;19.12.2008 11:00:00;21.01.2009 11:40:00; +29.01.2009 15:30:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;10.11.2008 10:50:00; +22.11.2008 14:13:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;26.11.2008 13:54:00; +06.12.2008 16:01:00 156;unit4;29.10.2008 13:00:00;06.11.2008 10:00:00;21.01.2009 11:40:00; +29.01.2009 15:30:00 156;unit5;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 14:13:00; +22.11.2008 14:17:00 156;unit5;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 14:13:00; +22.11.2008 14:17:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 12:57:00; +29.10.2008 16:25:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;29.10.2008 16:25:00; +09.11.2008 16:05:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;09.11.2008 16:05:00; +10.11.2008 10:50:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;06.12.2008 16:01:00; +11.12.2008 00:21:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 14:17:00; +22.11.2008 17:14:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;22.11.2008 17:14:00; +26.11.2008 13:54:00 156;unit6;08.12.2008 17:00:00;19.12.2008 11:00:00;11.12.2008 00:21:00; +21.01.2009 11:40:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 12:57:00; +29.10.2008 16:25:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;29.10.2008 16:25:00; +09.11.2008 16:05:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;09.11.2008 16:05:00; +10.11.2008 10:50:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;06.12.2008 16:01:00; +11.12.2008 00:21:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 14:17:00; +22.11.2008 17:14:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;22.11.2008 17:14:00; +26.11.2008 13:54:00 156;unit6;29.10.2008 13:00:00;06.11.2008 10:00:00;11.12.2008 00:21:00; +21.01.2009 11:40:00
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Calculation of the time overlapping
by Anonymous Monk on Mar 25, 2012 at 14:05 UTC | |
by vagabonding electron (Curate) on Mar 27, 2012 at 18:33 UTC | |
by Anonymous Monk on Mar 27, 2012 at 18:41 UTC | |
by Anonymous Monk on Mar 27, 2012 at 18:44 UTC | |
by vagabonding electron (Curate) on Mar 27, 2012 at 19:20 UTC |
Back to
Seekers of Perl Wisdom