Re^7: Computing results through Arrays

I can't tell for sure since you didn't show your __DATA__ section, but since it's complaining about not having a string to split after chomp, it's probably running into a blank line in your __DATA__ section. You can test for that before splitting (see below). On your other question, yes, it's worth sorting the database names into an array, not because of how many databases there are, but because you don't want to sort them again for each of 100_000 records, as my pseudo-code did.

I went ahead and did a working version that pulls in the data and produces the per-hour report like I think you want it. I added a lot of comments, but feel free to ask about anything you don't understand. I think you should be able to add the per-minute section yourself (see the comments for where), based on how the per-hour section works.

Once you're comfortable with how it works, one way to make the printing nicer would be to calculate the width of each column (for printf) based on the maximum width of the items in that column. I didn't get into that here, to keep it simple.

#!/usr/bin/env perl
use 5.010; use strict; use warnings;

my %h; my %m; my %db; # per-hour hash, per-minute hash, database names

while(<DATA>){
    next unless /\w/;                                # skip blank lines
    my($datetime,$database,$speed) = (split)[1,2,3];
    my $ddhhmm = substr $datetime,0,19;              # substr works well here since the lengths are static
    my $ddhh   = substr $datetime,0,16;              # this one doesn't include the minutes
    $h{$ddhh  }{$database} += $speed;                # add the speed to this hour & database
    $m{$ddhhmm}{$database} += $speed;                # add the speed to this minute & database
    $db{$database} = 1;                              # save the database name
}
my @db = sort keys %db; # sort and save database names as array since we'll be looping through them many times

# HOUR SECTION START
# print out the per-hour stats
# starting with a header line
print "  collectionTime";
printf "%11s", $_ for (@db);   # print each database name as a header, right-aligned in 11 characters
print "\n";                    # end of line

for my $key (sort keys %h){
    print $key;                             # print the date/hour key
    printf "%11s", $h{$key}{$_} for (@db);  # print the value for each database that goes with this key
    print "\n";
}
# HOUR SECTION END

# MINUTE SECTION START (using %m instead of %h)
# MINUTE SECTION END

__DATA__
server01: 2015-06-01T12:40:03-04:00  DB101                  10 MB/sec
server01: 2015-06-01T12:40:03-04:00  DB202                   5 MB/sec
server01: 2015-06-01T12:40:03-04:00  ASM                     2 MB/sec
server01: 2015-06-01T12:40:03-04:00  MYDB101                 2 MB/sec
server01: 2015-06-01T12:40:03-04:00  MYDB202                 5 MB/sec
server01: 2015-06-01T12:40:03-04:00  _OTHER_DB_             30 MB/sec
server01: 2015-06-01T12:41:03-04:00  DB101                   3 MB/sec
server01: 2015-06-01T12:41:03-04:00  DB202                   4 MB/sec
server01: 2015-06-01T12:41:03-04:00  ASM                     2 MB/sec
server01: 2015-06-01T12:41:03-04:00  MYDB101                 9 MB/sec
server01: 2015-06-01T12:41:03-04:00  MYDB202                 7 MB/sec
server01: 2015-06-01T12:41:03-04:00  _OTHER_DB_             50 MB/sec
server02: 2015-06-01T12:40:03-04:00  DB101                  90 MB/sec
server02: 2015-06-01T12:40:03-04:00  DB202                   9 MB/sec
server02: 2015-06-01T12:40:03-04:00  ASM                     2 MB/sec
server02: 2015-06-01T12:40:03-04:00  MYDB101                 3 MB/sec
server02: 2015-06-01T12:40:03-04:00  MYDB202                 1 MB/sec
server02: 2015-06-01T12:40:03-04:00  _OTHER_DB_             90 MB/sec
server02: 2015-06-01T12:41:03-04:00  DB101                   1 MB/sec
server02: 2015-06-01T12:41:03-04:00  DB202                   4 MB/sec
server02: 2015-06-01T12:41:03-04:00  ASM                     2 MB/sec
server02: 2015-06-01T12:41:03-04:00  MYDB101                 7 MB/sec
server02: 2015-06-01T12:41:03-04:00  MYDB202                 7 MB/sec
server02: 2015-06-01T12:41:03-04:00  _OTHER_DB_             55 MB/sec

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^8: Computing results through Arrays by yasser8@gmail.com (Novice) on Jun 07, 2015 at 06:02 UTC
Once again, thanks a lot for all your help. You were right, there were some blank lines. I tested thoroughly with Day, Hour, and Minute grouping, and now it's working perfectly. But as you said, the printing sometimes gets messy when the items in a column are wider. Is there a way to widen a column when its items need it, and otherwise have just a single space between each database name? I always get curious and motivated by looking at your style of coding; it's very elegant and precise. I am very happy to see that my coding has improved a lot by following your style. Thanks a lot, Sir!!!
Re^9: Computing results through Arrays by aaron_baugher (Curate) on Jun 07, 2015 at 14:37 UTC
Thank you, I'm glad we were able to get it working for you. I've learned a lot about style from this site too.

By the way, I noticed a mistake: I was using too long a length in my substr() calls, so I was treating seconds as minutes and minutes as hours. I've corrected that in the version below.

On making the column widths dynamic, think about what you'll need to do. After getting all the values into your hash tables, you'll need to loop through them by column, finding the length of each value and saving the largest length found somewhere, matched to that column. We can save them as the values of our %db hash, since we just had '1' placeholders there before. So the keys of %db will still be the database names (and column headers), but later the values will become the column widths.

This is a bit complicated, and we need to do it twice, once for the hour report and once for the minute report, so I'll make it a subroutine. I pass the main hash (%h or %m) and the %db hash to it as references. I also pass a ref to the @db array of database names, so the subroutine doesn't have to re-get the keys from %db. Since the top-level keys of the hash are the datetimes, I have to loop through those first, then inside that I loop through the database names, decide which length is longer -- the one already saved for that column, or the length of the current item -- and save that in the database name hash. When it's finished, my main program can get the lengths for each column from the values in %db.

sub set_column_widths {
    my $h         = shift; # reference to hash table of speeds, keyed by datetime, then by database
    my $databases = shift; # reference to hash of database names, where we will set the width values
    my $names     = shift; # ref to array of database names
                           # (so we don't need to call keys on $databases repeatedly)
    for my $key (keys %$h){
        for my $db (@$names){
            my $l = length $h->{$key}{$db};  # get length of this item in the hash table
            # set column width to the widest length
            $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
        }
    }
    for my $db (@$names){  # check the width of the database names themselves too
        my $l = length $db;
        $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
    }
}

Now I just need to call that before printing each report, like this:

set_column_widths(\%h, \%db, \@db);

Now when it's time to print out the columns, we can get the width and use it in the printf() statements where I previously hard-coded 11. For instance, in this line which prints the database names:

printf "%11s", $_ for (@db);
# replace 11 with $db{$_}, the saved width for this column, and
# stick a space between columns
printf " %$db{$_}s", $_ for (@db);

You'll need to make the same change in the other printf() statement, and then you should have dynamic-width columns. Here's the full script with these changes, in case it's not clear where I made them, plus the fixes for my substr() length mistake:

Read more... (6 kB)

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.
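
The dynamic-width idea above can also be tried in isolation. What follows is only a minimal sketch, not the full script from the post: the sample data in %h, the %width hash, and the format_row helper are hypothetical names made up for illustration. It uses printf's `*` width specifier, which takes the width from the argument list, as an alternative to interpolating the width into the format string.

```perl
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
use List::Util qw(max);

# Hypothetical sample data: hour => { database => summed speed }
my %h = (
    '2015-06-01T12' => { DB101 => 104, ASM => 8 },
    '2015-06-01T13' => { DB101 => 7,   ASM => 123456 },
);
my @db = qw(ASM DB101);    # sorted database names (the column headers)

# Width of each column: the longest of the header and every value in that column.
my %width;
for my $name (@db) {
    $width{$name} = max(length($name), map { length($_->{$name} // '') } values %h);
}

# Build one report line: a left-aligned label, then one right-aligned cell per database.
sub format_row {
    my ($label, @cells) = @_;
    my $row = sprintf '%-14s', $label;
    # '%*s' takes its width from the argument list instead of the format string
    $row .= sprintf(' %*s', $width{ $db[$_] }, $cells[$_]) for 0 .. $#db;
    return $row;
}

print format_row('collectionTime', @db), "\n";
for my $key (sort keys %h) {
    print format_row($key, map { $h{$key}{$_} // '' } @db), "\n";
}
```

Keeping the widths in a separate hash rather than reusing %db is purely a choice made for this sketch; the approach in the post, which stores them as the values of %db, works the same way.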
Re^10: Computing results through Arrays by yasser8@gmail.com (Novice) on Jun 24, 2015 at 09:49 UTC
Sorry to come back to this, Aaron Sir. I now need to calculate the AVERAGE and MAXIMUM within each group, the same way we did for SUM earlier. I was able to draft the AVERAGE logic and it works fine, but I am not able to derive the MAXIMUM. Could you please help me with this, and also tell me whether my approach for AVERAGE can be written more efficiently?

AVERAGE logic: SUM($speed) / (DISTINCT($servers) * 60) for Hour
MAXIMUM logic: MAX($speed) across all the $servers within each Hour

#!/usr/bin/env perl
use strict; use warnings;

my %h; my %m; my %db; my %sr;

sub round { $_[0] > 0 ? int($_[0] + .5) : -int(-$_[0] + .5) }

sub fnd_max (\%) {
    my $hash = shift;
    my ($key, @keys) = keys %$hash;
    my ($big, @vals) = values %$hash;
    for (0 .. $#keys) {
        if ($vals[$_] > $big) {
            $big = $vals[$_];
            $key = $keys[$_];
        }
    }
    $big
}

sub set_column_widths {
    my $h         = shift;
    my $databases = shift;
    my $names     = shift;
    for my $key (keys %$h){
        for my $db (@$names){
            my $l = length $h->{$key}{$db};
            $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
        }
    }
    for my $db (@$names){  # check the width of the database names themselves too
        my $l = length $db;
        $databases->{$db} = $databases->{$db} > $l ? $databases->{$db} : $l;
    }
}

while(<DATA>){
    next unless /\w/;
    my($server,$datetime,$database,$speed) = (split)[0,1,2,3];
    my $ddhhmm = substr $datetime,0,16;
    my $ddhh   = substr $datetime,0,13;
    $h{$ddhh  }{$database} += $speed;
    $m{$ddhhmm}{$database} += $speed;
    $db{$database} = 1;
    $sr{$server  } = 1;
}
my @db = sort keys %db; # sort and save database names as array since we'll be looping through them many times
my $count = keys %sr;

# HOUR SECTION START - AVG
for my $key (sort keys %h){
    for (@db) { $h{$key}{$_} = round($h{$key}{$_} / ($count * 60))} ;
}
set_column_widths(\%h, \%db, \@db);
print "Frequency Hour:\ncollectionTime";
printf " %$db{$_}s", $_ for (@db);
print "\n";
for my $key (sort keys %h){
    print "$key ";
    printf " %$db{$_}s", $h{$key}{$_} for (@db);
    print "\n";
}
# HOUR SECTION END - AVG

# MINUTE SECTION START - AVG
for my $key (sort keys %m){
    for (@db) { $m{$key}{$_} = round($m{$key}{$_} / ($count))} ;
}
set_column_widths(\%m, \%db, \@db);
print "\nFrequency Minute:\n collectionTime";
printf " %$db{$_}s", $_ for (@db);
print "\n";
for my $key (sort keys %m){
    print $key;
    printf " %$db{$_}s", $m{$key}{$_} for (@db);
    print "\n";
}
# MINUTE SECTION END - AVG

# HOUR SECTION START - MAX
set_column_widths(\%h, \%db, \@db);
print "Frequency Hour:\ncollectionTime";
printf " %$db{$_}s", $_ for (@db);
print "\n";
for my $key (sort keys %h){
    print "$key ";
    printf " %$db{$_}s", max (values $h{$key}{$_}) for (@db);
    print "\n";
}
# HOUR SECTION END - MAX

# MINUTE SECTION START - MAX
print fnd_max %m ;
set_column_widths(\%m, \%db, \@db);
print "\nFrequency Minute:\n collectionTime";
printf " %$db{$_}s", $_ for (@db);
print "\n";
for my $key (sort keys %m){
    print $key;
    printf " %$db{$_}s", max (values $m{$key}{$_}) for (@db);
    print "\n";
}
# MINUTE SECTION END - MAX
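
The MAXIMUM logic described above might be sketched roughly as follows. This is not the answer given in the thread; it is a minimal sketch that assumes "maximum" means the largest single reading for a database within an hour, across all servers. Under that assumption the comparison has to happen on the raw readings as they are read, before anything is summed. The %max hash, the record helper, and the shortened sample lines are hypothetical.

```perl
#!/usr/bin/env perl
use 5.010; use strict; use warnings;

my (%max, %db);    # per-hour maximums, database names

# Remember the largest single reading seen for this hour and database.
sub record {
    my ($hour, $database, $speed) = @_;
    my $slot = \$max{$hour}{$database};
    $$slot = $speed if !defined $$slot || $speed > $$slot;
    $db{$database} = 1;
}

# Hypothetical sample lines in the same layout as the original data
my @lines = (
    'server01: 2015-06-01T12:40:03-04:00  DB101  10 MB/sec',
    'server01: 2015-06-01T12:41:03-04:00  DB101   3 MB/sec',
    'server02: 2015-06-01T12:40:03-04:00  DB101  90 MB/sec',
    'server02: 2015-06-01T12:41:03-04:00  ASM     2 MB/sec',
);

for (@lines) {
    next unless /\w/;                                   # skip blank lines
    my ($datetime, $database, $speed) = (split)[1,2,3];
    record(substr($datetime, 0, 13), $database, $speed);   # first 13 chars = date plus hour
}

my @db = sort keys %db;
print 'collectionTime';
printf ' %10s', $_ for @db;
print "\n";
for my $hour (sort keys %max) {
    print $hour, ' ';
    printf ' %10s', $max{$hour}{$_} // '-' for @db;     # '-' where a database has no reading
    print "\n";
}
```

A per-minute maximum would work the same way with a second hash keyed on the first 16 characters of the datetime.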
Re^11: Computing results through Arrays by aaron_baugher (Curate) on Jun 24, 2015 at 10:46 UTC
Re^12: Computing results through Arrays by yasser8@gmail.com (Novice) on Jun 24, 2015 at 13:01 UTC
Re^11: Computing results through Arrays by robby_dobby (Hermit) on Jun 24, 2015 at 10:02 UTC

