http://www.perlmonks.org?node_id=1007488

amit1mtr has asked for the wisdom of the Perl Monks concerning the following question:

Re: Add same columns of a file and then compare two file contents
by shmem (Chancellor) on Dec 06, 2012 at 14:30 UTC
    First, please go over your message and format it properly as per the List of Perl Monks Approved HTML Tags - just like I did quoting you:
    I want output.txt file in following format
    1. To add and take average of second column entry(time stamp in hh:mm:ss) in file p10.txt (eg Retrieve_generic_assembly_(CP) is coming three times. So add all 3 entry and take average).
      Similarly do it for p20.txt.
    2. Write a new file output.txt that should have entry for all first column and avg entry of file p10.txt in second column and avg entry of p20.txt in third column and take difference of second and third column in fourth column.
    I want output.txt format like this in 4 column
    Subsection                      P10       P20       Delta(p10-p20)
    Retrieve_generic_assembly_(CP)  00:01:25  00:01:26  -00:00:01
    Retrieve_assembly_1_(CP)        00:01:35  00:00:45  00:00:50
    Retrieve_assembly_2_(CP)        00:01:42  00:02:46  -00:01:04

    I am trying to achieve it through following code.

    Then, please reformat your code with proper indenting, and insert comments telling yourself (and then us) what you expect to achieve at each place. For instance, I cannot see any average calculation anywhere, not even an attempt thereof.

    Hold on, you might get better answers, and nicer ones.

Re: Add same columns of a file and then compare two file contents
by Kenosis (Priest) on Dec 06, 2012 at 20:33 UTC

    Perhaps the following will assist you in crafting a complete solution:

    use strict;
    use warnings;

    my ( %retrieve, %count );

    while (<DATA>) {
        next unless /^Retrieve_/;
        my ( $retrieve, $time ) = split;
        my ( $h, $m, $s ) = split ':', $time;
        $retrieve{$retrieve} += $h * 3600 + $m * 60 + $s;    # running total in seconds
        $count{$retrieve}++;                                 # occurrences of this name
    }

    for my $retrieve ( keys %retrieve ) {
        # Divide by the per-name count rather than assuming three distinct names
        my $hms = secondsToHMS( $retrieve{$retrieve} / $count{$retrieve} );
        print "$retrieve\t$hms\n" if defined $hms;
    }

    # For seconds < 86400, else undef returned
    sub secondsToHMS {
        my $seconds = $_[0] // 0;
        return undef if $seconds >= 86400;
        my $h = int( $seconds / 3600 );
        my $m = int( ( $seconds - $h * 3600 ) / 60 );
        my $s = $seconds % 60;
        return sprintf( '%02d:%02d:%02d', $h, $m, $s );
    }

    __DATA__
    perform testing Date: Nov-29-2012 Host: amg Machtype: x86e_win64 mode: 7 ver: 1400 Build : A-01-10
    -------------------------------------------------------------------
    testcase Version x-20-17
    -------------------------------------------------------------------
    Retrieve_generic_assembly_(CP) 00:01:20
    Retrieve_assembly_1_(CP) 00:01:32
    Retrieve_assembly_2_(CP) 00:01:41
    -------------------------------------------------------------------
    xxx 00:47:03
    yyy 00:47:31
    ************************************************** **
    ************************************************** **
    -------------------------------------------------------------------
    ggg Version P-20-17
    -------------------------------------------------------------------
    Retrieve_generic_assembly_(CP) 00:01:25
    Retrieve_assembly_1_(CP) 00:01:35
    Retrieve_assembly_2_(CP) 00:01:42
    -------------------------------------------------------------------
    xxx 00:47:03
    yyy 00:47:31
    ************************************************** **
    ************************************************** **
    -------------------------------------------------------------------
    ggg Version P-20-17
    -------------------------------------------------------------------
    Retrieve_generic_assembly_(CP) 00:01:25
    Retrieve_assembly_1_(CP) 00:01:35
    Retrieve_assembly_2_(CP) 00:01:42
    -------------------------------------------------------------------
    pp 00:47:02
    kk 00:47:36
    ************************************************** **
    ************************************************** **

    Output:

    Retrieve_assembly_1_(CP) 00:01:34
    Retrieve_assembly_2_(CP) 00:01:41
    Retrieve_generic_assembly_(CP) 00:01:23

    The output shows the average time for each Retrieve_ in p10.txt.

    The seconds are continually summed. When the data reading is done, the total seconds are averaged for each Retrieve_, and then printed in H:M:S format.

    To process the two files, you could use a construct like this:

    $retrieve{$retrieve}{fileName} += $h * 3600 + $m * 60 + $s;
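
    As a rough sketch of that construct, the loop below keeps a total and a count per name and per file, then derives the averages. The in-memory strings in %input are hypothetical stand-ins for p10.txt and p20.txt so that the snippet runs on its own, and the names %input, %total, %seen and %avg are assumptions made for illustration, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for p10.txt and p20.txt, held in memory so the
# sketch is self-contained; real code would open the files by name.
my %input = (
    p10 => "Retrieve_assembly_1_(CP) 00:01:32\n"
         . "xxx 00:47:03\n"
         . "Retrieve_assembly_1_(CP) 00:01:38\n",
    p20 => "Retrieve_assembly_1_(CP) 00:00:45\n",
);

my ( %total, %seen );    # both keyed by name, then by file label

for my $file ( sort keys %input ) {
    # Opening a reference to a scalar reads the string like a file
    open my $fh, '<', \$input{$file} or die "Cannot read $file: $!";
    while ( my $line = <$fh> ) {
        next unless $line =~ /^Retrieve_/;
        my ( $name, $time ) = split ' ', $line;
        my ( $h, $m, $s ) = split /:/, $time;
        $total{$name}{$file} += $h * 3600 + $m * 60 + $s;
        $seen{$name}{$file}++;
    }
    close $fh;
}

# Average seconds for every name/file pair that was actually seen
my %avg;
for my $name ( keys %total ) {
    for my $file ( keys %{ $total{$name} } ) {
        $avg{$name}{$file} = $total{$name}{$file} / $seen{$name}{$file};
    }
}
```

    Counting per name and per file avoids assuming that every name appears equally often in both files.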

    Then you could use the subroutine above to format both averages as H:M:S, as well as their difference, prepending a minus sign where the difference is negative. The last challenge is to figure out how to print the Retrieve_ lines in your desired order.
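
    One possible way to handle the sign and the ordering is sketched below. signedHMS is a hypothetical helper (a variation on secondsToHMS, without its 86400-second limit) that formats the absolute value and prepends the minus sign itself; fractions of a second are truncated. The averages in %avg are simply the sample values from the desired output in the question, converted to seconds, and @order is an assumed list that fixes the print order.

```perl
use strict;
use warnings;

# Hypothetical helper: format a possibly negative number of seconds
# as [-]HH:MM:SS, truncating any fraction of a second.
sub signedHMS {
    my $seconds = int abs $_[0];
    my $sign    = ( $_[0] < 0 && $seconds > 0 ) ? '-' : '';
    return sprintf '%s%02d:%02d:%02d', $sign,
        int( $seconds / 3600 ), int( $seconds % 3600 / 60 ), $seconds % 60;
}

# Assumed print order; it could also be recorded while reading the first file
my @order = (
    'Retrieve_generic_assembly_(CP)',
    'Retrieve_assembly_1_(CP)',
    'Retrieve_assembly_2_(CP)',
);

# Average seconds per file, taken from the sample table in the question
my %avg = (
    'Retrieve_generic_assembly_(CP)' => { p10 => 85,  p20 => 86 },
    'Retrieve_assembly_1_(CP)'       => { p10 => 95,  p20 => 45 },
    'Retrieve_assembly_2_(CP)'       => { p10 => 102, p20 => 166 },
);

my @rows;
for my $name (@order) {
    my ( $p10, $p20 ) = @{ $avg{$name} }{qw(p10 p20)};
    push @rows, join "\t", $name,
        signedHMS($p10), signedHMS($p20), signedHMS( $p10 - $p20 );
}

print join( "\t", 'Subsection', 'P10', 'P20', 'Delta(p10-p20)' ), "\n";
print "$_\n" for @rows;
```

    With these sample values the three deltas come out as -00:00:01, 00:00:50 and -00:01:04, matching the table in the question.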

    Hope this helps!

          Please reformat your data within <code> tags, as it's difficult to distinguish it from your comments.