http://www.perlmonks.org?node_id=1077543

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello!
I have this script that calculates the sum of each column in a tab-separated file. My problem is that I would like to make it work even if I don't know how many columns I have, i.e. somehow create the arrays dynamically based on the number of columns, perhaps. Can you help me?
my @split_array9=();
my @split_array9_col1=();
my @split_array9_col2=();
my @split_array9_col3=();

open (INFILE9, '<', 'ex1.dat');
while( my $line9 = <INFILE9>)
{
    my @split_line9 = split(/\t/, $line9);
    my $element_col_1 = $split_line9[0];
    my $element_col_2 = $split_line9[1];
    my $element_col_3 = $split_line9[2];
    push (@split_array9_col1, $element_col_1);
    push (@split_array9_col2, $element_col_2);
    push (@split_array9_col3, $element_col_3);
}
close INFILE9;

#calculate the sums for each column based on each of the arrays
my $sum_col1=0;
my $sum_col2=0;
my $sum_col3=0;
for (my $w=0; $w<=$#split_array9_col1; $w++)
{
    $sum_col1 = $sum_col1 + $split_array9_col1[$w];
}
for (my $y=0; $y<=$#split_array9_col2; $y++)
{
    $sum_col2 = $sum_col2 + $split_array9_col2[$y];
}
for (my $z=0; $z<=$#split_array9_col3; $z++)
{
    $sum_col3 = $sum_col3 + $split_array9_col3[$z];
}
print "The sum for column 1 is: $sum_col1, the sum for column 2 is: $sum_col2 and the sum for column 3 is: $sum_col3.\n";

Re: How can you make this script general?
by davido (Cardinal) on Mar 08, 2014 at 19:03 UTC

    You can't make it more general until you understand how to use multi-dimensional data structures in Perl, which means learning how to use references. Start with perlreftut, and then if you need more depth, continue to perlref, perllol, and perldsc. These documents will be enlightening.

    Eventually you will work toward an implementation where rather than giving each column a named array, you will have an array of rows, and each row element will hold a reference to an anonymous array of columns. ...or you might invert it so that the top level array represents columns, and each column element holds a reference to an anonymous array of row elements.
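
    A minimal sketch of the first layout described above: an array of rows, where each row is a reference to an anonymous array of fields. The helper names read_rows and column_sums and the in-memory sample data are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated lines from a filehandle into an
# array of rows; each row is a reference to an anonymous array of fields.
sub read_rows {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [ split /\t/, $line ];
    }
    return \@rows;
}

# Hypothetical helper: sum each column by walking every row reference.
sub column_sums {
    my ($rows) = @_;
    my @sums;
    for my $row (@$rows) {
        $sums[$_] += $row->[$_] for 0 .. $#$row;
    }
    return @sums;
}

# Sample input held in a string so the sketch is self-contained.
my $sample = "1\t2\t3\n4\t5\t6\n7\t8\t9\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $rows = read_rows($fh);
close $fh;

# $rows->[ROW][COLUMN] addresses a single cell.
print "Row 1, column 2: $rows->[0][1]\n";
print "Column sums: @{[ column_sums($rows) ]}\n";
```

    Keeping the whole table in memory is only worthwhile if you need the individual cells later; for sums alone, a single running-totals array is enough.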


    Dave

Re: How can you make this script general?
by BrowserUk (Patriarch) on Mar 08, 2014 at 19:17 UTC

    As a one-liner

    C:\test>perl -F\t -anle"$sums[$_]+=$F[$_] for 0 .. $#F; }{ printf qq[Column:%u total:%u\n], $_, $sums[$_] for 0 .. $#sums"
    1	2	3
    4	5	6
    7	8	9
    ^Z
    Column:0 total:12
    Column:1 total:15
    Column:2 total:18

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I made it like this:
      #The idea is to read the first line of the file and see how many columns we have, then we proceed accordingly, column-by-column
      open (INFILE10, '<', 'ex1.dat') or die "File ex1.dat does not exist!\n";
      my $firstLine = <INFILE10>;
      close INFILE10;
      my @array_firstLine=split(/\t/, $firstLine);
      my $total_columns=scalar(@array_firstLine);
      print "This file has $total_columns columns in total.\n";
      for(my $k=1; $k<=$total_columns; $k++)
      {
          print "Calculate sum for column $k\n";
          my $wanted_column_number=$k; #this is the column that we want to sum up each time, until we finish the columns
          my $sum_of_column=0;
          open (INFILE10, '<', 'ex1.dat') or die "File ex1.dat does not exist!\n";
          while( my $line10 = <INFILE10>)
          {
              my @split_line10 = split(/\t/, $line10);
              my $respective_element = $split_line10[$k-1];
              $sum_of_column = $sum_of_column + $respective_element;
          }
          close INFILE10;
          print "The sum for column $k is: $sum_of_column.\n";
      }
        mate you're wrong,
        with perl you don't have to do any of what you're doing.
        while(@a = split /\t/, <DATA>){
            $b[$_] += $a[$_] for 0..$#a;
        }
        print "@b\n";
        __DATA__
        1 2 3 4 5 6 7 8 9 10 11 12
Re: How can you make this script general?
by tangent (Parson) on Mar 08, 2014 at 19:12 UTC
    If all you need to do is calculate the sums then this might help:
    my %count;
    while (my $line = <DATA>) {
        chomp $line;
        my @cols = split("\t",$line);
        $count{$_} += $cols[$_] for 0 .. $#cols;
    }
    for my $col_num (sort { $a <=> $b } keys %count) {
        print "Total for column $col_num: $count{$col_num}\n";
    }
    __DATA__
    1	2	3	4
    3	4	5	6
    6	7	8	9
    Output:
    Total for column 0: 10
    Total for column 1: 13
    Total for column 2: 16
    Total for column 3: 19

      Why use a hash and then have to sort, when the column numbers are ... well, numbers?


        Why use a hash and then have to sort
        Just a habit really. An array would be more efficient.
        perhaps tangent wants to be prepared for the general case, where the columns have names...
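
        A minimal sketch of that named-column case, assuming the first line of the input is a header row of tab-separated column names. The helper name sums_by_name and the sample data are hypothetical; nothing in the thread specifies a header format.

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first line as column names and accumulate
# one total per name. Returns the names (in file order) and the totals.
sub sums_by_name {
    my ($fh) = @_;
    my $header = <$fh>;
    return ([], {}) unless defined $header;
    chomp $header;
    my @names = split /\t/, $header;

    # Start every named column at zero so empty columns still print cleanly.
    my %sum_for = map { $_ => 0 } @names;

    while (my $line = <$fh>) {
        chomp $line;
        my @cols = split /\t/, $line;
        for my $i (0 .. $#cols) {
            next unless defined $names[$i];    # ignore fields beyond the header
            $sum_for{ $names[$i] } += $cols[$i];
        }
    }
    return (\@names, \%sum_for);
}

# Sample input held in a string so the sketch is self-contained.
my $sample = "apples\tpears\tplums\n1\t2\t3\n4\t5\t6\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($names, $sum_for) = sums_by_name($fh);
close $fh;

# Iterating over @$names keeps the file's column order, so no sort is needed.
print "Total for $_: $sum_for->{$_}\n" for @$names;
```

        Here the hash earns its keep because the keys are names rather than positions; the separate @names array preserves the original column order.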
Re: How can you make this script general?
by AnomalousMonk (Archbishop) on Mar 08, 2014 at 21:11 UTC

    If you're dealing with really ragged input data:

    use 5.010;  # for // operator

    use warnings;
    use strict;

    use Data::Dump;

    use Test::More
        # tests => ?? + 1  # Test::NoWarnings adds 1 test
        'no_plan'
        ;
    use Test::NoWarnings;

    my @totals;

    while (my $line = <DATA>) {
        chomp $line;
        my @fields = split ' ', $line;
        for my $i (0 .. $#fields) {
            # $totals[$i] = $fields[$i] + (defined($totals[$i]) ? $totals[$i] : 0);
            $totals[$i] = $fields[$i] + ($totals[$i] // 0);
            }
        # dd \@fields; dd\@totals;  # FOR DEBUG
        }

    my $max_input_cols = @totals;
    ok $max_input_cols == 5, qq{max number input columns};

    is_deeply \@totals, [ 21, 176, 909, 6006, 20002 ], qq{column totals};

    printf qq{max cols in input data: %d \n}, $max_input_cols;
    print  qq{column totals: \n};
    printf qq{%6d}, $_ for 0 .. $#totals;
    print  qq{\n};
    for my $col (@totals) {
        printf qq{%6d}, $col;
        }
    print qq{\n};

    __DATA__
    1 11
    2 22 202 2002 20002
    3 33 303
    4 44 404 4004
    5
    6 66

    Output:

    c:\@Work\Perl\monks\Anonymous Monk\1077543>perl ragged_field_summation_1.pl
    ok 1 - max number input columns
    ok 2 - column totals
    max cols in input data: 5
    column totals:
         0     1     2     3     4
        21   176   909  6006 20002
    ok 3 - no warnings
    1..3
Re: How can you make this script general?
by Laurent_R (Canon) on Mar 08, 2014 at 23:54 UTC

    No real need for Perl references in my view, nor for any complex data structure. A simple array should do the work (if I understood the requirement well).

    use strict;
    use warnings;

    my @sums;
    $sums[$_] = 0 for 0..20;
    while (<DATA>) {
        my @fields = split /\s+/, $_;
        for (0..20) {
            $sums[$_] += $fields[$_] if defined $fields[$_];
        }
    }
    print "@sums", "\n";
    __DATA__
    1 2 3 4 6
    3 4 5 6
    6 7 8 9
    My only assumption is that the number of columns is equal to or less than 21. This is the resulting output:
    $ perl column_sum.pl 10 13 16 19 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Re: How can you make this script general?
by llancet (Friar) on Mar 10, 2014 at 09:34 UTC

    You should note that Perl arrays grow automatically: if you assign to an index beyond the current end, the array is extended to fit.

    So what you need is an array that records the sum of each column. Each time you read a line, add its columns to the sum array.

    my @sum;
    while (<FH>) {
        chomp;
        my @F = split /\t/;
        $sum[$_] += $F[$_] for 0..@F-1;
    }
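
    A small self-contained sketch of the same idea, showing the array growing on assignment and then the summing loop run against sample data. The sub name sum_columns and the in-memory input are assumptions for illustration; the snippet above reads from a filehandle FH that is opened elsewhere.

```perl
use strict;
use warnings;

# Assigning past the end of an array extends it; skipped slots are undef.
my @demo = (1, 2);
$demo[4] = 5;
print "Length after assignment: ", scalar(@demo), "\n";

# Hypothetical wrapper around the summing loop, taking any open filehandle.
sub sum_columns {
    my ($fh) = @_;
    my @sum;
    while (my $line = <$fh>) {
        chomp $line;
        my @F = split /\t/, $line;
        # += on a slot that does not exist yet creates it, starting from 0.
        $sum[$_] += $F[$_] for 0 .. $#F;
    }
    return @sum;
}

# Rows of different lengths, to show that the totals array grows as needed.
my $sample = "1\t2\n3\t4\t5\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @sum = sum_columns($fh);
close $fh;

print "@sum\n";
```

    Because += treats a missing element as 0 without a warning, there is no need to know the column count in advance or to pre-fill the array.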