http://www.perlmonks.org?node_id=994458

linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file like this:
chr nr begin end c1 total_c1
chr10 1 10 20 15 2
where the first column is the chromosome, the second the peak number, the third the begin position, and the fourth the end position. After that come 20 columns: a (class, total_class) pair for each of 10 classes. I want to find out which class has the highest total and write a file with "$chr $begin $end $class". For some lines the highest total is shared by two or more classes; such a line should be "undefined". I wrote a script to find the highest column (probably not the easiest way to do it), but how can I find the undefined lines? For example, this line would be undefined:
chr nr begin end c1 total_c1 c2 total_c2 c3 total_c3
chr1 2 30 50 10 9 8 9 7 2
because 9, the highest total in the line, occurs twice. My code looks like this now:

open(classFile, '<', "Results/Classification/classesNormal.txt") or die $!;
my @classes = <classFile>;
close(classFile);

for my $line (@classes) {
    my ($chr, $nr, $begin, $end,
        $c1, $total_c1, $c2, $total_c2, $c3, $total_c3, $c4, $total_c4,
        $c5, $total_c5, $c6, $total_c6, $c7, $total_c7, $c8, $total_c8,
        $c9, $total_c9, $c10, $total_c10) = split("\t", $line);

    if ($total_c1 > $total_c2 && $total_c1 > $total_c3 && $total_c1 > $total_c4
        && $total_c1 > $total_c5 && $total_c1 > $total_c6 && $total_c1 > $total_c7
        && $total_c1 > $total_c8 && $total_c1 > $total_c9 && $total_c1 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c1, "\n");
    }
    if ($total_c2 > $total_c1 && $total_c2 > $total_c3 && $total_c2 > $total_c4
        && $total_c2 > $total_c5 && $total_c2 > $total_c6 && $total_c2 > $total_c7
        && $total_c2 > $total_c8 && $total_c2 > $total_c9 && $total_c2 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c2, "\n");
    }
    if ($total_c3 > $total_c1 && $total_c3 > $total_c2 && $total_c3 > $total_c4
        && $total_c3 > $total_c5 && $total_c3 > $total_c6 && $total_c3 > $total_c7
        && $total_c3 > $total_c8 && $total_c3 > $total_c9 && $total_c3 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c3, "\n");
    }
    if ($total_c4 > $total_c1 && $total_c4 > $total_c2 && $total_c4 > $total_c3
        && $total_c4 > $total_c5 && $total_c4 > $total_c6 && $total_c4 > $total_c7
        && $total_c4 > $total_c8 && $total_c4 > $total_c9 && $total_c4 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c4, "\n");
    }
    if ($total_c5 > $total_c1 && $total_c5 > $total_c2 && $total_c5 > $total_c3
        && $total_c5 > $total_c4 && $total_c5 > $total_c6 && $total_c5 > $total_c7
        && $total_c5 > $total_c8 && $total_c5 > $total_c9 && $total_c5 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c5, "\n");
    }
    if ($total_c6 > $total_c1 && $total_c6 > $total_c2 && $total_c6 > $total_c3
        && $total_c6 > $total_c4 && $total_c6 > $total_c5 && $total_c6 > $total_c7
        && $total_c6 > $total_c8 && $total_c6 > $total_c9 && $total_c6 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c6, "\n");
    }
    if ($total_c7 > $total_c1 && $total_c7 > $total_c2 && $total_c7 > $total_c3
        && $total_c7 > $total_c4 && $total_c7 > $total_c5 && $total_c7 > $total_c6
        && $total_c7 > $total_c8 && $total_c7 > $total_c9 && $total_c7 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c7, "\n");
    }
    if ($total_c8 > $total_c1 && $total_c8 > $total_c2 && $total_c8 > $total_c3
        && $total_c8 > $total_c4 && $total_c8 > $total_c5 && $total_c8 > $total_c6
        && $total_c8 > $total_c7 && $total_c8 > $total_c9 && $total_c8 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c8, "\n");
    }
    if ($total_c9 > $total_c1 && $total_c9 > $total_c2 && $total_c9 > $total_c3
        && $total_c9 > $total_c4 && $total_c9 > $total_c5 && $total_c9 > $total_c6
        && $total_c9 > $total_c7 && $total_c9 > $total_c8 && $total_c9 > $total_c10) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c9, "\n");
    }
    if ($total_c10 > $total_c1 && $total_c10 > $total_c2 && $total_c10 > $total_c3
        && $total_c10 > $total_c4 && $total_c10 > $total_c5 && $total_c10 > $total_c6
        && $total_c10 > $total_c7 && $total_c10 > $total_c8 && $total_c10 > $total_c9) {
        print($chr, "\t", $begin, "\t", $end, "\t", $c10, "\n");
    }
}

Could somebody help me? :)

Replies are listed 'Best First'.
Re: Find highest value
by choroba (Cardinal) on Sep 19, 2012 at 14:07 UTC
    Naming variables c1 .. c10 suggests you need an array.

    To find a maximum, you can use List::Util.

    The following code returns "$chr $begin $end $class" if the class has the maximal total, or 'undefined' instead of the class number if there are more classes having the same maximum. Tweak it to serve your needs:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use List::Util qw/max/;

    <>; # skip header
    for my $line (<>) {
        my ($chr, $nr, $begin, $end, @values) = split ' ', $line;
        my %total;
        while (@values) {
            my $class = shift @values;
            my $total = shift @values;
            $total{$class} = $total;
        }
        my $max = max values %total;
        my @maxes = grep $total{$_} == $max, keys %total;
        if (@maxes == 1) {
            print "$chr $begin $end $maxes[0]\n";
        } else {
            print "$chr $begin $end undefined\n";
        }
    }
    Updated code.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thank you for your answer. But the last columns are classes and totals of classes, so I only want to find the maximum of the totals. I should select only a subset of @values and put it in an array, but this doesn't work:
      push(@totals, values[1], values[3], values[5], values[7]);
      Is there a way to do this? I want to find the class for which the total is the highest, and print the class name.
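A minimal sketch of the slice this follow-up is reaching for, assuming @values alternates class name and total as in the sample line (class at even index, total at odd index). The sample data and the names @first_totals, @classes, @winners and $result are illustrative and do not come from the original script.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample: class/total pairs as they appear after the first four columns.
my @values = (10, 9, 8, 9, 7, 2);

# An array slice needs the @ sigil and a list of indices.
my @first_totals = @values[1, 3, 5];

# More generally, every odd index holds a total and every even index a class name.
my @totals  = @values[ grep { $_ % 2 == 1 } 0 .. $#values ];
my @classes = @values[ grep { $_ % 2 == 0 } 0 .. $#values ];

my $max = max @totals;

# Positions in @totals where the maximum occurs; more than one means a tie.
my @winners = grep { $totals[$_] == $max } 0 .. $#totals;

my $result = @winners == 1 ? $classes[ $winners[0] ] : 'undefined';
print "$result\n";
```

Because @totals and @classes are built from the same pairs, position N in one corresponds to position N in the other, which is what lets the winning position be mapped back to a class name.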
Re: Find highest value
by Tux (Canon) on Sep 19, 2012 at 14:33 UTC

    Looking at your example code, the .txt file is TAB separated. You could read it with DBD::CSV. If you are slightly acquainted with SQL commands, that might be handy. Here's a start:

    use DBI;
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir        => "Results/Classification",
        f_ext        => ".txt/r",
        csv_sep_char => "\t",
        RaiseError   => 1,
        PrintError   => 1,
        }) or die $DBI::errstr;
    my $sth = $dbh->prepare ("select max (begin) from classesNormal where nr = 2");
    $sth->execute;
    my ($max) = $sth->fetchrow_array;

    Enjoy, Have FUN! H.Merijn
Re: Find highest value
by nemesdani (Friar) on Sep 19, 2012 at 14:10 UTC
    Consider:
  • read a line into an array
  • split it by whitespace
  • make a slice from the totals (the 5th, 7th, etc. element)
  • select the maximum with e.g. List::Util; use a flag if more maxes are found
  • write out the elements you need


  • Optionally the last 3 steps can be implemented in a subroutine, thus making your code more readable.
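The steps above could be sketched roughly as follows. This is one possible reading, not the poster's own code: the subroutine name classify_line and the sample lines are made up for illustration, and it counts the classes that reach the maximum instead of keeping a separate flag.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: takes one data line and returns "chr begin end class",
# with 'undefined' as the class when the highest total is shared.
sub classify_line {
    my ($line) = @_;
    my @fields = split ' ', $line;                  # split by whitespace
    my ($chr, $nr, $begin, $end) = @fields[0 .. 3];

    # Class names sit at indices 4, 6, 8, ...; their totals at 5, 7, 9, ...
    my @class_idx = grep { $_ % 2 == 0 } 4 .. $#fields;
    my @totals    = @fields[ map { $_ + 1 } @class_idx ];

    my $max  = max @totals;
    my @best = grep { $fields[ $_ + 1 ] == $max } @class_idx;

    my $class = @best == 1 ? $fields[ $best[0] ] : 'undefined';
    return "$chr $begin $end $class";
}

# Made-up sample lines in the format described in the question.
print classify_line("chr10 1 10 20 15 2 3 7"), "\n";
print classify_line("chr1 2 30 50 10 9 8 9 7 2"), "\n";
```

Keeping the slice, the maximum, and the tie check inside one subroutine leaves the main loop with just reading lines and printing results.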


    I'm too lazy to be proud of being impatient.