Find highest value

by linseyr (Acolyte)
on Sep 19, 2012 at 13:43 UTC
linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file like this:
chr    nr  begin  end  c1  total_c1
chr10  1   10     20   15  2
The first column is the chromosome, the second the peak number, the third the begin position and the fourth the end position. After that come 20 columns: a (class, total_class) pair for each of 10 classes. I want to find out which class has the highest total and write a file with "$chr $begin $end $class". On some lines the highest total is shared by more than one class; such a line should be reported as "undefined". I wrote a script to find the highest column (probably not the easiest way to do it), but how can I find the undefined lines? For example, this line would be undefined:
chr   nr  begin  end  c1  total_c1  c2  total_c2  c3  total_c3
chr1  2   30     50   10  9         8   9         7   2
It is undefined because the total 9, the highest in the line, occurs twice. My code currently looks like this:

open(classFile,'<',"Results/Classification/classesNormal.txt") or die $!;
my @classes = <classFile>;
close(classFile);

for my $line(@classes){
    my($chr,$nr,$begin,$end,$c1,$total_c1,$c2,$total_c2,$c3,$total_c3,
       $c4,$total_c4,$c5,$total_c5,$c6,$total_c6,$c7,$total_c7,$c8,$total_c8,
       $c9,$total_c9,$c10,$total_c10) = split("\t",$line);

    if ($total_c1 > $total_c2 && $total_c1 > $total_c3 && $total_c1 > $total_c4
        && $total_c1 > $total_c5 && $total_c1 > $total_c6 && $total_c1 > $total_c7
        && $total_c1 > $total_c8 && $total_c1 > $total_c9 && $total_c1 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c1,"\n");
    }
    if ($total_c2 > $total_c1 && $total_c2 > $total_c3 && $total_c2 > $total_c4
        && $total_c2 > $total_c5 && $total_c2 > $total_c6 && $total_c2 > $total_c7
        && $total_c2 > $total_c8 && $total_c2 > $total_c9 && $total_c2 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c2,"\n");
    }
    if ($total_c3 > $total_c1 && $total_c3 > $total_c2 && $total_c3 > $total_c4
        && $total_c3 > $total_c5 && $total_c3 > $total_c6 && $total_c3 > $total_c7
        && $total_c3 > $total_c8 && $total_c3 > $total_c9 && $total_c3 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c3,"\n");
    }
    if ($total_c4 > $total_c1 && $total_c4 > $total_c2 && $total_c4 > $total_c3
        && $total_c4 > $total_c5 && $total_c4 > $total_c6 && $total_c4 > $total_c7
        && $total_c4 > $total_c8 && $total_c4 > $total_c9 && $total_c4 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c4,"\n");
    }
    if ($total_c5 > $total_c1 && $total_c5 > $total_c2 && $total_c5 > $total_c3
        && $total_c5 > $total_c4 && $total_c5 > $total_c6 && $total_c5 > $total_c7
        && $total_c5 > $total_c8 && $total_c5 > $total_c9 && $total_c5 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c5,"\n");
    }
    if ($total_c6 > $total_c1 && $total_c6 > $total_c2 && $total_c6 > $total_c3
        && $total_c6 > $total_c4 && $total_c6 > $total_c5 && $total_c6 > $total_c7
        && $total_c6 > $total_c8 && $total_c6 > $total_c9 && $total_c6 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c6,"\n");
    }
    if ($total_c7 > $total_c1 && $total_c7 > $total_c2 && $total_c7 > $total_c3
        && $total_c7 > $total_c4 && $total_c7 > $total_c5 && $total_c7 > $total_c6
        && $total_c7 > $total_c8 && $total_c7 > $total_c9 && $total_c7 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c7,"\n");
    }
    if ($total_c8 > $total_c1 && $total_c8 > $total_c2 && $total_c8 > $total_c3
        && $total_c8 > $total_c4 && $total_c8 > $total_c5 && $total_c8 > $total_c6
        && $total_c8 > $total_c7 && $total_c8 > $total_c9 && $total_c8 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c8,"\n");
    }
    if ($total_c9 > $total_c1 && $total_c9 > $total_c2 && $total_c9 > $total_c3
        && $total_c9 > $total_c4 && $total_c9 > $total_c5 && $total_c9 > $total_c6
        && $total_c9 > $total_c7 && $total_c9 > $total_c8 && $total_c9 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c9,"\n");
    }
}

Could somebody help me? :)

Replies are listed 'Best First'.
Re: Find highest value
by choroba (Chancellor) on Sep 19, 2012 at 14:07 UTC
    Naming variables c1 .. c10 suggests you need an array.

    To find a maximum, you can use List::Util.

    The following code prints "$chr $begin $end $class" for the class with the maximal total, or 'undefined' in place of the class if several classes share the same maximum. Tweak it to serve your needs:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use List::Util qw/max/;

    <>; # skip header
    for my $line (<>){
        my ($chr, $nr, $begin, $end, @values) = split ' ', $line;
        my %total;
        while (@values) {
            my $class = shift @values;
            my $total = shift @values;
            $total{$class} = $total;
        }
        my $max = max values %total;
        my @maxes = grep $total{$_} == $max, keys %total;
        if (@maxes == 1) {
            print "$chr $begin $end $maxes[0]\n";
        } else {
            print "$chr $begin $end undefined\n";
        }
    }
    Updated code.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thank you for your answer. But the last columns are classes and totals of classes, so I only want to find the maximum of the totals. I should then select only a subset of @values and put it in an array, but this doesn't work:
      push(@totals, values[1], values[3], values[5], values[7]);
      Is there a way to do this? I want to find the class whose total is the highest, and print the class name.
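      One possible way to do this, as a minimal sketch: an array slice takes the `@` sigil and a list of indices, so the totals (the odd positions of @values) can be selected in one step. The helper name `best_class` is made up for illustration, and the code assumes each class sits at an even position with its total right after it. The first sample call uses the tied row from the question.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: takes the class/total pairs of one line and returns
# the class with the highest total, or 'undefined' when the maximum is tied.
sub best_class {
    my @values = @_;

    # Odd indices hold the totals, even indices hold the classes.
    my @total_idx = grep { $_ % 2 } 0 .. $#values;
    my @totals    = @values[@total_idx];          # array slice

    my $max     = max @totals;
    my @max_idx = grep { $values[$_] == $max } @total_idx;

    # The class belonging to a total sits one position before it.
    return @max_idx == 1 ? $values[ $max_idx[0] - 1 ] : 'undefined';
}

print best_class(10, 9, 8, 9, 7, 2), "\n";   # maximum 9 occurs twice
print best_class(10, 9, 8, 3, 7, 2), "\n";   # maximum 9 occurs once
```

      Working with indices rather than a hash keyed by class keeps the pairs apart even if two class columns happen to hold the same value.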
Re: Find highest value
by Tux (Abbot) on Sep 19, 2012 at 14:33 UTC

    Looking at your example code, the .txt file is TAB-separated. You could read it with DBD::CSV; if you are slightly acquainted with SQL commands, that might be handy. Here's a start:

    use DBI;
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_dir        => "Results/Classification",
        f_ext        => ".txt/r",
        csv_sep_char => "\t",
        RaiseError   => 1,
        PrintError   => 1,
        }) or die $DBI::errstr;
    my $sth = $dbh->prepare ("select max (begin) from classesNormal where nr = 2");
    $sth->execute;
    my ($max) = $sth->fetchrow_array;

    Enjoy, Have FUN! H.Merijn
Re: Find highest value
by nemesdani (Friar) on Sep 19, 2012 at 14:10 UTC
  • read a line into an array
  • split it by whitespace
  • make a slice from the totals (the elements at indices 5, 7, etc., counting from 0)
  • select the maximum with e.g. List::Util, and use a flag if more than one maximum is found
  • write out the elements you need

  • Optionally the last 3 steps can be implemented in a subroutine, thus making your code more readable.
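
    The steps above could be sketched roughly as follows. This is only an illustration: the subroutine name `classify_line` is hypothetical, and the column layout (four leading fields, then class/total pairs) is taken from the question.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical subroutine covering the last three steps: slice out the
# totals, find the maximum, and build the output line.
sub classify_line {
    my ($line) = @_;
    my @fields = split ' ', $line;               # split on whitespace

    # Totals sit at the odd indices from 5 onwards (0-based).
    my @total_idx = grep { $_ % 2 } 5 .. $#fields;
    my $max       = max @fields[@total_idx];

    my ($winner, $tied);
    for my $i (@total_idx) {
        next unless $fields[$i] == $max;
        $tied   = 1 if defined $winner;          # flag: a second maximum
        $winner = $fields[ $i - 1 ];             # class precedes its total
    }

    my $class = $tied ? 'undefined' : $winner;
    return join ' ', @fields[ 0, 2, 3 ], $class; # chr, begin, end, class
}

print classify_line("chr1 2 30 50 10 9 8 9 7 2"), "\n";
print classify_line("chr10 1 10 20 15 2"), "\n";
```

    In a real script the two sample calls would be replaced by a loop over the lines of the input file, skipping the header line first.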

    I'm too lazy to be proud of being impatient.
