Perl: the Markov chain saw PerlMonks

### Find highest value

by linseyr (Acolyte) on Sep 19, 2012 at 13:43 UTC
linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a file like this:
```
chr      nr   begin   end   c1   total_c1
chr10    1    10      20    15   2
```
where the first column is the chromosome number, the second the peak number, the third the begin position and the fourth the end position. After that there are 20 columns (10 x class, total_class), so 10 classes in total. I want to find out which class has the highest total and write a file with "$chr $begin $end $class". On some lines the highest total is shared by more than one class; such a line should be "undefined". I wrote a script to find the highest column (probably not the easiest way to do it), but how can I find the undefined lines? For example, this line would be undefined:
```
chr     nr   begin   end   c1   total_c1   c2   total_c2   c3   total_c3
chr1    2    30      50    10   9          8    9          7    2
```
Because the highest total in the line, 9, occurs twice. My code currently looks like this:

```
open(classFile,'<',"Results/Classification/classesNormal.txt") or die $!;
my @classes = <classFile>;
close(classFile);

for my $line(@classes){
    my($chr,$nr,$begin,$end,$c1,$total_c1,$c2,$total_c2,$c3,$total_c3,
       $c4,$total_c4,$c5,$total_c5,$c6,$total_c6,$c7,$total_c7,$c8,$total_c8,
       $c9,$total_c9,$c10,$total_c10) = split("\t",$line);
    if ($total_c1 > $total_c2 && $total_c1 > $total_c3 && $total_c1 > $total_c4
        && $total_c1 > $total_c5 && $total_c1 > $total_c6 && $total_c1 > $total_c7
        && $total_c1 > $total_c8 && $total_c1 > $total_c9 && $total_c1 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c1,"\n");
    }
    if ($total_c2 > $total_c1 && $total_c2 > $total_c3 && $total_c2 > $total_c4
        && $total_c2 > $total_c5 && $total_c2 > $total_c6 && $total_c2 > $total_c7
        && $total_c2 > $total_c8 && $total_c2 > $total_c9 && $total_c2 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c2,"\n");
    }
    if ($total_c3 > $total_c1 && $total_c3 > $total_c2 && $total_c3 > $total_c4
        && $total_c3 > $total_c5 && $total_c3 > $total_c6 && $total_c3 > $total_c7
        && $total_c3 > $total_c8 && $total_c3 > $total_c9 && $total_c3 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c3,"\n");
    }
    if ($total_c4 > $total_c1 && $total_c4 > $total_c2 && $total_c4 > $total_c3
        && $total_c4 > $total_c5 && $total_c4 > $total_c6 && $total_c4 > $total_c7
        && $total_c4 > $total_c8 && $total_c4 > $total_c9 && $total_c4 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c4,"\n");
    }
    if ($total_c5 > $total_c1 && $total_c5 > $total_c2 && $total_c5 > $total_c3
        && $total_c5 > $total_c4 && $total_c5 > $total_c6 && $total_c5 > $total_c7
        && $total_c5 > $total_c8 && $total_c5 > $total_c9 && $total_c5 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c5,"\n");
    }
    if ($total_c6 > $total_c1 && $total_c6 > $total_c2 && $total_c6 > $total_c3
        && $total_c6 > $total_c4 && $total_c6 > $total_c5 && $total_c6 > $total_c7
        && $total_c6 > $total_c8 && $total_c6 > $total_c9 && $total_c6 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c6,"\n");
    }
    if ($total_c7 > $total_c1 && $total_c7 > $total_c2 && $total_c7 > $total_c3
        && $total_c7 > $total_c4 && $total_c7 > $total_c5 && $total_c7 > $total_c6
        && $total_c7 > $total_c8 && $total_c7 > $total_c9 && $total_c7 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c7,"\n");
    }
    if ($total_c8 > $total_c1 && $total_c8 > $total_c2 && $total_c8 > $total_c3
        && $total_c8 > $total_c4 && $total_c8 > $total_c5 && $total_c8 > $total_c6
        && $total_c8 > $total_c7 && $total_c8 > $total_c9 && $total_c8 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c8,"\n");
    }
    if ($total_c9 > $total_c1 && $total_c9 > $total_c2 && $total_c9 > $total_c3
        && $total_c9 > $total_c4 && $total_c9 > $total_c5 && $total_c9 > $total_c6
        && $total_c9 > $total_c7 && $total_c9 > $total_c8 && $total_c9 > $total_c10){
        print ($chr,"\t",$begin,"\t",$end,"\t",$c9,"\n");
    }
}
```

Could somebody help me? :)

Replies are listed 'Best First'.
Re: Find highest value
by choroba (Bishop) on Sep 19, 2012 at 14:07 UTC
Naming variables c1 .. c10 suggests you need an array.

To find a maximum, you can use List::Util.

The following code prints "$chr $begin $end $class" if one class has the maximal total, or 'undefined' in place of the class if several classes share the same maximum. Tweak it to serve your needs:

```
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw/max/;

for my $line (<>){
    my ($chr, $nr, $begin, $end, @values) = split ' ', $line;

    my %total;
    while (@values) {
        my $class = shift @values;
        my $total = shift @values;
        $total{$class} = $total;
    }
    my $max = max values %total;
    my @maxes = grep $total{$_} == $max, keys %total;
    if (@maxes == 1) {
        print "$chr $begin $end $maxes[0]\n";
    } else {
        print "$chr $begin $end undefined\n";
    }
}
```
Updated code.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Thank you for your answer. But the last columns are classes and totals of classes, and I only want to find the maximum of the totals. So I should select only a subset of @values and put it in an array, but this doesn't work:
```
push(@totals, values[1], values[3], values[5], values[7]);
```
Is there a way to do this? I want to find the class whose total is the highest, and print the class name.
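A minimal sketch of one way to do this, not taken from the thread: array elements need the `$` sigil (`$values[1]`), and an array slice `@values[...]` can pick every second element in one step. The sample data is made up, and the sketch assumes `@values` holds alternating class/total pairs, as in the reply above.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical sample: alternating (class, total) pairs, as left in @values
# after chr, nr, begin and end have been split off.
my @values = (10, 9, 8, 4, 7, 2);

# Odd indices (1, 3, 5, ...) hold the totals, even indices the classes.
my @total_idx = grep { $_ % 2 } 0 .. $#values;
my @totals    = @values[@total_idx];

my $max = max @totals;

# Indices whose total equals the maximum; more than one means a tie.
my @best = grep { $values[$_] == $max } @total_idx;

# The class sits one position to the left of its total.
my $class = @best == 1 ? $values[ $best[0] - 1 ] : 'undefined';
print "$class\n";
```

Working with indices rather than a hash keyed by class keeps two classes apart even if they happen to carry the same class value.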
Re: Find highest value
by Tux (Abbot) on Sep 19, 2012 at 14:33 UTC

Looking at your example code, the .txt file is TAB separated. You could read it with DBD::CSV. If you are slightly acquainted with SQL commands, that might be handy. Here's a start:

```
use DBI;

my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
    f_dir        => "Results/Classification",
    f_ext        => ".txt/r",

    csv_sep_char => "\t",

    RaiseError   => 1,
    PrintError   => 1,
    }) or die $DBI::errstr;

my $sth = $dbh->prepare ("select max (begin) from classesNormal where nr = 2");
$sth->execute;
my ($max) = $sth->fetchrow_array;
```

Enjoy, Have FUN! H.Merijn
Re: Find highest value
by nemesdani (Friar) on Sep 19, 2012 at 14:10 UTC
Consider:
• read a line into an array
• split it by whitespace
• make a slice from the totals (the 5th, 7th, etc. element)
• select the maximum with e.g. List::Util, use a flag if more maxes are found
• write out the elements you need

• Optionally the last 3 steps can be implemented in a subroutine, thus making your code more readable.
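The steps above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's code: the subroutine name `best_class` is hypothetical, and the totals are taken to sit at 0-based indices 5, 7, 9, ... of the split line, each with its class one position to the left.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: takes one data line and returns "chr begin end class",
# with 'undefined' as the class when the highest total is shared.
sub best_class {
    my ($line) = @_;
    my @fields = split ' ', $line;                 # split by whitespace
    my ($chr, $begin, $end) = @fields[0, 2, 3];

    # Slice of the totals: odd 0-based indices from 5 onwards.
    my @total_idx = grep { $_ % 2 } 5 .. $#fields;
    my $max       = max @fields[@total_idx];

    # The "flag": more than one winner means the maximum is not unique.
    my @winners = grep { $fields[$_] == $max } @total_idx;

    my $class = @winners == 1 ? $fields[ $winners[0] - 1 ] : 'undefined';
    return "$chr $begin $end $class";
}

# Sample lines taken from the question.
print best_class("chr10 1 10 20 15 2"), "\n";
print best_class("chr1 2 30 50 10 9 8 9 7 2"), "\n";
```

A header line would need to be skipped before calling the helper, since its fields are not numeric.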

I'm too lazy to be proud of being impatient.
