PerlMonks  

Re: Help the counting!

by jethro (Monsignor)
on Mar 22, 2012 at 12:08 UTC (#960990)


in reply to Help the counting!

What is bp?

Without knowing exactly what you want, I can tell you that your script seems to have three loops nested inside each other, which will make its running time unbearable for non-trivial data. Your problem doesn't seem to need that.

How about this algorithm: Sort the ranges by ascending start position (you will need a more complex data structure, such as an Array-of-Arrays, for this). Store the range of the first cluster in $xstart and $xend. For each new cluster (outer loop), test whether it overlaps the current range; if it does, count the overlap (inner loop) and update $xend with the end of the new range.
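A minimal sketch of the steps above, for the ranges of a single chromosome. The sub name merge_ranges and the sample ranges are made up for illustration. It also makes two assumptions that are not stated above: ranges that merely touch (next start equal to the current end) are treated as separate, and $xend is only ever extended, never shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge overlapping ranges given as an Array-of-Arrays of [start, end].
# Returns a list of [start, end, count] triples, where count is the
# number of input ranges folded into that merged range.
# Assumption: touching ranges (start == previous end) do not overlap.
sub merge_ranges {
    my @ranges = sort { $a->[0] <=> $b->[0] } @_;
    return unless @ranges;

    my $first = shift @ranges;
    my ($xstart, $xend) = @$first;
    my $count = 1;
    my @merged;

    for my $r (@ranges) {
        my ($start, $end) = @$r;
        if ($start < $xend) {
            # Overlap: count it and extend the current range if needed.
            $count++;
            $xend = $end if $end > $xend;
        }
        else {
            # No overlap: close the current range and start a new one.
            push @merged, [$xstart, $xend, $count];
            ($xstart, $xend, $count) = ($start, $end, 1);
        }
    }
    push @merged, [$xstart, $xend, $count];
    return @merged;
}

# Usage with made-up ranges, deliberately unsorted
my @merged = merge_ranges([10, 20], [30, 40], [15, 25]);
print join(' ', @$_), "\n" for @merged;
```

Because the ranges are sorted once and then scanned once, a single pass replaces the nested loops.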


Re^2: Help the counting!
by g-alone (Initiate) on Mar 22, 2012 at 12:39 UTC
    I guess the easiest way is to show you what my data looks like, in a simpler form. bp = basepair. I have data for different chromosomes, with the start and end points of the cluster tags in each chromosome. It looks like this:

    columns = 1: chr 2: start 3: end 4: info(X) 5: info(X) 6: strand

    chr1 101 105 X X -
    chr1 102 108 X X -
    chr1 106 111 X X -
    chr1 112 113 X X -
    chr1 113 115 X X -
    chr2 114 118 X X -
    chr2 119 121 X X -
    chr2 120 123 X X -
    chr3 125 130 X X -
    chr3 131 132 X X -

    I need columns 1, 2, 3 and 6. I want to count the overlaps (with 2-basepair overlaps) for each set of coordinates in each chromosome and give one ID number to the coordinates that overlap together. For example, in chr1 there are 4 coordinates and 3 of them overlap, all except the last one, so I first need to count those 3 overlaps and give them ID_1; then ID_2 goes to the last coordinate in chr1, which does not overlap with the others. Then the count is reset to zero for chr2, the coordinates inside chr2 are checked for overlaps, the IDs start again from ID_1 for chr2, and so on for all chromosomes. The output should look like this:

    TSSD_ID chr start end strand count
    ID_1 1 101 111 - 3
    ID_2 1 112 113 - 1
    ID_3 1 113 115 - 1
    ID_1 2 114 118 - 1
    ID_2 2 119 123 - 2
    ID_1 3 125 132 - 1

    This is the output I want to get, but mine is not working well!

      I think you made a mistake with chromosome 3 in your example output.

      Here is a working solution:

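      #!/usr/bin/perl
      use warnings;
      use strict;
      use Data::Dumper;

      # Read whitespace-separated columns: chr, start, end, info, info, strand
      my @chrom;
      while (<>) {
          my ($chr, $start, $end, $c, $d, $strand) = split;
          push @chrom, { 'chrom' => $chr, 'start' => $start, 'end' => $end, 'strand' => $strand };
      }

      # Sort by chromosome first, then by start, so that the ranges of
      # each chromosome end up next to each other
      my @chroms = sort { $a->{chrom} cmp $b->{chrom} or $a->{start} <=> $b->{start} } @chrom;

      #print Dumper(\@chroms);

      exit(0) if (@chroms == 0);

      my $coord = shift @chroms;    # use first coordinate as range counter
      my $overlap = 1;
      my $id = 1;
      while ( my $co = shift @chroms ) {
          if ($co->{chrom} ne $coord->{chrom}) {
              # new chromosome: print the current range and restart the IDs
              printresults($coord, $overlap, $id);
              $coord = $co;
              $overlap = 1;
              $id = 1;
          }
          else {
              if ($co->{start} >= $coord->{end}) {
                  # no overlap: print the current range and start the next one
                  printresults($coord, $overlap, $id);
                  $coord = $co;
                  $overlap = 1;
                  $id++;
              }
              else {
                  # overlap: count it and extend the range (never shrink it)
                  $overlap++;
                  $coord->{end} = $co->{end} if $co->{end} > $coord->{end};
              }
          }
      }
      printresults($coord, $overlap, $id);

      #-------------
      sub printresults {
          my ($coord, $overlap, $id) = @_;

          print "ID_$id $coord->{chrom} $coord->{start} $coord->{end} $coord->{strand} $overlap\n";
      }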
      prints

      ID_1 chr1 101 111 - 3
      ID_2 chr1 112 113 - 1
      ID_3 chr1 113 115 - 1
      ID_1 chr2 114 118 - 1
      ID_2 chr2 119 123 - 2
      ID_1 chr3 125 130 - 1
      ID_2 chr3 131 132 - 1

      You can remove the '#' in front of the 'print Dumper' line if you want to see what the data in @chroms looks like.
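As a small illustration of that Dumper call, here is a sketch that dumps a single hand-written record shaped like the hashes the script stores. The record values are made up, and setting $Data::Dumper::Sortkeys is an optional extra (not in the script above) that keeps the key order stable between runs.

```perl
use strict;
use warnings;
use Data::Dumper;

# Optional: print hash keys in sorted order so the dump is stable
$Data::Dumper::Sortkeys = 1;

# One hypothetical record, shaped like the hashes pushed onto @chrom
my @chroms = (
    { 'chrom' => 'chr1', 'start' => 101, 'end' => 105, 'strand' => '-' },
);

# Dumper takes a reference and returns a readable string of the structure
my $dump = Dumper(\@chroms);
print $dump;
```

The dump shows an array reference holding one hash per input line, with the keys chrom, start, end and strand.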
