PerlMonks  

Re: Finding Overlapping Regions on Genome

by mtmcc (Hermit)
on Jul 11, 2013 at 11:47 UTC (#1043699)


in reply to Finding Overlapping Regions on Genome

If I understand what you want correctly, this (not very elegant) script should work, provided your rows are sorted in ascending order of contig start site (column 3). It merges overlapping regions and prints one line per merged region:

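#!/usr/bin/perl
use strict;
use warnings;

my $dataFile = $ARGV[0];
my $get = 1;      # 1 = take columns 1 and 2 from the next row
my @array = ();
my $col1 = 0;
my $col2 = 0;
my $low = 0;      # start of the current merged region
my $high = 0;     # end of the current merged region

open(my $fh, "<", $dataFile) or die "Cannot open $dataFile: $!";
while (<$fh>)
{
    next unless $_ =~ /[0-9]/;    # skip lines without digits (headers, blanks)
    @array = split(" ", $_);
    if ($get == 1)
    {
        if (($array[2] > $high) && ($high > 0))
        {
            print STDOUT "$col1\t$col2\t$low\t$high\n";
            $low = $array[2];
            $high = $array[3];
        }
        $col1 = $array[0];
        $col2 = $array[1];
        $low = $array[2] if (($array[2] < $low) || ($low == 0));
        $high = $array[3] if (($array[3] > $high) || ($high == 0));
        $get = 0;
    }
    if ($array[2] <= $high)
    {
        # row overlaps the current region: extend the region if needed
        $high = $array[3] if $array[3] > $high;
    }
    if ($array[2] > $high)
    {
        # gap before this row: print the finished region, start a new one
        print STDOUT "$col1\t$col2\t$low\t$high\n";
        $low = $array[2];
        $high = $array[3];
        $get = 1;
    }
}
close($fh);

# the last region is still pending when the loop ends
print STDOUT "$col1\t$col2\t$low\t$high\n" if $high > 0;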

If the chromosome changes during your large file, it would need to be modified to account for that.
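For what it's worth, here is a rough sketch of how that modification might look. It is not tested against your data and makes some guesses: I'm assuming column 1 holds the chromosome name and that rows are sorted by start site within each chromosome. The sub name merge_regions is just something I made up for illustration. It also keeps every row in memory, so for a really large file you would want to stream instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge overlapping regions, starting a new region whenever the
# chromosome changes. ASSUMPTION: column 1 is the chromosome and rows
# are sorted by start site within each chromosome.
sub merge_regions {
    my @rows = @_;    # each row: [col1, col2, start, end]
    my @merged;
    for my $row (@rows) {
        my ($chr, $col2, $start, $end) = @$row;
        my $cur = $merged[-1];    # undef until the first region exists
        if ($cur && $cur->[0] eq $chr && $start <= $cur->[3]) {
            # same chromosome and overlapping: extend the current region
            $cur->[3] = $end if $end > $cur->[3];
        }
        else {
            # first row, new chromosome, or a gap: start a new region
            push @merged, [$chr, $col2, $start, $end];
        }
    }
    return @merged;
}

# Only read a file when one is named on the command line.
if (@ARGV) {
    open(my $fh, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my @rows;
    while (my $line = <$fh>) {
        next unless $line =~ /[0-9]/;    # skip lines without digits
        push @rows, [ (split " ", $line)[0 .. 3] ];
    }
    close($fh);
    print join("\t", @$_), "\n" for merge_regions(@rows);
}
```

Comparing the chromosome column before testing for overlap means a region on one chromosome is never merged with a region on the next, even if the coordinates happen to overlap.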

There's probably a much prettier way of doing it though...

Michael

