Re: Finding Overlapping Regions on Genome

by mtmcc (Hermit)
on Jul 11, 2013 at 11:47 UTC


in reply to Finding Overlapping Regions on Genome

If I understand correctly what you want, this (not very elegant) script should work, provided your rows are sorted in ascending order of contig start site (column 3):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $dataFile = $ARGV[0];
    my @array = ();
    my ($col1, $col2) = (0, 0);
    my ($low, $high)  = (0, 0);
    my $get = 1;    # 1 until the first data row has been read

    open(my $fh, "<", $dataFile) or die "Cannot open $dataFile: $!";

    while (<$fh>) {
        next unless $_ =~ /[0-9]/;    # skip lines without digits (e.g. blank lines)
        @array = split(" ", $_);

        if ($get == 0 && $array[2] <= $high) {
            # Row overlaps the current region: extend it if it reaches further.
            $high = $array[3] if $array[3] > $high;
            next;
        }

        # Row starts a new region: print the finished one first.
        print STDOUT "$col1\t$col2\t$low\t$high\n" if $get == 0;
        ($col1, $col2, $low, $high) = @array[0 .. 3];
        $get = 0;
    }

    # Print the last region, which the loop itself never closes.
    print STDOUT "$col1\t$col2\t$low\t$high\n" if $get == 0;
    close($fh);

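If the chromosome changes partway through your large file, the script would need to be modified to account for that; otherwise regions on different chromosomes whose coordinates happen to overlap would be merged together.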
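One way that modification might look is sketched below. It is only a rough illustration, not a tested drop-in replacement: it assumes the chromosome name is in column 1 and that the rows are sorted by chromosome first and then by start site. The sub name merge_regions and the sample rows are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge overlapping regions, starting a new region whenever the
# chromosome (ASSUMED to be column 1) changes. Each row is an array ref
# [col1, col2, start, end]; rows are ASSUMED sorted by col1, then start.
sub merge_regions {
    my @rows = @_;
    my @merged;
    my $current;    # region being built: [col1, col2, low, high]

    for my $row (@rows) {
        my ($col1, $col2, $start, $end) = @$row;
        if ($current && $col1 eq $current->[0] && $start <= $current->[3]) {
            # Same chromosome and overlapping: extend the current region.
            $current->[3] = $end if $end > $current->[3];
        }
        else {
            # New chromosome or a gap: close the old region, open a new one.
            push @merged, $current if $current;
            $current = [$col1, $col2, $start, $end];
        }
    }
    push @merged, $current if $current;    # keep the final region
    return @merged;
}

# Made-up rows, purely for illustration.
my @rows = (
    ['chr1', 'a', 100, 200],
    ['chr1', 'a', 150, 300],
    ['chr1', 'a', 400, 500],
    ['chr2', 'b', 120, 180],
    ['chr2', 'b', 170, 260],
);
print join("\t", @$_), "\n" for merge_regions(@rows);
```

With these sample rows, the chr2 region starting at 120 is kept separate even though 120 is below the end of the last chr1 region (500), because the chromosome check forces a new region.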

There's probably a much prettier way of doing it though...

Michael

