
Loop through file to create interval tree

by pinha (Initiate)
on May 23, 2014 at 14:31 UTC (#1087218=perlquestion)
pinha has asked for the wisdom of the Perl Monks concerning the following question:


I have a series of tables with the following interval information:

start   end     ID
36701   40200   1
37901   39700   2
36701   39700   3
I want to find overlaps between the intervals of the IDs. Do different IDs have overlapping intervals? If so, what is the overlap and where?
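To make the overlap question concrete: if both endpoints are treated as inclusive (an assumption, since the table does not say), two intervals overlap from the larger start to the smaller end, provided that start is not past that end. The sketch below applies this rule to every pair of rows from the sample table with a plain nested loop. This is a naive O(n²) scan, not an interval tree, so it is only a baseline for small files. The helper names overlap and find_overlaps are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Overlap of two closed intervals [s1, e1] and [s2, e2] (endpoints assumed inclusive):
# it starts at the larger start and ends at the smaller end,
# and exists only if that start does not exceed that end.
sub overlap {
    my ($s1, $e1, $s2, $e2) = @_;
    my $start = $s1 > $s2 ? $s1 : $s2;
    my $end   = $e1 < $e2 ? $e1 : $e2;
    return $start <= $end ? ($start, $end) : ();
}

# Compare every pair of records (naive O(n^2) scan, NOT an interval tree).
# Each record is [ID, start, end]; each hit is [ID_a, ID_b, overlap_start, overlap_end].
sub find_overlaps {
    my @records = @_;
    my @hits;
    for my $i (0 .. $#records - 1) {
        for my $j ($i + 1 .. $#records) {
            my ($id_a, $s_a, $e_a) = @{ $records[$i] };
            my ($id_b, $s_b, $e_b) = @{ $records[$j] };
            my @ov = overlap($s_a, $e_a, $s_b, $e_b);
            push @hits, [ $id_a, $id_b, @ov ] if @ov;
        }
    }
    return @hits;
}

# The three rows from the sample table.
my @records = ( [ 1, 36701, 40200 ], [ 2, 37901, 39700 ], [ 3, 36701, 39700 ] );

for my $hit (find_overlaps(@records)) {
    printf "ID %s and ID %s overlap from %d to %d\n", @$hit;
}
```

For the sample rows every pair overlaps: IDs 1 and 2 from 37901 to 39700, IDs 1 and 3 from 36701 to 39700, and IDs 2 and 3 from 37901 to 39700.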

I realise that the best way to approach my problem is to use an interval tree. I have been trying to use the module Set::IntervalTree but I am stuck.

Basically, I am not sure how to loop through the lines of my file so that I can use its columns to fill the interval tree.

This is what I have so far:

#!/usr/local/bin/perl
use strict;
use warnings;
use Set::IntervalTree;
use Data::Dumper;

# get the scaffold file name from user input (@ARGV) and store it in $file
# open the scaffold file so that it can be used to fill the empty interval tree
my $file = shift;
open my $fh, '<', $file or die "Cannot open $file: $!";

# create an empty interval tree
my $tree = Set::IntervalTree->new();

# loop through the file, read each line and add objects to the empty interval tree
# there will be as many objects in the interval tree as there are hits for the specific file
my %overlap_table;
while (my $line = <$fh>) {            # while there are lines
    chomp $line;                      # drop the newline so the last column is clean
    my @low = split("\t", $line);     # get the value from the 1st column (start position = low BT)
    print "$low[0]\n";
    $overlap_table{$low[0]}++;
    my @high = split("\t", $line);    # get the value from the 2nd column (end position = high BT)
    print "$high[1]\n";
    $overlap_table{$high[1]}++;
    my @ID = split("\t", $line);      # ID information (3rd column) is the "value"
    print "$ID[2]\n";
    $overlap_table{$ID[2]}++;
}
close($fh);
print Dumper \%overlap_table;
I now want to use the $low, $high and $ID values of each line to fill in the tree.

I would like to be able to loop through the file automatically.

I have lots of files with lots of lines, so entering the values manually is not a good option.

To sum up, I would love it if someone could help me understand how to loop through the lines of $fh so that I can fill in the interval tree according to the following requirement:

 $tree->insert($ID, $low, $high) for each one of the lines.

Thank you so much in advance!

Re: Loop through file to create interval tree
by Laurent_R (Canon) on May 23, 2014 at 18:07 UTC
    Please fix your formatting to make your code readable. Put one opening <code> tag at the beginning of the code, one closing </code> tag at the end of your code, and no <code> tag in the middle of your comments.

    Apart from that, your splitting is very inefficient, since you split each line three times. You could get your three variables in just one operation:

    while (my $line = <$fh>) {
        chomp $line;
        my ($low, $high, $id) = split /\t/, $line;
        # ...
    }
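Building on the single split above, here is a minimal sketch of the loop the question asks for: it reads each data line, splits it once, and calls $tree->insert($id, $low, $high). It assumes tab-separated columns in the order start, end, ID (as in the poster's split) and a header line whose first field is not a number. The sub fill_tree and the class RecordingTree are hypothetical; RecordingTree merely records the calls so the sketch runs without Set::IntervalTree installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read tab-separated "start<TAB>end<TAB>ID" lines from $fh and call
# $tree->insert($id, $low, $high) once per data line; returns the number inserted.
# $tree can be any object with that insert method, e.g. a Set::IntervalTree.
sub fill_tree {
    my ($fh, $tree) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;                   # skip blank lines
        my ($low, $high, $id) = split /\t/, $line;  # one split per line
        next unless $low =~ /^\d+$/;                # skip the "start end ID" header
        $tree->insert($id, $low, $high);
        $count++;
    }
    return $count;
}

# Hypothetical stand-in for Set::IntervalTree that only records insert calls,
# so this sketch runs without the CPAN module installed.
package RecordingTree;
sub new    { return bless { calls => [] }, shift }
sub insert { my ($self, @args) = @_; push @{ $self->{calls} }, [@args]; }

package main;

# Sample data from the question, read through an in-memory filehandle.
my $data = "start\tend\tID\n36701\t40200\t1\n37901\t39700\t2\n36701\t39700\t3\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $tree = RecordingTree->new;
my $n = fill_tree($fh, $tree);
close $fh;

print "inserted $n intervals\n";
print "insert(@$_)\n" for @{ $tree->{calls} };
```

With the real module, add use Set::IntervalTree; and pass Set::IntervalTree->new instead of RecordingTree->new; its insert method takes the stored object first and then the low and high bounds, which matches the call in the question. Once the tree is filled, the module's fetch($low, $high) method returns an array reference of the stored objects whose intervals overlap the query (check the module's documentation for how it treats the endpoints). For many files, call fill_tree once per file name in @ARGV, with a fresh tree for each file.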

Node Type: perlquestion [id://1087218]
Approved by taint