
Loop through file to create interval tree

by pinha (Initiate)
on May 23, 2014 at 14:31 UTC (#1087218)
pinha has asked for the wisdom of the Perl Monks concerning the following question:


I have a series of tables with the following interval information:

start   end     ID
36701   40200   1
37901   39700   2
36701   39700   3

I want to find overlaps between the intervals of the IDs. Do different IDs have overlapping intervals? If so, what is the overlap and where?
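
For a handful of rows, the overlap question can be answered directly, without a tree: two intervals overlap when the larger of the two starts is not past the smaller of the two ends, and that pair of values is the overlapping region. The following is a minimal sketch using the three rows from the table above; the `overlap` helper and the `@report` array are illustrative names, and the end positions are assumed to be inclusive.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Rows from the table above: [start, end, ID].
# Assumption: end positions are inclusive.
my @intervals = (
    [36701, 40200, 1],
    [37901, 39700, 2],
    [36701, 39700, 3],
);

# Return the overlapping region of two intervals as (start, end),
# or an empty list if they do not overlap.
sub overlap {
    my ($x, $y) = @_;
    my $start = max($x->[0], $y->[0]);
    my $end   = min($x->[1], $y->[1]);
    return $start <= $end ? ($start, $end) : ();
}

# Compare every pair of intervals once.
my @report;
for my $i (0 .. $#intervals - 1) {
    for my $j ($i + 1 .. $#intervals) {
        my ($start, $end) = overlap($intervals[$i], $intervals[$j])
            or next;    # empty list: this pair does not overlap
        push @report,
            "IDs $intervals[$i][2] and $intervals[$j][2] overlap at $start-$end";
    }
}
print "$_\n" for @report;
```

Comparing every pair grows quadratically with the number of rows, so this is only a reasonable check for small tables; for large files an interval tree is likely the better fit.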

I realise that the best way to approach my problem is to use an interval tree. I have been trying to use the module Set::IntervalTree but I am stuck.

Basically I am not sure how to loop through my file columns to fill the interval tree.

This is what I have so far:

#!/usr/local/bin/perl
use strict;
use warnings;
use Set::IntervalTree;
use Data::Dumper;

#get the scaffold file name from user input (@ARGV) and stores in $file
#opens the scaffold file so that it can be used to fill the empty interval tree
my $file = shift;
open my $fh, '<', $file or die "Cannot open $file: $!";

#create an empty interval tree
my $tree = Set::IntervalTree -> new();

#loop to the file, read each line and add objects to the empty interval tree
#there will be as many objects in the interval tree as there are hits for the specific file
my %overlap_table;
while (my $line=<$fh>){ #while there are lines
    my @low = split("\t", $line); #get the <code> value from the 1st column (start position = low BT)
    print "$low[0]\n";
    $overlap_table{$low[0]}++;
    my @high = split("\t", $line); #get the value from the 2nd column (end position = high BT)
    print "$high[1]\n";
    $overlap_table{$high[1]}++;
    my @ID = split ("", $line); #ID information is the "value"
    print "$ID[2]\n";
    $overlap_table{$ID[2]}++;
}
close($fh);
print Dumper \%overlap_table;

I now want to use each one of the $low, $high and $ID to fill in the tree.

I would like to be able to loop through the file automatically.

I have lots of files with lots of lines, so entering the values manually is not a good option.

To sum up, I would love it if someone could help me understand how to loop through the lines of my $fh so that I can fill in the interval tree according to the following requirement:

 $tree->insert($ID, $low, $high) for each one of the lines.

Thank you so much in advance!

Replies are listed 'Best First'.
Re: Loop through file to create interval tree
by Laurent_R (Canon) on May 23, 2014 at 18:07 UTC
    Please fix your formatting to make your code readable. Put one opening <code> at the beginning of the code, one closing </code> tag at the end of your code, and no <code> tag in the middle of your comments.

    Otherwise, your splitting is very inefficient. You could get your 3 variables in just one operation:

    while (my $line = <$fh>) {
        chomp $line;
        my ($low, $high, $id) = split /\t/, $line;
        # ...
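
Building on that single split, a complete loop that fills the tree might look like the sketch below. It relies on the `insert` and `fetch` methods from the Set::IntervalTree documentation. Several things here are assumptions rather than part of the thread: the columns are tab-separated with a header line, an in-memory filehandle stands in for the real file, and the end positions are inclusive (hence the `+ 1`, since the module documents its ranges as half-closed).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Set::IntervalTree;

# Sample rows from the question, tab-separated, with a header line.
# An in-memory filehandle stands in for the real file in this sketch.
my @rows = (
    "start\tend\tID",
    "36701\t40200\t1",
    "37901\t39700\t2",
    "36701\t39700\t3",
);
my $data = join("\n", @rows) . "\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $tree = Set::IntervalTree->new;

while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /^\d/;    # skip the header and any blank lines
    my ($low, $high, $id) = split /\t/, $line;

    # Assumption: end positions in the file are inclusive, while the
    # module treats ranges as half-closed, so the upper bound is $high + 1.
    $tree->insert($id, $low, $high + 1);
}
close $fh;

# fetch() returns an array reference holding the stored values (here, the
# IDs) of every range that overlaps the query range.
my @hits = sort { $a <=> $b } @{ $tree->fetch(40000, 40100) };
print "IDs overlapping 40000-40100: @hits\n";
```

To run this against a real file, replace the in-memory data with `my $file = shift;` and `open my $fh, '<', $file or die "Cannot open $file: $!";` as in the original script.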
