Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Re: Counter - number of tags per interval

by roboticus (Chancellor)
on May 04, 2013 at 23:31 UTC ( #1032092=note: print w/replies, xml ) Need Help??

in reply to Counter - number of tags per interval


It took a while to figure out what your algorithm was, but I finally doped it out. It seemed overly complex for what you're doing, though. So I coded it up in a form that's a little easier to understand--at least for me! I don't know if it's any faster than yours, or not, so give it a try and let me know how it looks.

Basically, I just keep a list of active intervals, and increment all of them when I see a '1'. When I hit a column that begins a new interval, I move it from the "waiting" list and into the "current" list. When I hit an end column, I remove all expired lists from the active list. That's pretty much it.

Please let me know if it's any better for you...

$ cat #!/usr/bin/perl use warnings; use strict; use Getopt::Long; my ($i1, $i2, %hash1); GetOptions ('i=s' => \$i1, 't=s' => \$i2); open (IN, "<", $i1)|| die "$!"; open (IN1, "<", $i2)|| die "$!"; # Build intervals for each ID while (<IN1>){ chomp; my ($id, $beg, $end) = split /\s+/,$_; push @{$hash1{$id}}, [ $beg, $end, 0 ]; } close IN1; my $p =0; my $id = ""; my @add_intervals; my @del_intervals; my @cur_intervals; #count tags per interval while(<IN>){ s/\s+$//; #print "$.: $_\n"; if (/#(.*)/){ $p=0; $id = $1; # Intervals for this ID @add_intervals = sort {$a->[0] <=> $b->[0]} @{$hash1{$id}}; # List of interval ends my %uniq = map { $_->[1], 0 } @add_intervals; @del_intervals = sort { $a <=> $b } keys %uniq; # Start with no active intervals @cur_intervals = ( ); next; } $p++; # sometimes the first column is set to 0 /^(\d+)\s+(\d+)/ or next; # add new intervals that start on this column while (@add_intervals and ($p == $add_intervals[0][0])) { #print "Adding interval $id:$add_intervals[0][0] .. $add_inter +vals[0][1]\n"; push @cur_intervals, shift @add_intervals; } # Increment all active intervals when we find a hit if (@cur_intervals and $2 eq '1') { ++$_->[2] for @cur_intervals; } # remove ranges that end on this column if (@del_intervals and $p==$del_intervals[0]) { #print "Deleting intervals ending on column $p\n"; shift @del_intervals; @cur_intervals = grep { $_->[1] > $p } @cur_intervals; } } close IN; for my $id (sort keys %hash1) { for my $interval (@{$hash1{$id}}) { print "$id ($interval->[0] .. $interval->[1]) : $interval->[2] +\n"; } } $ perl -i=1032018.file_2 -t=1032018.file_1 a (12 .. 15) : 2 a (12 .. 17) : 4 a (13 .. 14) : 1 a (14 .. 19) : 5 b (10 .. 15) : 5 b (12 .. 15) : 4

Note: I think you might mean to use the first capture in the last regex as your column ($p), but I didn't change it.


When your only tool is a hammer, all problems look like your thumb.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1032092]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (4)
As of 2017-10-22 15:56 GMT
Find Nodes?
    Voting Booth?
    My fridge is mostly full of:

    Results (273 votes). Check out past polls.