in reply to Counter - number of tags per interval
Hi,
I thought about your problem again while I was doing something else, and it occurred to me that it might be much faster to do it the other way around (depending, of course, on your actual data).
Your program takes a lot of time because it has to loop over a large data structure for every single line of your very big file. Instead, you could start by reading the big file and summarizing its content. My understanding is that it contains a lot of repetitive information that you want to count. You could parse the big file once and record in an array or a hash how many times each value occurs. That way, you no longer have to run lengthy loops for each line of the big file.
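A minimal sketch of that summarizing pass, written in Python for illustration. It assumes (hypothetically, since we haven't seen your data) that each line of the big file holds an ID and an integer position separated by whitespace; the function name `summarize_tags` is made up for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def summarize_tags(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count how often each position occurs per ID, in a single pass.

    Hypothetical input format: "ID<whitespace>POSITION" on each line.
    """
    summary: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        tag_id, position = fields[0], int(fields[1])
        summary[tag_id][position] += 1
    return summary


# Usage with in-memory lines; for a real file, pass the open file handle instead.
sample = ["a 10", "a 10", "a 15", "b 7", ""]
counts = summarize_tags(sample)
print(dict(counts["a"]))  # {10: 2, 15: 1}
```

The more repetitive the data, the smaller `counts` is compared with the original file.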
Once you've read the big file (or possibly just one ID section of it, say the 'a' section), you can match the array or hash summary against the intervals of the other file. The nested loops of your program then run against a presumably much smaller summary of the data, which might be much faster.
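One way to run the intervals against such a summary, again as a Python sketch under assumptions: intervals are hypothetical `(ID, start, end)` tuples with inclusive bounds, and the summary is a per-ID `Counter` of positions. Rather than a plain nested loop, this variant sorts each ID's distinct positions once and uses binary search over running totals, so each interval costs only two lookups.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from itertools import accumulate
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(summary: Dict[str, Counter]) -> Index:
    """For each ID, sort the distinct positions and precompute running totals."""
    index: Index = {}
    for tag_id, counter in summary.items():
        positions = sorted(counter)
        # prefix[i] is the total count of the first i positions (prefix[0] == 0)
        prefix = [0] + list(accumulate(counter[p] for p in positions))
        index[tag_id] = (positions, prefix)
    return index


def count_in_interval(index: Index, tag_id: str, start: int, end: int) -> int:
    """Number of tags with start <= position <= end (inclusive bounds assumed)."""
    if tag_id not in index:
        return 0
    positions, prefix = index[tag_id]
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, end)
    return prefix[hi] - prefix[lo]


# Hypothetical summary and intervals, just to show the calls.
summary = {"a": Counter({10: 2, 15: 1, 40: 3}), "b": Counter({7: 1})}
index = build_index(summary)
intervals = [("a", 1, 12), ("a", 10, 40), ("a", 16, 39), ("b", 0, 100), ("c", 0, 5)]
for tag_id, start, end in intervals:
    print(tag_id, start, end, count_in_interval(index, tag_id, start, end))
```

If the summary is small enough, a simple loop over the distinct positions for each interval would also work; the binary search only matters when there are many intervals per ID.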
Of course, I have made some assumptions about the data that may turn out to be wrong. We have no idea what your data looks like, which makes it very difficult for us to help you or even offer useful advice. In particular, I have assumed that the data summary can fit in an array or a hash; that may not be possible, as it really depends on how repetitive your data is.
Re^2: Counter - number of tags per interval
by sundialsvc4 (Abbot) on May 04, 2013 at 20:31 UTC