Re: Counter - number of tags per interval

by BrowserUk (Pope)
on May 04, 2013 at 15:05 UTC


in reply to Counter - number of tags per interval

50 distinct id's and each id in file.2 has approximately 5-7 * 10^9 lines.

File 2 has ~350 billion lines?

What is the maximum coordinate?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: Counter - number of tags per interval
by baxy77bax (Chaplain) on May 04, 2013 at 15:23 UTC
    Yes, that is true. (Why is this so odd? In bioinformatics, read-mapping files of 50-500 GB are common and nobody finds that odd :) ) Maximum coordinate? That depends on the simulation, but I would say that for id = a the max is approx. 5*10^9. I would need to go through the file to find out exactly.
      why is this so odd ??

      Not odd. Just important to know as it greatly affects the possible solutions.

      How long does it take, or do you estimate it would take, to process those two files with your current code?


        Current estimates:

        One id: approx. 8-12 h
        50 ids: approx. 500 h = approx. 20 days, which is 20 times longer than the actual computation.
        Since there will be more such simulations, I think a better solution would be preferable. Unfortunately I cannot tamper with the simulation part since I have no source, but even if I did I wouldn't, since there are a lot of heuristics involved which I honestly don't understand.
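
As a rough sanity check of the figures quoted in this thread, the totals can be worked out in a few lines of Python. This is only a sketch: the names below are made up for illustration, and the inputs are just the ranges stated above (50 ids, 5-7 * 10^9 lines per id, 8-12 h per id).

```python
# Back-of-envelope check of the numbers quoted in the thread.
# All names here are illustrative; the inputs come from the posts above.
IDS = 50
LINES_PER_ID = (5 * 10**9, 7 * 10**9)  # "approximately 5-7 * 10^9 lines" per id
HOURS_PER_ID = (8, 12)                 # "one id: approx. 8-12 h"


def scale(per_item, count):
    """Scale a (low, high) per-item range by the number of items."""
    low, high = per_item
    return (low * count, high * count)


total_lines = scale(LINES_PER_ID, IDS)
total_hours = scale(HOURS_PER_ID, IDS)
total_days = tuple(h / 24 for h in total_hours)

print(f"lines: {total_lines[0] / 1e9:.0f}-{total_lines[1] / 1e9:.0f} billion")
print(f"hours: {total_hours[0]}-{total_hours[1]}")
print(f"days:  {total_days[0]:.1f}-{total_days[1]:.1f}")
```

The upper bound of 350 billion lines matches the figure asked about earlier in the thread, and the quoted 500 h (about 20 days) sits in the middle of the 400-600 h range.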
