PerlMonks  

Re: Counter - number of tags per interval

by BrowserUk (Pope)
on May 04, 2013 at 15:05 UTC


in reply to Counter - number of tags per interval

50 distinct id's and each id in file.2 has approximately 5-7 * 10^9 lines.

File 2 has ~350 billion lines?

What is the maximum coordinate?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: Counter - number of tags per interval
by baxy77bax (Chaplain) on May 04, 2013 at 15:23 UTC
    Yes, that is true. (Why is this so odd? In bioinformatics you have 50-500 GB read-map files and nobody finds that odd :) ) Maximum coordinate? Well, that depends on the simulation, but I would say that for id = a the max is approx. 5*10^9. I would need to go through the file to find out.
      why is this so odd ??

      Not odd. Just important to know as it greatly affects the possible solutions.

      How long does it take (or do you estimate it will take) to process those two files with your current code?


        Current estimates:

        One id: approx. 8-12 h
        50 ids: approx. 500 h = approx. 20 days, which is 20 times longer than the actual computation.
        Since there will be more such simulations, I think a better solution would be preferable. Unfortunately I cannot tamper with the simulation part since I have no source, but even if I did I wouldn't, since there are a lot of heuristics involved which I honestly don't understand.
