
Re: Counter - number of tags per interval

by BrowserUk (Pope)
on May 04, 2013 at 15:05 UTC (#1032049)


in reply to Counter - number of tags per interval

50 distinct id's and each id in file.2 has approximately 5-7 * 10^9 lines.

File 2 has ~350 billion lines?

What is the maximum coordinate?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Counter - number of tags per interval
by baxy77bax (Chaplain) on May 04, 2013 at 15:23 UTC
    Yes, that is true. (Why is this so odd? In bioinformatics you have 50-500 GB read-mapping files and nobody finds that odd :) ) Maximum coordinate? Well, that depends on the simulation, but I would say for id = a the max is approx. 5*10^9. I would need to go through the file to find out.
      why is this so odd ??

      Not odd. Just important to know as it greatly affects the possible solutions.

      How long does it take (or do you estimate it will take) to process those two files with your current code?


        Current estimates:

        One id: approx. 8-12 h
        50 ids: approx. 500 h = approx. 20 days, which is 20 times longer than the actual computation.
        Since there will be more such simulations, I think a better solution would be preferable. Unfortunately I cannot tamper with the simulation part since I have no source, but even if I did I wouldn't, since there are a lot of heuristics involved which I honestly don't understand.
