Re^3: Split a file based on column

by space_monk (Chaplain)
on Jan 17, 2013 at 11:04 UTC


in reply to Re^2: Split a file based on column
in thread Split a file based on column

You caught my comment whilst it was being drafted; I did state another reason for the approach I suggested.

Memory is rarely a problem nowadays unless you're running this on a 15-year-old PC, but 300k rows * 64k per row (roughly 19GB) may give some pause for thought. Time to go shopping for more memory or increase your cache. :-)

A Monk aims to give answers to those who have none, and to learn from those who know more.


Re^4: Split a file based on column
by davido (Archbishop) on Jan 17, 2013 at 18:56 UTC

    Loading a 19GB file into memory does indeed give pause for thought... a long, long pause. :) Time enough to contemplate approaches that do scale well.

    Your accumulate-and-write-when-full strategy is a pretty good idea. It would be a data cache rather than a filehandle cache, and the implementation ought to be pretty straightforward. Implementing the filehandle LFU cache seems like it would be more fun, though.
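
    For illustration, here is a minimal Python sketch of the accumulate-and-write-when-full idea. It is not code from this thread. The function name `split_by_column`, the tab separator, the `<key>.txt` output naming, and the `max_buffered` threshold are all assumptions made for the example, and it assumes the key values are safe to use as file names.

```python
import os
from collections import defaultdict


def split_by_column(lines, out_dir, key_col=0, sep="\t", max_buffered=10_000):
    """Split lines into one file per key value, buffering rows in memory
    and appending them to disk in batches (hypothetical helper)."""
    buffers = defaultdict(list)  # key -> lines waiting to be written
    buffered = 0                 # total number of lines currently held

    def flush():
        nonlocal buffered
        for key, rows in buffers.items():
            # Append the batch and close right away, so only one
            # output handle is open at any moment.
            path = os.path.join(out_dir, f"{key}.txt")
            with open(path, "a") as fh:
                fh.writelines(rows)
        buffers.clear()
        buffered = 0

    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # make sure every stored row ends with a newline
        key = line.rstrip("\n").split(sep)[key_col]
        buffers[key].append(line)
        buffered += 1
        if buffered >= max_buffered:
            flush()  # the cache is "full": write everything out

    flush()  # write whatever is left at the end of the input
```

    Passing an open file object as `lines` streams the input one row at a time, so memory use is bounded by `max_buffered` rows rather than by the size of the file. The trade-off is that each flush reopens every file that has pending rows.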


    Dave
