Re^3: Split a file based on column

by space_monk (Chaplain)
on Jan 17, 2013 at 11:04 UTC (#1013758)


in reply to Re^2: Split a file based on column
in thread Split a file based on column

You caught my comment whilst it was being drafted; I did state another reason for the approach I suggested.

Memory is almost never a problem nowadays unless you're running on a 15-year-old PC, but 300k rows * 64 KB per row (19 GB??) may give some pause for thought. Time to go shopping for more memory or increase your cache. :-)
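As a quick sanity check of that figure, here is a back-of-the-envelope calculation. It assumes "300k" means 300,000 rows and "64 k" means 64 KiB (65,536 bytes) per row; both readings are assumptions, and Python is used purely for illustration.

```python
# Rough size estimate. The row count and row size are assumed readings
# of "300k rows" and "64 k per row", not figures confirmed in the thread.
rows = 300_000
bytes_per_row = 64 * 1024  # assuming 64 KiB per row

total_bytes = rows * bytes_per_row
print(f"{total_bytes / 10**9:.1f} GB")   # decimal gigabytes
print(f"{total_bytes / 2**30:.1f} GiB")  # binary gibibytes
```

Under these assumptions the total comes to a little under 20 GB in decimal units and a little over 18 GiB in binary units, so "19 GB" is the right order of magnitude either way.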

A Monk aims to give answers to those who have none, and to learn from those who know more.


Re^4: Split a file based on column
by davido (Archbishop) on Jan 17, 2013 at 18:56 UTC

    Loading a 19 GB file into memory does indeed give pause for thought... a long, long pause. :) Time enough to contemplate approaches that do scale well.

    Your accumulate-and-write-when-full strategy is a pretty good idea. It would be a data cache rather than a filehandle cache, and the implementation ought to be pretty straightforward. Implementing the filehandle LFU cache seems like it would be more fun, though.
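A minimal sketch of the accumulate-and-write-when-full idea, in Python for illustration only. The names `BufferedSplitter` and `split_by_column`, the `<key>.txt` file naming, the tab separator, and the default buffer limit are all hypothetical choices, not anything specified in the thread. It also assumes the key values are safe to use as file names.

```python
import os
from collections import defaultdict


class BufferedSplitter:
    """Group rows by key in memory and append them to per-key files when full.

    Hypothetical helper: the class name, file naming scheme and default
    limit are illustrative assumptions, not taken from the thread.
    """

    def __init__(self, out_dir, max_buffered_rows=100_000):
        self.out_dir = out_dir
        self.max_buffered_rows = max_buffered_rows
        self.buffers = defaultdict(list)  # key -> rows not yet written
        self.buffered = 0                 # total rows currently held in memory
        os.makedirs(out_dir, exist_ok=True)

    def add(self, key, row):
        self.buffers[key].append(row)
        self.buffered += 1
        if self.buffered >= self.max_buffered_rows:
            self.flush()

    def flush(self):
        # Only one output file is open at any moment, so the number of
        # distinct keys never runs into the per-process open-file limit.
        for key, rows in self.buffers.items():
            path = os.path.join(self.out_dir, f"{key}.txt")
            with open(path, "a", encoding="utf-8") as out:
                out.writelines(row + "\n" for row in rows)
        self.buffers.clear()
        self.buffered = 0


def split_by_column(lines, out_dir, column=0, sep="\t", max_buffered_rows=100_000):
    """Stream lines once, routing each to a file named after its key column."""
    splitter = BufferedSplitter(out_dir, max_buffered_rows)
    for line in lines:
        line = line.rstrip("\n")
        key = line.split(sep)[column]
        splitter.add(key, line)
    splitter.flush()  # write whatever is still buffered
```

Calling `split_by_column(open("input.tsv"), "out")` would stream the input once, with memory use bounded by `max_buffered_rows` rather than by the file size. Because files are opened in append mode, the output directory should be empty before a run. A filehandle cache takes the opposite trade-off: rows are written immediately, a limited number of handles stay open, and the least frequently used one is closed when the limit is reached.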


    Dave
