
Re^2: Split a file based on column

by Anonymous Monk
on Jan 17, 2013 at 10:59 UTC (#1013755)

in reply to Re: Split a file based on column
in thread Split a file based on column

All of the above answers seem to have problems with possible filehandle limits.

Re: Split a file based on column doesn't, and it also doesn't suffer from loading the whole file into RAM.

Re^3: Split a file based on column
by space_monk (Chaplain) on Jan 17, 2013 at 11:04 UTC

    You caught my comment whilst it was being drafted; I did state another reason for the approach I suggested.

    Memory is almost never a problem nowadays unless you're running on a 15-year-old PC, but 300k rows * 64k per row (roughly 19GB) may give some pause for thought. Time to go shopping for more memory or increase your cache. :-)

    A Monk aims to give answers to those who have none, and to learn from those who know more.

      Loading a 19GB file into memory does indeed give pause for thought... a long, long pause. :) Time enough to contemplate approaches that do scale well.

      Your accumulate-and-write-when-full strategy is a pretty good idea. It would be a data cache rather than a filehandle cache, and the implementation ought to be pretty straightforward. Implementing the filehandle LFU cache seems like it would be more fun, though.
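
For anyone following along, here is a rough sketch of the accumulate-and-write-when-full idea (the data cache, not the LFU filehandle cache). It is written in Python only to keep it short; nobody in this thread posted it, and the function name `split_by_column`, the `max_buffered_rows` threshold, the CSV input format, and the use of the key value as the output filename are all assumptions made for illustration. Rows are grouped in memory by key and appended to their files whenever the buffer fills, so only one output file is open at a time and memory use stays bounded.

```python
import csv
import os
from collections import defaultdict


def split_by_column(src_path, out_dir, key_index=0,
                    max_buffered_rows=10_000, delimiter=","):
    """Split src_path into one file per distinct value in column key_index.

    Hypothetical helper: rows are buffered per key and flushed to disk once
    max_buffered_rows rows are held in total. Assumes key values are safe
    to use as filenames.
    """
    os.makedirs(out_dir, exist_ok=True)
    buffers = defaultdict(list)   # key -> rows waiting to be written
    buffered = 0                  # total rows currently held in memory
    seen = set()                  # keys that already have an output file

    def flush():
        nonlocal buffered
        for key, rows in buffers.items():
            # Truncate on the first write for a key, append afterwards.
            mode = "a" if key in seen else "w"
            seen.add(key)
            path = os.path.join(out_dir, f"{key}.csv")
            # Only one output handle is open at any moment.
            with open(path, mode, newline="") as out:
                csv.writer(out, delimiter=delimiter).writerows(rows)
        buffers.clear()
        buffered = 0

    with open(src_path, newline="") as src:
        for row in csv.reader(src, delimiter=delimiter):
            if not row:
                continue  # skip blank lines
            buffers[row[key_index]].append(row)
            buffered += 1
            if buffered >= max_buffered_rows:
                flush()

    flush()  # write whatever is left at end of input
    return sorted(seen)
```

Calling `split_by_column("input.csv", "out", key_index=0)` would write one file per distinct first-column value under `out/`. The threshold trades memory for fewer open/close cycles: a larger `max_buffered_rows` means fewer flushes but more rows held in RAM, so with 64k rows it would need to be set far lower than the default used here.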

