
Re^4: Split a file based on column

by roboticus (Chancellor)
on Jan 17, 2013 at 11:35 UTC

in reply to Re^3: Split a file based on column
in thread Split a file based on column


I used a priority queue in a C program a dozen or so years ago, and it worked well. As for the overhead, I wouldn't expect it to be prohibitive, especially compared to the time saved by not having to reopen a file.

Part of the reason I chose an LRU cache for this one is that I've found they work pretty well for the types of applications I use, at least when the number of file handles is more reasonable. Most of the data I play with tends to be 'clumped', in that similar records tend to be close together. For example, when I process credit card data, I'll have long runs of Visa transactions and somewhat shorter runs of MasterCard transactions, while the others (American Express, Discover) frequently come in very short runs.
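
To make the clumping argument concrete, here is a minimal sketch of an LRU cache of output file handles. It is written in Python purely for illustration and is not the code from this thread; the names `LRUHandleCache` and `split_by_key`, the `.txt` file naming, and the `opens` counter are all hypothetical.

```python
from collections import OrderedDict
import os


class LRUHandleCache:
    """Keep at most `capacity` output files open, evicting the least recently used."""

    def __init__(self, directory, capacity):
        self.directory = directory
        self.capacity = capacity
        self.handles = OrderedDict()  # key -> open file, least recently used first
        self.opens = 0                # number of times a file had to be (re)opened

    def write(self, key, line):
        fh = self.handles.get(key)
        if fh is None:
            if len(self.handles) >= self.capacity:
                # Cache is full: close the least recently used handle.
                _, oldest = self.handles.popitem(last=False)
                oldest.close()
            # Append mode, so a reopened file keeps the lines written earlier.
            fh = open(os.path.join(self.directory, key + ".txt"), "a")
            self.handles[key] = fh
            self.opens += 1
        else:
            # Cache hit: mark this handle as most recently used.
            self.handles.move_to_end(key)
        fh.write(line + "\n")

    def close(self):
        for fh in self.handles.values():
            fh.close()
        self.handles.clear()


def split_by_key(records, directory, capacity):
    """Write each (key, line) pair to its own file; return how many opens it took."""
    cache = LRUHandleCache(directory, capacity)
    for key, line in records:
        cache.write(key, line)
    cache.close()
    return cache.opens
```

With a capacity of 2 and three keys, a clumped input (a run of five Visa records, three MasterCard, one American Express, then four more Visa) needs only 4 opens for 13 records. A strict round-robin of the same three keys misses on every record, which is the worst case for LRU and the reason the approach depends on the data being clumped.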


When your only tool is a hammer, all problems look like your thumb.
