PerlMonks  

Re^2: read and sort multiple files (wheel reuse)

by matrixmadhan (Beadle)
on Dec 01, 2008 at 06:35 UTC ( [id://727020] )


in reply to Re: read and sort multiple files (wheel reuse)
in thread read and sort multiple files

Since the thread mentions many large files, it's better to use the -T option to point sort at a dedicated temp directory, rather than filling the default /tmp directory that sort uses to store its temporary files while sorting.
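
A minimal sketch of the -T suggestion, assuming a sort that supports -T (GNU and BSD sort both do). The scratch directory and the file names a.txt and b.txt are hypothetical stand-ins for the large input files:

```shell
# Use byte-wise collation so the ordering is predictable across systems.
export LC_ALL=C

# Hypothetical scratch area standing in for a disk with plenty of free space.
scratch=$(mktemp -d)
mkdir "$scratch/sorttmp"

# Two small sample inputs (the real files would be far larger).
printf '%s\n' pear apple > "$scratch/a.txt"
printf '%s\n' fig banana > "$scratch/b.txt"

# -T tells sort where to put its temporary spill files;
# -o names the output file.
sort -T "$scratch/sorttmp" -o "$scratch/sorted.txt" \
    "$scratch/a.txt" "$scratch/b.txt"
```

In practice the -T directory should sit on a filesystem with enough free space to hold roughly the size of the data being sorted.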

Replies are listed 'Best First'.
Re^3: read and sort multiple files (wheel reuse)
by spmlingam (Scribe) on Dec 01, 2008 at 07:41 UTC
    You can specify which sort method to use with Perl's sort function.
    See the following link for details:
    http://search.cpan.org/~tty/kurila-1.14_0/lib/sort.pm

    If you are running on a Unix- or Linux-like operating system, you can use the shell command "sort".
    You can also use the Tie::File module, which will not use much memory, but it will slow down the sorting process.

      That's a link to sort.pm in the kurila distribution. You want the one in the perl distribution: sort

      You can specify which sort method to use with Perl's sort function.

      The point isn't to use merge sort. The point is to sort files, which merge sort can do. The "use sort" pragma can't help with that.

      You can use Tie::File module

      I don't think sort can sort tied arrays in-place, so sort would cause the entire file to be loaded into memory.

      Even if sort sorts tied arrays in-place, the performance would be abysmal. We're talking about writing out (tens of? hundreds of?) thousands of lines O(N log N) times.

      Finally, it would use more than "not much" memory. Tie::File keeps an index of every line it has encountered in the file. That usually means, as it does in this case, that it keeps an index of every line in the file. This is in addition to the cache. That's a significant amount of memory in this case, but it sounds like it could be acceptable.
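
To illustrate the point that the task is sorting files, which a merge can do, here is a rough sketch using the shell's sort: each file is sorted on its own, then the sorted files are merged with -m. The file names and the scratch directory are hypothetical; this is one possible approach, not necessarily what the parent node proposed:

```shell
# Byte-wise collation keeps the ordering predictable.
export LC_ALL=C

# Hypothetical working directory and sample input files.
work=$(mktemp -d)
mkdir "$work/tmp"
printf '%s\n' delta alpha > "$work/part1.txt"
printf '%s\n' charlie bravo > "$work/part2.txt"

# Sort each file individually, keeping temporary files out of /tmp.
for f in "$work"/part*.txt; do
    sort -T "$work/tmp" -o "$f.sorted" "$f"
done

# -m merges inputs that are already sorted, without re-sorting them.
sort -m -o "$work/merged.txt" "$work"/part*.txt.sorted
```

Because -m only merges, it reads the inputs sequentially and does not need to hold them in memory.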
