PerlMonks  

Re: How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts

by jarich (Curate)
on Jun 30, 2004 at 09:47 UTC (#370718=note)


in reply to How to sort a large flat file 90mb with PERL-- various ways/tradeoffs/watchouts

First of all, welcome to Perl, maxon11.

You would benefit greatly from reading the various tutorials on this site and getting yourself a good book about Perl. Randal Schwartz's "Learning Perl" might be perfect.

I particularly recommend that you read the "On asking for help" and "How (Not) To Ask A Question" nodes, as I feel that your questions here would have benefited from being briefer. At the very least, you can probably depend on us already having access to the sort documentation. ;) Even if Randal did send it to you.

The "uninitialized value" warning from sort.pudge.pl is probably a minor bug. If that's all it gave you, however, I wouldn't worry too much; it looks like it otherwise sorted your file. In the second case, yes, you probably ran out of memory. How much memory are your programs allowed to take up? If you're on a Unix-like operating system, you can usually type "ulimit -a" on the command line to list your limits, including the memory ones.

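Mind you, if you're using a Unix-like operating system, you should probably just use the Unix sort command. It is built for exactly this job (GNU sort, for instance, spills to temporary files rather than holding the whole file in memory). :)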
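If you'd still like to drive the job from Perl, you can hand the file to sort(1) and let it do the heavy lifting. This is a minimal sketch; the helper name unix_sort is made up for illustration, and it assumes a POSIX sort command is on your PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sort the file $in into the file $out with the
# external sort(1) command instead of loading the lines into Perl.
sub unix_sort {
    my ($in, $out) = @_;

    # sort(1) orders lines according to the locale; LC_ALL=C asks for
    # plain byte order, which matches Perl's default string sort on
    # ASCII data. local() restores the old value when the sub returns.
    local $ENV{LC_ALL} = 'C';

    # List-form system() runs sort directly, without a shell, so spaces
    # or other odd characters in file names are passed through safely.
    # -o names the output file.
    system('sort', '-o', $out, $in) == 0
        or die "sort failed on $in (wait status $?)";
    return;
}
```

For your files that would be unix_sort("H2Z_ZDL0.000", "sorted.txt"). Perl then only waits for sort to finish, so Perl's memory use stays small no matter how big the file is.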

The difference between the two sort code listings that you provide is that the first makes several copies of the data in memory whereas the second does not.
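To make the copying concrete, here are the two styles side by side. Your first listing isn't reproduced in this reply, so sort_with_copies is only an illustration of the general pattern, not your actual code; both sub names are hypothetical.

```perl
use strict;
use warnings;

# Illustrative only: named arrays keep extra copies of the data alive.
sub sort_with_copies {
    my ($in, $out) = @_;
    my @lines  = <$in>;        # copy 1: every line of the file
    my @sorted = sort @lines;  # copy 2: a second, sorted array
    print {$out} @sorted;      # both arrays still exist at this point
}

# The style of the second listing: the list from readline goes straight
# into sort and then to print, without named arrays holding it as well.
sub sort_direct {
    my ($in, $out) = @_;
    print {$out} sort(<$in>);
}
```

Both produce the same output. The second simply avoids keeping @lines and @sorted around alongside the temporary lists Perl builds anyway, which matters when the file is close to your memory limit.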

To explain how the second program works, I'll reformat it and add some comments. I've also made a few slight changes to make it a better program in general.

    #!/usr/bin/perl
    use strict;     # would have caught the $ouput/$output typo
    use warnings;

    my $input  = "H2Z_ZDL0.000";
    my $output = "sorted.txt";

    # Open $input for reading.
    open(ORIGFILE, "<", $input) or die "Could not open $input: $!";

    # Open $output for writing, destroying current file contents.
    open(FINALFILE, ">", $output) or die "Could not open $output: $!";

    # This line does several things. It reads all the lines
    # from ORIGFILE into memory (which is done in the <ORIGFILE>
    # bit), sorts them (using sort) and then prints them out
    # to FINALFILE.
    print FINALFILE sort(<ORIGFILE>);

    # Close FINALFILE. This flushes the buffer, so a failed
    # write (a full disk, say) shows up here.
    close(FINALFILE) or die "Could not close $output: $!";

    # Close ORIGFILE.
    close(ORIGFILE);

You ask how Perl knows to sort the whole record in alphabetical order by default. This is answered right at the top of the sort documentation:

If SUBNAME or BLOCK is omitted, "sort"s in standard string comparison order.

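That is, if you write sort @array, then sort will sort in string order. That is roughly alphabetical, but it goes character by character, so uppercase letters come before lowercase ones and "10" comes before "9".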
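A quick way to see the difference is to sort a few numbers both ways:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 2, 33, 1);

# No block: string comparison, so "10" sorts before "2" because the
# character "1" comes before the character "2".
my @as_strings = sort @numbers;

# Explicit numeric comparison with the <=> ("spaceship") operator.
my @as_numbers = sort { $a <=> $b } @numbers;

print "@as_strings\n";    # 1 10 2 33 9
print "@as_numbers\n";    # 1 2 9 10 33
```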

I would guess that you actually want it to sort numerically. You can do this by writing sort { $a <=> $b } @array, just as it says in the documentation.
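One catch: your records are whole lines, and <=> on a line like "10 ten" only looks at the leading number (and complains under use warnings). It is usually cleaner to pull out the numeric field first. This is a sketch using the Schwartzian transform; it assumes the number is the first whitespace-separated field, since the actual record layout of your file isn't shown here, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Sort lines numerically by their first whitespace-separated field.
# Assumption: every line starts with a number; change the split (or use
# a regex) to match the real record layout.
sub sort_lines_by_first_field {
    my @lines = @_;
    return map  { $_->[1] }                     # 3. recover the original line
           sort { $a->[0] <=> $b->[0] }         # 2. compare keys numerically
           map  { [ (split ' ', $_)[0], $_ ] }  # 1. pair each key with its line
           @lines;
}

my @sorted = sort_lines_by_first_field("10 ten\n", "9 nine\n", "100 hundred\n");
print @sorted;    # 9 nine, then 10 ten, then 100 hundred
```

Each key is extracted once per line rather than once per comparison, which adds up on a 90MB file. A plain sort on the same three lines would give 10, 100, 9.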

Good luck with learning Perl.

Hope this helps

jarich
