PerlMonks  

Re^2: Sorting Numbers & Text

by PriNet (Beadle)
on Jul 12, 2012 at 21:40 UTC (#981508)


in reply to Re: Sorting Numbers & Text
in thread Sorting Numbers & Text

I start with a "new" value that isn't in the list (file) yet (either alpha or numeric), then read the values from the source file one at a time, compare the two, and print the smaller back to a temp file while retaining the larger for the next read, and so on (a kind of bubble sort). At the end I swap the files back. It's the comparing of mixed types that I get stuck on, and I'm worried about running out of memory if I just load everything into an array and sort it with <=> (which I may end up doing *heh*). But you have given me an idea about "flagging" the first character as alpha or numeric ... *hmmmm*
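
For illustration, here is a minimal sketch of that one-pass insert combined with the "flag the first character" idea. The sub names (is_numeric, mixed_cmp, insert_sorted) are hypothetical, and the choice to put text before numbers is an assumption; swap the flags if you want the opposite order. It also assumes that anything starting with a digit is entirely numeric.

```perl
use strict;
use warnings;

# Flag a value by its first character: 1 for a leading digit, 0 otherwise.
sub is_numeric { return $_[ 0 ] =~ m{^\d} ? 1 : 0 }

# Compare two mixed values: by category first (text before numbers, an
# assumption), then numerically or as strings within the category.
sub mixed_cmp {
    my ( $x, $y ) = @_;
    my ( $xNum, $yNum ) = ( is_numeric( $x ), is_numeric( $y ) );
    return $xNum <=> $yNum
        || ( $xNum ? $x <=> $y : $x cmp $y );
}

# Copy an already-sorted input handle to an output handle, slotting $new
# in at its sorted position. Only one record is in memory at a time.
sub insert_sorted {
    my ( $inFH, $outFH, $new ) = @_;
    my $placed = 0;
    while ( defined( my $line = <$inFH> ) ) {
        chomp $line;
        if ( !$placed && mixed_cmp( $new, $line ) <= 0 ) {
            print {$outFH} "$new\n";
            $placed = 1;
        }
        print {$outFH} "$line\n";
    }
    print {$outFH} "$new\n" unless $placed;
    return;
}

# Demo on in-memory "files"; real code would open the source and temp files.
my $sorted = "Apple\nRabbit\n0343120\n0430870\n";
open my $in, q{<}, \$sorted or die $!;
my $out = q{};
open my $outFH, q{>}, \$out or die $!;
insert_sorted( $in, $outFH, q{0400000} );
close $outFH or die $!;
print $out;
```

This rewrites the whole file for every new value, so it only pays off when inserts are rare compared to the size of the file.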


I did try re-inventing the wheel...
But it kept getting stuck on the corners


Re^3: Sorting Numbers & Text
by johngg (Abbot) on Jul 12, 2012 at 22:25 UTC

    Another approach might be to read chunks of your very large database, perhaps 100k to 500k records at a time, and sort each chunk into its own temporary file. Once you have read and sorted all of the data, do a sort/merge of the temporary files into a final sorted file. My gut feeling is that this would be more efficient than the "two at a time" approach you are taking.
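
A rough sketch of that chunk-then-merge idea, under a few assumptions: the sub names (mixed_cmp, write_sorted_chunks, merge_chunks) are made up for illustration, text is taken to sort before numbers, and the chunk size of 3 is only there to keep the demo small. The merge does a simple linear scan over the chunk heads; with many chunks a heap would probably scale better.

```perl
use strict;
use warnings;
use File::Temp qw{ tempfile };

# Category first (text before numbers, an assumption), then value.
sub mixed_cmp {
    my ( $x, $y ) = @_;
    my ( $xNum, $yNum ) = map { m{^\d} ? 1 : 0 } $x, $y;
    return $xNum <=> $yNum
        || ( $xNum ? $x <=> $y : $x cmp $y );
}

# Read up to $chunkSize lines at a time, sort each chunk in memory and
# write it to its own temporary file. Returns the temp file names.
sub write_sorted_chunks {
    my ( $inFH, $chunkSize ) = @_;
    my @chunkFiles;
    until ( eof( $inFH ) ) {
        my @chunk;
        while ( @chunk < $chunkSize and defined( my $line = <$inFH> ) ) {
            chomp $line;
            push @chunk, $line;
        }
        my ( $tmpFH, $tmpName ) = tempfile( UNLINK => 1 );
        print {$tmpFH} "$_\n" for sort { mixed_cmp( $a, $b ) } @chunk;
        close $tmpFH or die $!;
        push @chunkFiles, $tmpName;
    }
    return @chunkFiles;
}

# Merge the sorted chunk files, holding only one line per chunk in memory.
sub merge_chunks {
    my ( $outFH, @chunkFiles ) = @_;
    my @heads;    # each entry is [ current line, filehandle ]
    for my $name ( @chunkFiles ) {
        open my $fh, q{<}, $name or die "$name: $!";
        my $line = <$fh>;
        next unless defined $line;
        chomp $line;
        push @heads, [ $line, $fh ];
    }
    while ( @heads ) {
        # Find the chunk whose current line is smallest.
        my $min = 0;
        for my $i ( 1 .. $#heads ) {
            $min = $i if mixed_cmp( $heads[ $i ][ 0 ], $heads[ $min ][ 0 ] ) < 0;
        }
        print {$outFH} $heads[ $min ][ 0 ], "\n";
        my $next = readline( $heads[ $min ][ 1 ] );
        if ( defined $next ) {
            chomp $next;
            $heads[ $min ][ 0 ] = $next;
        }
        else {
            splice @heads, $min, 1;    # this chunk is exhausted
        }
    }
    return;
}

# Demo: an in-memory handle stands in for the very large database.
my $data = join q{}, map { "$_\n" } qw{
    041351920234 Rabbit 0343120 041271024500 0430870 Apple 041460301399
};
open my $inFH, q{<}, \$data or die $!;
my @chunkFiles = write_sorted_chunks( $inFH, 3 );
merge_chunks( \*STDOUT, @chunkFiles );
```

Peak memory is one chunk during the first phase and one line per temporary file during the merge, so the chunk size can be tuned to whatever the machine can comfortably hold.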

    Cheers,

    JohnGG

Re^3: Sorting Numbers & Text
by johngg (Abbot) on Jul 13, 2012 at 18:41 UTC

    Thinking about it further, read a chunk of your database then sort and print to two temporary files, one for letters, one for numbers. Then the sort/merge of the temporary files will be simpler keeping the two categories separate. Finally you can concatenate the letters and numbers merged files for your results file. In this code I am writing to in-memory scalars rather than disk files just to keep things tidy.

        knoppix@Microknoppix:~$ perl -Mstrict -Mwarnings -E '
        > my @values = qw{
        >     041351920234
        >     Rabbit
        >     0343120
        >     041271024500
        >     0430870
        >     Apple
        >     041460301399
        > };
        >
        > my $rsLets = do { \ my $lets };
        > open my $letsFH, q{>}, $rsLets or die $!;
        > my $rsNums = do { \ my $nums };
        > open my $numsFH, q{>}, $rsNums or die $!;
        >
        > say { $_->[ 1 ] ? $numsFH : $letsFH } $_->[ 0 ] for
        >     sort {
        >         ( $a->[ 1 ] <=> $b->[ 1 ] )
        >         ||
        >         (
        >             $a->[ 1 ]
        >                 ? $a->[ 0 ] <=> $b->[ 0 ]
        >                 : $a->[ 0 ] cmp $b->[ 0 ]
        >         )
        >     }
        >     map { [ $_, m{^\d} ? 1 : 0 ] }
        >     @values;
        >
        > say ${ $rsLets }, q{-----------------};
        > say ${ $rsNums }, q{-----------------};'
        Apple
        Rabbit
        -----------------
        0343120
        0430870
        041271024500
        041351920234
        041460301399
        -----------------
        knoppix@Microknoppix:~$

    I hope this is of interest.

    Cheers,

    JohnGG
