Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Re: Slow at sorting?

by dws (Chancellor)
on Nov 22, 2001 at 01:47 UTC ( [id://126866]=note: print w/replies, xml ) Need Help??


in reply to Re: Slow at sorting?
in thread Slow at sorting?

CD1\01100809.pdf(1) - [Account Number] Indexed key "654546654".

According to an earlier post, the bold fields are the ones you're sorting on. If that's truly the case, and if the text within brackets comes from a limited dictionary, then this sort can be done entirely with numbers.

First, combine the CD number with the file number, yielding (in this case)   101100809 Then map "Account Number" into its pre-determined sequence number. You're then left sorting something like   [ 101100809, 47, seek-address ] on the first two fields, which are now numbers.

Replies are listed 'Best First'.
Re: Slow at sorting?
by guha (Priest) on Nov 22, 2001 at 02:42 UTC
    How about extending it to something like

    pack("n N n", $1, $2, $3).$4

    thereby we would sort CD# > 9 correctly, decrease the memory footprint and allow for the fast intrinsic ASCIIbetical sort.
    Perhaps even to the point where fast hash lookups can be used.

    It's a pity the data is in practice unavailable

    Here's what I would like to time ...

    . . my (%sort_hash); my $offset = 0; while (<IN>) { if( m/^CD(\d+)\\(\d+)\.pdf\((\d+)\)\s-\s\[(.+?)\]/ ) { $sort_hash{pack("n N n", $1, $2, $3).$4} = $offset; } $offset = tell(IN); } foreach my $k (sort keys %sort_hash) { seek IN, $sort_hash{$k}, 0; print OUT scalar(<IN>); }
      And then, why not pack the offset on the end, and unpack substr -4 when you're done. Then you're down to a simple array sort and the only problem is the original one: sooner or later you scale up till you're paging forever.

        p
        True, but I'm not certain that the hash is slower only that it requires more memory, however that has to be tested, IMHO.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://126866]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (4)
As of 2024-03-19 05:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found