Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

Re: Sorting question

by dws (Chancellor)
on May 14, 2006 at 16:27 UTC ( #549333=note: print w/replies, xml ) Need Help??

in reply to Sorting question

What you may need here is a "Schwartzian Transform", so that you can sort based on a transformed key, where the transform is to expand any group of digits into a wider, fixed-sized group. In this way, "K-2-D-1A" gets a sort key of "K-002-D-001A", and "K-2-D-10A" gets a sort key of "K-002-D-010A". Since all sequences of numbers are now equally wide, you'll get the sort order you desire. But since the Schwartzian Transform holds on to unmodified data, you'll have the untransformed keys to display.

Here's an (untested) starting point:

my @data = qw(K-2-D-1A K-2-D-2A K-2-D-10A); my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [$_, normalize_digits($_) ] } @data; sub normalize_digits { my ($key) = @_; $key =~ s/(\d+)/sprintf("%03d", $1)/eg; return $key; # thanks, ikegami }
This builds a composite data structure with the original key and a "normalized" key, sorts the structure based on the normalized key, and then discards the normalized key.

Update: Or, better yet, don't reinvent the wheel, and use Sort::Naturally (which I'd somehow overlooked).

Replies are listed 'Best First'.
Re^2: Sorting question
by ikegami (Pope) on May 14, 2006 at 17:09 UTC

    Your code doesn't work because you forgot return $key; in normalize_digits.

    Also, I think you can speed things up by using the following:

    my @sorted = map { local $_ = $_; s/^.*\0//; $_ } sort map { normalize_digits($_) . "\0$_" } @data;
Re^2: Sorting question (slow)
by tye (Sage) on May 14, 2006 at 16:58 UTC

    Note that your quite simple, dozen-line reinvention is going to be a ton faster than the method used by Sort::Naturally (you can probably make it 2- to 4-times faster still and more memory efficient by using a smarter sorting technique as well, such as my favorite or some found by natural sort). The file size might not make even these quite small increases in complexity in the code worthwhile, of course.

    - tye        

Re^2: Sorting question
by Scott7477 (Chaplain) on May 15, 2006 at 22:38 UTC
    Your explanation of the "Schwartzian Transform" is the easiest one to understand that I've seen so far. Nice++

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://549333]
[Corion]: Meh. SQL window functions would even make pagination easy/trivial (but not performant), as rank() over (partition by user order by timestamp) / 10 as page would give me a page number for each item, with 10 items per page.
[Corion]: Of course, the query performance for "all items on page 10" is likely worse than rank() between 100 and 109 , but if that means I can write 15 lines of SQL instead of needing to think about how to partition things and how to encode the page size...
[Corion]: ... that would be nice. But alas, I'm currently tied to SQLite as minimum implementation, and it doesn't implement window functions :-(
[Corion]: And I'm not aware of any other serverless SQL implementation that even reaches the capability of SQLite, not to mention surpasses it

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (13)
As of 2018-03-22 12:17 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (274 votes). Check out past polls.