Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

(Golf) Keysort

by demerphq (Chancellor)
on Feb 14, 2002 at 14:20 UTC ( [id://145450]=perlmeditation: print w/replies, xml ) Need Help??

Originally this came up when I was trying to figure out a useful generic sort order for the keys of a hash. I wanted it so that if the keys were numeric they would get sorted numerically, and if the keys were alphanumeric they would get lexicographical ordering. Then the thought came up what happens if there are both? Answer: Put the numbers at the top and the words at the bottom. But what happens if we have keys like 1 1a 2 2a 2b? Well they should be sorted into the numeric portion....

RESULTS SO FAR
MonkLexicographicalDictionary
japhy106-
demerphq185195

So here is the specification for the golf:

Write a subroutine that takes a list of keyvalues, sorts them according to the below specification and returns the sorted list.

  • Any key that is also an integer (ignoring leading spaces) should be placed in numeric order at the begining of the list. This means that the values must match  /^\s*(?0|-?[1-9]\d{0,8})$/. Note that this includes negative numbers.
  • Any key that begins with an integer as defined by the previous rule should be sorted into the correct numerical order, then by the alphanumeric portion.
  • Any key that does not satisfy the above rules should be placed into lexicographical order at the end of the list.
  • The solution must run under warnings and strictures without generating any messages or errors.
  • UPDATE: japhy raised an interesting question, concerning whether he can put a newline where perl syntax says whitespace is mandatory and thus not have the space count. After clarification from Masem (see below) it seems this is illegal. As he points out it should possible to execute the code after doing an s/\n//g.

An example input (in csv) is

1a, 2a, a10, 1, 0, 0alpha, 2, AbBa, 10, -13bravo, 42, 13, adam1, 20b, abc, a-BC, 18, 92, A-b-c, Arthur, Bravo, 25, 11a, 12a, ab-ba

Which should produce the output list

-13bravo, 0, 0alpha, 1, 1a, 2, 2a, 10, 11a, 12a, 13, 18, 20b, 25, 42, 92, A-b-c, AbBa, Arthur, Bravo, a-BC, a10, ab-ba, abc, adam1

There is no requirement to print out the output, only to return it in the correct order.

A slight twist on this problem is to do the same thing above, but substituting a dictionary sort order for the lexicographical sortorder. (Dictionary sort order is where non alphabetic chars are removed and case is irrelevant.) In this case the output for the above input would be

-13bravo, 0, 0alpha, 1, 1a, 2, 2a, 10, 11a, 12a, 13, 18, 20b, 25, 42, 92, a10, AbBa, ab-ba, abc, a-BC, A-b-c, adam1, Arthur, Bravo

As I'm interested in effiency as well as code size I will rate the entries in both ways, although you can submit different subroutines for both. Of course if the smallest is also the fastest then it wins overall...

Anyway, I hope this is interesting to you all, it certainly was fun working on it for me.

Oh, and heres my attempts, but I'm sure they won't be winners.

sub demerphq_lexi { # 1 1 2 3 4 5 6 +7 8 8 #123456789012345678901234567890123456789012345678901234567890123456789 +01234567890123456 map{$$_[2]}sort{defined$$a[0]&&defined$$b[0]?$$a[0]-$$b[0]||$$a[1]cmp +$$b[1]:!$$a[0]&&! $$b[0]?$$a[1]cmp$$b[1]:!$$a[0]&&$$b[0]?1:-1}map{/^\s*(0|-?[1-9]\d{0,8 +})?(\D.*)?/s;[$1, $2||"",$_]}@_ # 86*2+13=185 } sub demerphq_dict { # 1 2 3 44 5 6 +7 7 #123456789012345678901234567890123456789012345678901234567890123456789 +01234567 map{$$_[2]}sort{defined$$a[0]&&defined$$b[0]?$$a[0]-$$b[0]||$$a[1]cmp +$$b[1]:! $$a[0]&&!$$b[0]?$$a[1]cmp$$b[1]:!$$a[0]&&$$b[0]?1:-1}map{($b=lc)=~s/[ +^a-z]//g ;[/^\s*(0|-?[1-9]\d{0,8})?/,$b||"",$_]}@_ # 77*2+41=195 }
So my lexicographical solution is 185 chars, and my dictionary solution is 195...

Now lets see how many chars I wasted... ;-)

PS Im really interested to see if someone can come up with a GRT version of the sort, so bonus points if you can do it with a GRT instead of an ST, and even more bonus points for doing it without either...

Yves / DeMerphq
--
When to use Prototypes?

Replies are listed 'Best First'.
Re: (Golf) Keysort
by japhy (Canon) on Feb 14, 2002 at 15:13 UTC
    Update 2: demerphq reminded me that his code returns the list. My code has been updated to do so, and comes in at 106:
    # this is LEXICAL sorting #2345678901234567890123456789012345678901234567890123 sub JAPHY { sort{(*-,*+)=map[/^\s*(0|-?[1-9]\d{0,8})(.*)$/],$a,$b ;@-?@+?$-[0]-$+[0]||$-[1]cmp$+[1]:-1:@+?1:$a cmp$b}@_ }

    Update: new code under 100, at 98... oh, notice how I'm leaving that space in between $a and cmp? If it's legal for me to put a newline there and not have it count, let me know.
    #234567890123456789012345678901234567890123456789 sub JAPHY { (*-,*+)=map[/^\s*(0|-?[1-9]\d{0,8})(.*)$/],$a,$b; @-?@+?$-[0]-$+[0]||$-[1]cmp$+[1]:-1:@+?1:$a cmp$b }

    Here's my sorting function, with a healthy count of 107:
    #2345678901234567890123456789012345678901234567890123456 sub JAPHY { my($A,$B)=map[/^\s*(0|-?[1-9]\d{0,8})(.*)$/],$a,$b; @$A?@$B?$$A[0]-$$B[0]||$$A[1]cmp$$B[1]:-1:@$B?1:$a cmp$b }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who could use a job
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

(tye)Re: (Golf) Keysort
by tye (Sage) on Feb 14, 2002 at 17:09 UTC

    Not golfed. Just wanted to present a GRT:

    #!/usr/bin/perl -w use strict; print join ", ", grep s{\0(=|(.)(....))}{ "\0" eq $1 ? $1 : "-" eq $2 ? -1-unpack("N",$3) : unpack("N",$3) }ges||1, sort grep s{(-?\d+|\0)}{ "\0" . ( "\0" eq "=" ? $1 : $1 < 0 ? "-".pack("N",-1-$1) : ":".pack"N",$1 )}ge||1, @{[qw( 1a 2a a10 1 0 0alpha 2 AbBa 10 -13bravo 42 13 adam1 20b abc a-BC 18 92 A-b-c Arthur Bravo 25 11a 12a ab-ba )]};
    (updated.)

            - tye (but my friends call me "Tye")
Re: (Golf) Keysort
by Masem (Monsignor) on Feb 14, 2002 at 17:44 UTC
    Typically, IMO with golf problems, solutions should be considered as being one one line of code (no CRs). CRs that are added when solutions are presented should only be used to improve the presentation of the code, but should not be used to lose a stroke. Thus, japhy's suggestion of adding a CR between a variable and a following function name is violating the rules (as this trick can help reduce the number of strokes in previous golf challenges). In other words, the code should work if one was to do s/\n\\g; on the golf code.

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      Ok, I didnt realize this was the convention.. Since japhy didnt take advantage of it I'll remove my update...

      Thanx.

      Yves / DeMerphq
      --
      When to use Prototypes?

Re: (Golf) Keysort
by chipmunk (Parson) on Feb 18, 2002 at 05:51 UTC
    Here's my lexicographic solution, which comes in at 89 characters (including the call to sort):
    sub chipmunk_lexi { sort{(($B)=$b=~($r='^\s*(0|-?[1-9]\d{0,8})(?!\d)'))<=>(($A)=$a=~$r)||$ +A<=>$B||$a cmp$b}@_ }
    If I've understood the hole properly, the definitions of "integer" and "starting with an integer" are a bit tricky. '0' is an integer, but '00' isn't; '0a' starts with an integer, but '00a' doesn't. Thus, '0', '1', '00', 'adam' would be sorted in that order. Have I got that right?
      Nice!

      And you are correct with how im defining an integer.

      However you are correct that there is a minor problem with the regex. The ignore leading whitespace shouldnt happen, as it prevents reconstructive methods from being successful. (Although ST style stuff should be fine with it...)

      So the regex should be

      /^(0|-?[1-9]\d{0,8})/
      Your (?!\d) is a worthy and intelligent addition, but I was willing to overlook the issue it resolves. ;-)

      BTW, my reasoning for disallowing 0000 and the like was to ensure that binary hash keys were sorted lexicographically.

      Im going to update the original node with your score and the bit about the regex tomorrow.

      Yves / DeMerphq
      --
      When to use Prototypes?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://145450]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-03-19 02:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found