Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

One for the weekend: challenge

by BrowserUk (Patriarch)
on May 30, 2008 at 21:58 UTC ( [id://689350]=perlmeditation: print w/replies, xml ) Need Help??

I recently came across a task that originally pitted Java programmers against Lisp programmers, in support of the premise that, for a given task, Lisp programmers produce more concise (less bugs) solutions more quickly than their Java couterparts. I'm not concerned with discussing the premise, nor the validity of the test methods which were all done to death on slash-dot a couple of years ago.

What does interest me is seeing what Perl programmers make of the task. It isn't possible to assess the time taken to produce results in this environment, so the assesment process will be:

  • Accuracy of results.

    If you do not match the sample output, you lose.

  • No of lines of executable code.

    Counted manually. Putting three statement for line won't reduce your score. This isn't golf!

  • Time taken to process the sample dataset (on my machine).

    The results must be produced by processing the file. (Ie. Pre-processing the input and just producing the output will be cause for disqualification!).

  • A 'maintainablility quotient'.

    A tough one this. I'm not really sure of a good way to measure this?

    My thought is that I will produce an assessment, with justifications for each response awarding (negative) scores for each point if unmaintainability. Then, if there is enough interest, we can discuss it. (Someone else will have to assess my entry: Volunteers?

The scores will be the product of the above three values (lines * seconds * maintainability-demerits), with the lowest score being the winner.

Assessment will begin on Wednesday. (Arbitrary cut-off date that could be extended if need be.)

The challenge

All the information required to complete the challenge is detailed here in the first four bullet points.

Have fun!


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: One for the weekend: challenge
by Anonymous Monk on Jun 01, 2008 at 15:27 UTC
    This took about two hours to write, through three iterations to get reasonable performance (under one second on a Sempron 3000). I'm sure I'll have issues with the "maintainability" score :)
    #!perl use strict; my %dict; @ARGV = qw( d.dict d.input ); while(<>) # read in both files, load dict on first, match on second { chomp; (my $d = lc) =~ tr|a-z"/-|57630499617851881234762239|d; $ARGV eq 'd.dict' ? push @{$dict{$d}}, $_ : match($d, 0, "$_:"); } sub match # (to match, last was digit, matches) { my ($in, $digit, @have, $cnt) = @_; $in eq '' and return print "@have\n"; # have full match for my $k (map { substr $in, 0, $_ } 2..length $in) { match(substr($in, length $k), !++$cnt, @have, $_) for @{$dict{$k}} +; } $cnt or $digit or match(substr($in, 1), 1, @have, substr $in, 0, 1); }

      Amazing! It runs very fast! Here it runs for 0,6 seconds on the standard set of 1000 numbers (compared to 2.2 sec my code). With growing the set of numbers they are comparable in speed - for 50000 numbers they are equal 13.7 seconds. It seems that I am hurt by the regex compile time.

      I think your code is better than mine because :
      1. you do not relay on some experimental features and recent optimizations;
      2. The difference is in how we get the possible correct continuations of the number. My code generates them with the hairy regular expression. Your code generates all possible continuations and filter only the correct ones trough hash lookup.
      3. less is more

      Best regards

      Utterly, utterly amazing.

      As presented, I think you would have to acquire a few maintainability demerits, but I don't think that it would be that hard to clean it up, and I don't think the result would hurt your timing or LOC unduly.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Remember when considering "maintainability" that the shorter the program, the easier it is to understand :)

        /me ducks
      Slightly faster :)
      #!perl use strict; my %dict; @ARGV = qw( d.dict d.input ); while(<>) # read in both files, load dict on first, match on second { chomp; (my $d = lc) =~ tr|a-z"/-|57630499617851881234762239|d; @ARGV ? push @{$dict{$d}}, $_ : match($d, 0, "$_:"); } sub match # (to match, last was digit, matches) { my ($in, $digit, @have, $k) = @_; map match(substr($in, $k), !++$digit, @have, $_), @{$dict{substr $in, 0, $k = $_}} for 2..length $in; $in =~ s/(.)// ? $digit || match($in, 1, @have, $1) : print "@have\n +"; }
Re: One for the weekend: challenge
by dragonchild (Archbishop) on May 31, 2008 at 01:01 UTC
    The problem is not only suited to Lisp because of the reason Anonymonk gave, but also because the most elegant solution to this problem is one that requires more exposure to CS than most Java programmers (in my experience) tend to have. I just coded up something similar in JS for a Go-playing program I'm playing with (thanks to tye's NextPermute() implementation in Algorithm::Loops). I have no desire to actually write it, so I'll give my plan in readmore tags below.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      All I can say in response is a) I would love to see your approach coded; b) you will not appreciate the complexity of the problem until you have attempted to code a solution.

      The devil, as always, is in the detail. Performing the mappings is relatively simple, but avoiding the production of the cases that are explicitly excluded: Particularly

      1078-913-5: je Bo" 9 1 da , since there are two subsequent digits in the encoding

      Throws most initial and theoretical solutions into a cocked hat.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: One for the weekend: challenge
by karavelov (Monk) on Jun 01, 2008 at 05:29 UTC

    I have solved it but I have passed 3 stages in order to make it run fast. The code has been around 60 lines in all of the stages. I have decided to play on the strong sides of perl and I have used regular expressions.

    So my first version took me one and a half hour to write and debug the logic. The code was rewriting regexes and it was pathetically slow. It was running on the set for 5 minutes on my home PC (Pentium 4).

    Next version took me another one and an half hour to convert it to use pre-compiled regular expressions. I have used some black magic in order to make exhaustive pattern matching. The interesting line is this :

     qr{^($re1)(?{ push @match, [$-[0], $+[0]] })(*FAIL)};

    where $re1 is gigantic alternative (123|2322|...). It was running on the set for 32 seconds - not bad but I was expecting more.

    I have read that there is "trie optimization" of the regexes in last version of perl. It take me another 2 hours to make optimizations and to debug why the hell I am not getting trie optimized regexes for the gigantic alternative I was constructing from the dictionary. Experimentally I have found that there is a limit between 11000 and 13000 alternatives so I decided to partition the constructed regex in 10 smaller ones. And this one worked gaining much more speed. It was running on the set of numbers for 2.2 seconds. Now I an happy.

    In summary I was developing the code for 5 hours. It consists of 65 lines of code that follows. It runs on my PC (Pentium 4 @ 3GHz) for 2.2 seconds and 1.6 sec on my office PC (Intel Dual Core E2160 @ 1.80GHz) and eats around 22M of ram

    I have learned a little bit more about regular expressions, their optimization and extension.

        Thanks. Yes I have figured out how to workaround this limitation. It will be great if it is fixed some day.
Re: One for the weekend: challenge
by Anonymous Monk on May 31, 2008 at 00:31 UTC

    The deck is pretty-well “stacked” in favor of LISP on this one:   the nature of the problem lends itself particularly well to recursive solution and to so-called “pragmatic programming” (a LISP specialty in which programs build lists and then execute those lists).

    I generally don't think that it's too useful to compare “language to language” without first giving very careful consideration to what it is that you are trying to do. Languages are tools... nothing more. You need to evaluate a great many non-computer factors when deciding what language(s) to deploy against a particular business task.

      You might be surprised! But, I didn't post it for that reason.

      It's purely an exercise in seeing how people approach the problem in Perl, and then attempting to come up with some measure of "goodness" for those approaches.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: One for the weekend: challenge
by Limbic~Region (Chancellor) on Jun 01, 2008 at 02:56 UTC
    BrowserUk,
    The following code should produce the correct output. I have tested it individually on a handful of the "expected output" numbers. I am going to let it run overnight to see how long it takes. Using perl -MO=Deparse solution.pl | wc -l it is 44 lines of code.

    In a nutshell, it generates the unrestricted integer partitions of the number. It skips over any partitioning that has so many 1s it could not satisfy the problem constraints. It then generates all permutations of each possible partition - skipping over partitioning with two adjacent 1s. It then checks if it produces a valid solution. I am sure it could be optimized a lot, but that would add code complexity and lines of code.

    Update: Shortly after posting, I realized by re-arranging some logic it would be more efficient. The node has been updated to reflect that.

    Cheers - L~R

      It loops forever on long numbers. And sometimes produces false results. I have not analyzed why exactly this is wrong but for the number -810873502888/74-556227/1 (the last number of the output.txt) it gives 62 results while the number of correct results is 18.
        karavelov,
        Thanks. I am not going to invest too much time in it as I believe the logic to be sound - the implementation is the problem.

        Cheers - L~R

Re: One for the weekend: challenge
by kyle (Abbot) on Jun 02, 2008 at 06:20 UTC

    Here's what I came up with:

    The solution itself is absolutely correct. The I/O loop at the beginning might be broken, though. I didn't test that part. Rather, I tested the module with Test::More and wrote the I/O loop as an afterthought. Here are my tests:

    I started with the short disambiguating sample material and then moved on to the main input. I found that '2' as an input appears twice, so I had to "fix" my testing code to account for that (I hardcoded the solution for '2'). I found also that I had one last bug to fix for cases that produce words but end in the digit 0 (that's 7 of the 1000 inputs).

    I did this without reading any of the other solutions. I wasn't going to work on this at all until I saw the one from Anonymous Monk was so short. Seeing that, I figured it couldn't be quite as time-consuming as I was thinking.

    I wasn't really trying to make it short or especially efficient—just correct. Before I started I had this idea that I would profile it when I was done and make it really fast, but now I'd rather go to bed.

    Update: I've fixed the problem that BrowserUK pointed out, and I made a performance tweak based on profiling. I was surprised to discover that more time was spent in new() than in all 7000 calls to words_to_phone() put together. I fixed that by changing "s/(.)/$num_for{ $1 }/g" to "tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999/;" thereby sacrificing maintainability for speed. It's a big speed difference, however. Run time went from over 1 second to about a half a second.

      I had to invert the (physical source-code) positioning of the package and main code in order that %num_for would be set up prior to the constructor being called. Alternatively, I could separate the package into a .pm file.

      With either correction, this produces the correct output quite efficiently.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      I've had some time to ponder my own work here, and I'm writing here to share my thoughts.

      Let me start on the defensive: I wrote this tired and trying to finish in minimal time. I was shooting for functionality more than elegance or brevity.

      Probably the biggest wart is using caller as a sort of flag parameter to the main words_for_phone. I made a words_only, and it does nothing but call words_for_phone. There, I have a check to see if that's where the call came from:

      # If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); }

      What I should have done instead is change the interface. The new convention would be that words_for_phone is called with an array reference and an optional flag second argument. It's not called many places, so this wouldn't be a difficult change.

      If I'd made "my $out = []" instead be "my @out", I'd have saved myself some "@{ }" in a few places. Really the only place $out appears by itself is at the return, so that could be "return \@out". On the other hand, making it an array reference from the beginning indicates clearly that we're returning an array reference.

      I think mine's the only solution that uses a package. I made it modular so I could test it more easily, though this certainly cost some lines of code to support it. I don't regret this decision at all. It was great to be able to just run the tests and see if I'd fixed what I was working on (and if I'd broken anything that worked before).

      This was written incrementally, and it still has that feel to me. I wrote something to handle the easiest cases first (where the phone can be entirely translated to words) and then built stuff up around it to handle more difficult cases.

      If this were production code and not a toy problem, it would benefit greatly from sanity checks in several places. Dictionary words should contain no spaces or digits. Don't call the one instance method as a class method and vice versa.

      I welcome comments from any other monks who might have read what I wrote.

Re: One for the weekend: challenge
by ysth (Canon) on Jun 01, 2008 at 06:01 UTC
    Pfah! Cannot they write problems like this without ambiguities? We hates them, we do.

    The one that's bothering me is the "can" in: "If and only if at a particular point no word at all from the dictionary can be inserted,". Does that mean "can be inserted to successfully produce a solution"? Or "can be inserted whether or not a correct solution results"? If the former, given the single-word dictionary "aaa", "55555" has solution "5 aaa 5". If the latter, it has no solutions.

      I would suggest that all ambiguities are resolved by example. In this case, the following would seem to apply:

      4824: 4 Ort, because in place of the first digit the words Torf, fort, Tor could be used

      I read that to say, that as 'aaa' can be applied at the first step, inserting a digit at that point is illegal. Therefore your example has no solutions.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Nope, that doesn't do it, because using Torf ,fort, or Tor 4 produces a solution. My doubt is whether a digit can be inserted where there could be a word, but not one that produces a solution.

        I haven't rigorously examined the sample solutions for lack of one that would establish the point, nor do I really care to.

        BrowserUk,
        I also read the rules incorrectly because of the following example:
        04824: 0 Torf 04824: 0 fort 04824: 0 Tor 4
        The last solution is what lead me astray. If it had read:

        As you work left to right, if at any point a word can be inserted - using a digit is illegal and at no point can two digits be adjacent

        I would have understood much better. In any event, my solution is buggy and produces incorrect results. Since it seems like you were after approaches, I believe mine is interesting because of the unrestricted integer partitions. I have no interest at this point in making it a correct solution.

        Cheers - L~R

      We are looking only for correct results, aren't we? So, I think that only the first interpretation is correct. In your example, the "5 aaa 5" case.
Re: One for the weekend: challenge
by Limbic~Region (Chancellor) on Jun 03, 2008 at 03:17 UTC
    BrowserUk,
    Now that I understand the problem, here is a correct working solution that is relatively fast (8 seconds locally). Running it through perl -MO=Deparse solution.pl | wc -l indicates it is 42 lines. It isn't very clean as I didn't have very much time to implement. If I have time, I will see about clarity and optimizations. If nothing else, I will add comments tomorrow so at least the algorithm is clear even if the code is not.

    Cheers - L~R

        So which solution matched up to your expectations the best? I haven't tried all of them yet... waiting for my next break.
        BrowserUk,
        Here is the same code re-written to be more clear and maintainable. I still think the handful of special case scenarios could be reduced to a more general case. Unfortunately, I don't have time to try.

        Cheers - L~R

Re: One for the weekend: (provisional results)
by BrowserUk (Patriarch) on Jun 04, 2008 at 14:45 UTC

    I'm not antisipating any late entries, so here are provisional findings on the challenge. Final scoring will require me to write up some kind of assessment of the entries and may take a few days, if it gets done at all. (Anyone really interested?)

    The program that generates these will be posted below. As will my original attempt (labelled:BrowserUk below). (Update: Corrected score calculations!)

    c:\test\685390>685390.plt Testing Anonymonk1-2 Testing Anonymonk1-reworked Testing Anonymonk1 Testing BrowserUk Testing BrowserUk2 Testing BrowserUk3 Testing karavelov Testing kyle Testing kyle2 Testing LimbicRegion2 Testing LimbicRegion3 { Anonymonk1 =>{ ELAPSED=> 1.64, LOC=> 15, OUTCOUNT=> 262, SCORE=> 2 +4.6 }, Anonymonk1-2 =>{ ELAPSED=> 1.48, LOC=> 12, OUTCOUNT=> 262, SCORE=> 1 +7.76 }, Anonymonk1-rw =>{ ELAPSED=> 1.93, LOC=> 27, OUTCOUNT=> 262, SCORE=> 5 +2.11 }, BrowserUk =>{ ELAPSED=>10.57, LOC=> 36, OUTCOUNT=> 262, SCORE=> 38 +0.52 }, BrowserUk2 =>{ ELAPSED=> 2.07, LOC=> 28, OUTCOUNT=> 262, SCORE=> 5 +7.96 }, BrowserUk3 =>{ ELAPSED=> 1.67, LOC=> 49, OUTCOUNT=> 89, SCORE > 8 +1.83 }, LimbicRegion2 =>{ ELAPSED=>11.03, LOC=> 40, OUTCOUNT=> 262, SCORE=> 44 +1.2 }, LimbicRegion3 =>{ ELAPSED=>12.04, LOC=> 63, OUTCOUNT=> 262, SCORE=> 75 +8.52 }, karavelov =>{ ELAPSED=> 5.08, LOC=> 63, OUTCOUNT=> 262, SCORE=> 32 +0.04 }, kyle =>{ ELAPSED=> 4.69, LOC=> 79, OUTCOUNT=> 262, SCORE=> 37 +0.51 }, kyle2 =>{ ELAPSED=> 2.98, LOC=> 73, OUTCOUNT=> 262, SCORE=> 21 +7.54 }, }

    (*)manually compacted to comply with the pathetic wrap ilmit. Anybody would think this was the 1970s!

    Ignore, BrowserUk2, BrowserUk3 and Anonymonk1-reworked.

    • BrowserUk2 is my attempt to clean up my own entry once I saw the performance of other peoples solutions.

      It must be discounted because it was too influenced by the work of others.

    • BrowserUk3 is my second attempt at clean up avoiding the influence of others.

      A work in progress and currently non-complient.

    • Anonymonk1-reworked is my attempt to sanitize anonymonk's first entry for 'maintanence purposes'.

      This may get posted as part of a further post discussing that subject.

    • The others above are as posted in this thread.

      Kyle is kyle's entry prior to his updates. kyle2 post-updates.

    Test program:

    My pre-challenge solution, warts an all. It took less than 1 hour to program a recursive solution that found all possible encodings. Another ~2 hours to hack it to fulfill the specs arbitrary limitations:

    1. No adjacent digits.
    2. No digit where a word is available (even if the use of that word does not result in a successful encoding).

    I was quite pleased with the LOC & timings, relative to the information available for the Lisp and Java entries.(Until I saw anonymonks entry! :). I was aware it was convoluted and would need refactoring. Hence the attempts above.

    #! perl -slw use strict; use List::Util qw[ reduce ]; $a = $b; use constant DICTIONARY => 'dictionary.txt'; my %mappings; reduce{ $mappings{ $_ } = $a for split '', $b; $a + 1; } 0, qw[ e jnq rwx dsy ft am civ bku lop ghz ]; my %dict = map{ $_ => [ $_ ] } 0 .. 9; keys %dict = 300000; open DICT, '<', DICTIONARY or die "${\DICTIONARY}: $!"; while( <DICT> ) { chomp; my $orig = $_; tr[A-Z"/-][a-z]d; #" push @{ $dict{ join '', @mappings{ split '', $_ } } }, $orig; } close DICT; sub encode { my $number = shift; my $start = length( $number ); --$start while $start and not exists $dict{ substr $number, 0, $st +art }; $start = 2 if $start > 1; return @{ $dict{ $number } || [] }, grep !/\d \d/, map { my( $pre, $post ) = unpack "a$_ a*", $number; map{ my $head = $_; map "$head $_", encode( $post ); } @{ $dict{ $pre } || [] }, } $start .. length( $number )-1; } while( <> ) { chomp; my $orig = $_; tr[/-][]d; print "$orig: $_" for encode( $_ ); }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://689350]
Approved by friedo
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others learning in the Monastery: (7)
As of 2024-03-19 10:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found