Re: One for the weekend: challenge

in reply to One for the weekend: challenge

Here's what I came up with:

use strict;
use warnings;

my @dictionary = slurp( 'dictionary.txt' );
my $translator = Node689350->new( { dict => \@dictionary } );

while (<>) {
    chomp;
    my $out = $translator->words_for_phone( $_ );
    foreach my $o ( @{ $out } ) {
        print "$_: $o\n";
    }
}

sub slurp {
    my $file = shift;

    open my $fh, '<', $file
        or die "Can't read '$file': $!";
    chomp( my @lines = <$fh> );
    close $fh;

    return @lines;
}

package Node689350;

# Call as __PACKAGE__->new( { dict => [ ... ] } )
# where "..." is every word in the dictionary
sub new {
    my $class = shift;
    my $self = shift;

    foreach my $w ( @{ $self->{dict} } ) {
        my $orig = $w;        # save the original word (for output)

        # turn to all lowercase letters
        $w =~ tr/A-Z/a-z/;
        $w =~ tr/a-z//cd;

        # translate to the number that makes this word
        $w =~ tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999
+/;

        # Store this word as one possible result of this number
        push @{ $self->{word_for}{$w} }, $orig;
    }

    # I don't need the dictionary anymore
    delete $self->{dict};

    return bless $self, $class;
}

# This function will NOT produce individual digits as "words"
sub words_only {
    my $self = shift;
    return $self->words_for_phone( @_ );
}

sub words_for_phone {
    my $self = shift;

    my $out = [];

    # If this was called from 'words_only', don't produce an individua
+l digit
    my $words_only;
    {
        no warnings 'uninitialized';
        my @call = caller 1;
        $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' );
    }

    while ( defined( my $phone = shift ) ) {
        # Turn the input phone number to digits only
        $phone =~ tr/0-9//cd;

        # Every truncation, longest first
        my @subphones
            = reverse map { substr $phone, 0, $_ } 1 .. length $phone;

        # Every truncation that has an associated word
        my @word_subphones
            = grep { exists $self->{word_for}{$_} } @subphones;

        # If there were no words found,
        # and we don't have to stick to words only,
        if ( ! @word_subphones && ! $words_only ) {
            # Get the first digit as a "word"
            my ( $w, $new_phone ) = ( $phone =~ m{ \A (.) (.*) \z }xms
+ );

            # If this is the last digit, return this as the only produ
+ction
            if ( ! length $new_phone ) {
                return [ $w ];
            }

            # Make the rest of the number into words (NOT DIGITS)
            foreach my $more ( @{ $self->words_only( $new_phone ) } ) 
+{
                push @{ $out }, $w . ' ' . $more;
            }
        }
      SUBPHONE:
        foreach my $n ( @word_subphones ) {
            # This is the part of the number we didn't translate
            my $new_phone = substr $phone, length $n;

            # If there's no more left to translate, don't try
            if ( ! length $new_phone ) {
                push @{ $out }, @{ $self->{word_for}{$n} };
                next SUBPHONE;
            }

            # for every translation we have,
            foreach my $w ( @{ $self->{word_for}{$n} } ) {

                # look for translations of the remaining number
                foreach my $more ( @{ $self->words_for_phone( $new_pho
+ne ) } ) {
                    push @{ $out }, $w . ' ' . $more;
                }
            }
        }
    }

    return $out;
}
[download]

The solution itself is absolutely correct. The I/O loop at the beginning might be broken, though. I didn't test that part. Rather, I tested the module with Test::More and wrote the I/O loop as an afterthought. Here are my tests:

package main;

use Test::More;

my $sample_dict = <<'END_SAMPLE_DICT';
an
blau
Bo"
Boot
bo"s
da
Fee
fern
Fest
fort
je
jemand
mir
Mix
Mixer
Name
neu
o"d
Ort
so
Tor
Torf
Wasser
END_SAMPLE_DICT
    ;

my @sample_phones = grep { length } split /\s+/, q{
112
5624-82
4824
0721/608-4067
10/783--5
1078-913-5
381482
04824
};

my $sample_output = <<'END_SAMPLE_OUTPUT';
5624-82: mir Tor
5624-82: Mix Tor
4824: Torf
4824: fort
4824: Tor 4
10/783--5: neu o"d 5
10/783--5: je bo"s 5
10/783--5: je Bo" da
381482: so 1 Tor
04824: 0 Torf
04824: 0 fort
04824: 0 Tor 4
END_SAMPLE_OUTPUT
    ;

if (0) {
    my %output_for = map { $_ => [] } @sample_phones;
    foreach my $output_line ( split /\n/, $sample_output ) {
        my ( $phone, $word ) = split /: /, $output_line;
        push @{ $output_for{ $phone } }, $word;
    }

    my $x = Node689350->new( { dict => [ split /\n/, $sample_dict ] } 
+);

    plan 'tests' => scalar @sample_phones;

    foreach my $phone ( @sample_phones ) {
        my $phone_words = $x->words_for_phone( $phone );
        is_deeply( [ sort @{ $phone_words } ],
                   [ sort @{ $output_for{ $phone } } ],
                   "phone $phone" );
    }
}
else {
    my @dictionary = slurp( 'dictionary.txt' );
    my @input      = slurp( 'input.txt' );
    my @output     = slurp( 'output.txt' );

    my %output_for = map { $_ => [] } @input;
    foreach my $output_line ( @output ) {
        my ( $phone, $word ) = split /: /, $output_line;
        push @{ $output_for{ $phone } }, $word;
    }

    # This appears twice for some reason
    $output_for{ '2' } = [ '2' ];
    
    plan 'tests' => scalar @input;

    my $x = Node689350->new( { dict => \@dictionary } );

    foreach my $phone ( @input ) {
        my $phone_words = $x->words_for_phone( $phone );
        is_deeply( [ sort @{ $phone_words } ],
                   [ sort @{ $output_for{ $phone } } ],
                   "phone $phone" );
    }
}
[download]

I started with the short disambiguating sample material and then moved on to the main input. I found that '2' as an input appears twice, so I had to "fix" my testing code to account for that (I hardcoded the solution for '2'). I found also that I had one last bug to fix for cases that produce words but end in the digit 0 (that's 7 of the 1000 inputs).

I did this without reading any of the other solutions. I wasn't going to work on this at all until I saw the one from Anonymous Monk was so short. Seeing that, I figured it couldn't be quite as time-consuming as I was thinking.

I wasn't really trying to make it short or especially efficient—just correct. Before I started I had this idea that I would profile it when I was done and make it really fast, but now I'd rather go to bed.

Update: I've fixed the problem that BrowserUK pointed out, and I made a performance tweak based on profiling. I was surprised to discover that more time was spent in new() than in all 7000 calls to words_to_phone() put together. I fixed that by changing "s/(.)/$num_for{ $1 }/g" to "tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999/;" thereby sacrificing maintainability for speed. It's a big speed difference, however. Run time went from over 1 second to about a half a second.

Comment on Re: One for the weekend: challenge Select or Download Code

Replies are listed 'Best First'.
Re^2: One for the weekend: challenge by BrowserUk (Patriarch) on Jun 02, 2008 at 10:49 UTC
I had to invert the (physical source-code) positioning of the package and main code in order that `%num_for` would be set up prior to the constructor being called. Alternatively, I could separate the package into a .pm file. With either correction, this produces the correct output quite efficiently. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: One for the weekend: challenge (self-review) by kyle (Abbot) on Jun 10, 2008 at 21:51 UTC
I've had some time to ponder my own work here, and I'm writing here to share my thoughts. Let me start on the defensive: I wrote this tired and trying to finish in minimal time. I was shooting for functionality more than elegance or brevity. Probably the biggest wart is using caller as a sort of flag parameter to the main `words_for_phone`. I made a `words_only`, and it does nothing but call `words_for_phone`. There, I have a check to see if that's where the call came from: `# If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); }` [download] What I should have done instead is change the interface. The new convention would be that `words_for_phone` is called with an array reference and an optional flag second argument. It's not called many places, so this wouldn't be a difficult change. If I'd made "`my $out = []`" instead be "`my @out`", I'd have saved myself some "`@{ }`" in a few places. Really the only place `$out` appears by itself is at the return, so that could be "`return \@out`". On the other hand, making it an array reference from the beginning indicates clearly that we're returning an array reference. I think mine's the only solution that uses a package. I made it modular so I could test it more easily, though this certainly cost some lines of code to support it. I don't regret this decision at all. It was great to be able to just run the tests and see if I'd fixed what I was working on (and if I'd broken anything that worked before). This was written incrementally, and it still has that feel to me. I wrote something to handle the easiest cases first (where the phone can be entirely translated to words) and then built stuff up around it to handle more difficult cases. If this were production code and not a toy problem, it would benefit greatly from sanity checks in several places. Dictionary words should contain no spaces or digits. Don't call the one instance method as a class method and vice versa. I welcome comments from any other monks who might have read what I wrote.	[reply] [d/l] [select]

In Section Meditations