http://www.perlmonks.org?node_id=689640


in reply to One for the weekend: challenge

Here's what I came up with:

use strict; use warnings; my @dictionary = slurp( 'dictionary.txt' ); my $translator = Node689350->new( { dict => \@dictionary } ); while (<>) { chomp; my $out = $translator->words_for_phone( $_ ); foreach my $o ( @{ $out } ) { print "$_: $o\n"; } } sub slurp { my $file = shift; open my $fh, '<', $file or die "Can't read '$file': $!"; chomp( my @lines = <$fh> ); close $fh; return @lines; } package Node689350; # Call as __PACKAGE__->new( { dict => [ ... ] } ) # where "..." is every word in the dictionary sub new { my $class = shift; my $self = shift; foreach my $w ( @{ $self->{dict} } ) { my $orig = $w; # save the original word (for output) # turn to all lowercase letters $w =~ tr/A-Z/a-z/; $w =~ tr/a-z//cd; # translate to the number that makes this word $w =~ tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999 +/; # Store this word as one possible result of this number push @{ $self->{word_for}{$w} }, $orig; } # I don't need the dictionary anymore delete $self->{dict}; return bless $self, $class; } # This function will NOT produce individual digits as "words" sub words_only { my $self = shift; return $self->words_for_phone( @_ ); } sub words_for_phone { my $self = shift; my $out = []; # If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); } while ( defined( my $phone = shift ) ) { # Turn the input phone number to digits only $phone =~ tr/0-9//cd; # Every truncation, longest first my @subphones = reverse map { substr $phone, 0, $_ } 1 .. length $phone; # Every truncation that has an associated word my @word_subphones = grep { exists $self->{word_for}{$_} } @subphones; # If there were no words found, # and we don't have to stick to words only, if ( ! @word_subphones && ! $words_only ) { # Get the first digit as a "word" my ( $w, $new_phone ) = ( $phone =~ m{ \A (.) (.*) \z }xms + ); # If this is the last digit, return this as the only produ +ction if ( ! length $new_phone ) { return [ $w ]; } # Make the rest of the number into words (NOT DIGITS) foreach my $more ( @{ $self->words_only( $new_phone ) } ) +{ push @{ $out }, $w . ' ' . $more; } } SUBPHONE: foreach my $n ( @word_subphones ) { # This is the part of the number we didn't translate my $new_phone = substr $phone, length $n; # If there's no more left to translate, don't try if ( ! length $new_phone ) { push @{ $out }, @{ $self->{word_for}{$n} }; next SUBPHONE; } # for every translation we have, foreach my $w ( @{ $self->{word_for}{$n} } ) { # look for translations of the remaining number foreach my $more ( @{ $self->words_for_phone( $new_pho +ne ) } ) { push @{ $out }, $w . ' ' . $more; } } } } return $out; }

The solution itself is absolutely correct. The I/O loop at the beginning might be broken, though. I didn't test that part. Rather, I tested the module with Test::More and wrote the I/O loop as an afterthought. Here are my tests:

package main; use Test::More; my $sample_dict = <<'END_SAMPLE_DICT'; an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser END_SAMPLE_DICT ; my @sample_phones = grep { length } split /\s+/, q{ 112 5624-82 4824 0721/608-4067 10/783--5 1078-913-5 381482 04824 }; my $sample_output = <<'END_SAMPLE_OUTPUT'; 5624-82: mir Tor 5624-82: Mix Tor 4824: Torf 4824: fort 4824: Tor 4 10/783--5: neu o"d 5 10/783--5: je bo"s 5 10/783--5: je Bo" da 381482: so 1 Tor 04824: 0 Torf 04824: 0 fort 04824: 0 Tor 4 END_SAMPLE_OUTPUT ; if (0) { my %output_for = map { $_ => [] } @sample_phones; foreach my $output_line ( split /\n/, $sample_output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } my $x = Node689350->new( { dict => [ split /\n/, $sample_dict ] } +); plan 'tests' => scalar @sample_phones; foreach my $phone ( @sample_phones ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } } else { my @dictionary = slurp( 'dictionary.txt' ); my @input = slurp( 'input.txt' ); my @output = slurp( 'output.txt' ); my %output_for = map { $_ => [] } @input; foreach my $output_line ( @output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } # This appears twice for some reason $output_for{ '2' } = [ '2' ]; plan 'tests' => scalar @input; my $x = Node689350->new( { dict => \@dictionary } ); foreach my $phone ( @input ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } }

I started with the short disambiguating sample material and then moved on to the main input. I found that '2' as an input appears twice, so I had to "fix" my testing code to account for that (I hardcoded the solution for '2'). I found also that I had one last bug to fix for cases that produce words but end in the digit 0 (that's 7 of the 1000 inputs).

I did this without reading any of the other solutions. I wasn't going to work on this at all until I saw the one from Anonymous Monk was so short. Seeing that, I figured it couldn't be quite as time-consuming as I was thinking.

I wasn't really trying to make it short or especially efficient—just correct. Before I started I had this idea that I would profile it when I was done and make it really fast, but now I'd rather go to bed.

Update: I've fixed the problem that BrowserUK pointed out, and I made a performance tweak based on profiling. I was surprised to discover that more time was spent in new() than in all 7000 calls to words_to_phone() put together. I fixed that by changing "s/(.)/$num_for{ $1 }/g" to "tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999/;" thereby sacrificing maintainability for speed. It's a big speed difference, however. Run time went from over 1 second to about a half a second.

Replies are listed 'Best First'.
Re^2: One for the weekend: challenge
by BrowserUk (Patriarch) on Jun 02, 2008 at 10:49 UTC

    I had to invert the (physical source-code) positioning of the package and main code in order that %num_for would be set up prior to the constructor being called. Alternatively, I could separate the package into a .pm file.

    With either correction, this produces the correct output quite efficiently.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: One for the weekend: challenge (self-review)
by kyle (Abbot) on Jun 10, 2008 at 21:51 UTC

    I've had some time to ponder my own work here, and I'm writing here to share my thoughts.

    Let me start on the defensive: I wrote this tired and trying to finish in minimal time. I was shooting for functionality more than elegance or brevity.

    Probably the biggest wart is using caller as a sort of flag parameter to the main words_for_phone. I made a words_only, and it does nothing but call words_for_phone. There, I have a check to see if that's where the call came from:

    # If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); }

    What I should have done instead is change the interface. The new convention would be that words_for_phone is called with an array reference and an optional flag second argument. It's not called many places, so this wouldn't be a difficult change.

    If I'd made "my $out = []" instead be "my @out", I'd have saved myself some "@{ }" in a few places. Really the only place $out appears by itself is at the return, so that could be "return \@out". On the other hand, making it an array reference from the beginning indicates clearly that we're returning an array reference.

    I think mine's the only solution that uses a package. I made it modular so I could test it more easily, though this certainly cost some lines of code to support it. I don't regret this decision at all. It was great to be able to just run the tests and see if I'd fixed what I was working on (and if I'd broken anything that worked before).

    This was written incrementally, and it still has that feel to me. I wrote something to handle the easiest cases first (where the phone can be entirely translated to words) and then built stuff up around it to handle more difficult cases.

    If this were production code and not a toy problem, it would benefit greatly from sanity checks in several places. Dictionary words should contain no spaces or digits. Don't call the one instance method as a class method and vice versa.

    I welcome comments from any other monks who might have read what I wrote.