Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Here's what I came up with:

use strict; use warnings; my @dictionary = slurp( 'dictionary.txt' ); my $translator = Node689350->new( { dict => \@dictionary } ); while (<>) { chomp; my $out = $translator->words_for_phone( $_ ); foreach my $o ( @{ $out } ) { print "$_: $o\n"; } } sub slurp { my $file = shift; open my $fh, '<', $file or die "Can't read '$file': $!"; chomp( my @lines = <$fh> ); close $fh; return @lines; } package Node689350; # Call as __PACKAGE__->new( { dict => [ ... ] } ) # where "..." is every word in the dictionary sub new { my $class = shift; my $self = shift; foreach my $w ( @{ $self->{dict} } ) { my $orig = $w; # save the original word (for output) # turn to all lowercase letters $w =~ tr/A-Z/a-z/; $w =~ tr/a-z//cd; # translate to the number that makes this word $w =~ tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999 +/; # Store this word as one possible result of this number push @{ $self->{word_for}{$w} }, $orig; } # I don't need the dictionary anymore delete $self->{dict}; return bless $self, $class; } # This function will NOT produce individual digits as "words" sub words_only { my $self = shift; return $self->words_for_phone( @_ ); } sub words_for_phone { my $self = shift; my $out = []; # If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); } while ( defined( my $phone = shift ) ) { # Turn the input phone number to digits only $phone =~ tr/0-9//cd; # Every truncation, longest first my @subphones = reverse map { substr $phone, 0, $_ } 1 .. length $phone; # Every truncation that has an associated word my @word_subphones = grep { exists $self->{word_for}{$_} } @subphones; # If there were no words found, # and we don't have to stick to words only, if ( ! @word_subphones && ! $words_only ) { # Get the first digit as a "word" my ( $w, $new_phone ) = ( $phone =~ m{ \A (.) (.*) \z }xms + ); # If this is the last digit, return this as the only produ +ction if ( ! length $new_phone ) { return [ $w ]; } # Make the rest of the number into words (NOT DIGITS) foreach my $more ( @{ $self->words_only( $new_phone ) } ) +{ push @{ $out }, $w . ' ' . $more; } } SUBPHONE: foreach my $n ( @word_subphones ) { # This is the part of the number we didn't translate my $new_phone = substr $phone, length $n; # If there's no more left to translate, don't try if ( ! length $new_phone ) { push @{ $out }, @{ $self->{word_for}{$n} }; next SUBPHONE; } # for every translation we have, foreach my $w ( @{ $self->{word_for}{$n} } ) { # look for translations of the remaining number foreach my $more ( @{ $self->words_for_phone( $new_pho +ne ) } ) { push @{ $out }, $w . ' ' . $more; } } } } return $out; }

The solution itself is absolutely correct. The I/O loop at the beginning might be broken, though. I didn't test that part. Rather, I tested the module with Test::More and wrote the I/O loop as an afterthought. Here are my tests:

package main; use Test::More; my $sample_dict = <<'END_SAMPLE_DICT'; an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser END_SAMPLE_DICT ; my @sample_phones = grep { length } split /\s+/, q{ 112 5624-82 4824 0721/608-4067 10/783--5 1078-913-5 381482 04824 }; my $sample_output = <<'END_SAMPLE_OUTPUT'; 5624-82: mir Tor 5624-82: Mix Tor 4824: Torf 4824: fort 4824: Tor 4 10/783--5: neu o"d 5 10/783--5: je bo"s 5 10/783--5: je Bo" da 381482: so 1 Tor 04824: 0 Torf 04824: 0 fort 04824: 0 Tor 4 END_SAMPLE_OUTPUT ; if (0) { my %output_for = map { $_ => [] } @sample_phones; foreach my $output_line ( split /\n/, $sample_output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } my $x = Node689350->new( { dict => [ split /\n/, $sample_dict ] } +); plan 'tests' => scalar @sample_phones; foreach my $phone ( @sample_phones ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } } else { my @dictionary = slurp( 'dictionary.txt' ); my @input = slurp( 'input.txt' ); my @output = slurp( 'output.txt' ); my %output_for = map { $_ => [] } @input; foreach my $output_line ( @output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } # This appears twice for some reason $output_for{ '2' } = [ '2' ]; plan 'tests' => scalar @input; my $x = Node689350->new( { dict => \@dictionary } ); foreach my $phone ( @input ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } }

I started with the short disambiguating sample material and then moved on to the main input. I found that '2' as an input appears twice, so I had to "fix" my testing code to account for that (I hardcoded the solution for '2'). I found also that I had one last bug to fix for cases that produce words but end in the digit 0 (that's 7 of the 1000 inputs).

I did this without reading any of the other solutions. I wasn't going to work on this at all until I saw the one from Anonymous Monk was so short. Seeing that, I figured it couldn't be quite as time-consuming as I was thinking.

I wasn't really trying to make it short or especially efficient—just correct. Before I started I had this idea that I would profile it when I was done and make it really fast, but now I'd rather go to bed.

Update: I've fixed the problem that BrowserUK pointed out, and I made a performance tweak based on profiling. I was surprised to discover that more time was spent in new() than in all 7000 calls to words_to_phone() put together. I fixed that by changing "s/(.)/$num_for{ $1 }/g" to "tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999/;" thereby sacrificing maintainability for speed. It's a big speed difference, however. Run time went from over 1 second to about a half a second.


In reply to Re: One for the weekend: challenge by kyle
in thread One for the weekend: challenge by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others studying the Monastery: (6)
    As of 2014-12-20 12:29 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (95 votes), past polls