Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Here's what I came up with:

use strict; use warnings; my @dictionary = slurp( 'dictionary.txt' ); my $translator = Node689350->new( { dict => \@dictionary } ); while (<>) { chomp; my $out = $translator->words_for_phone( $_ ); foreach my $o ( @{ $out } ) { print "$_: $o\n"; } } sub slurp { my $file = shift; open my $fh, '<', $file or die "Can't read '$file': $!"; chomp( my @lines = <$fh> ); close $fh; return @lines; } package Node689350; # Call as __PACKAGE__->new( { dict => [ ... ] } ) # where "..." is every word in the dictionary sub new { my $class = shift; my $self = shift; foreach my $w ( @{ $self->{dict} } ) { my $orig = $w; # save the original word (for output) # turn to all lowercase letters $w =~ tr/A-Z/a-z/; $w =~ tr/a-z//cd; # translate to the number that makes this word $w =~ tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999 +/; # Store this word as one possible result of this number push @{ $self->{word_for}{$w} }, $orig; } # I don't need the dictionary anymore delete $self->{dict}; return bless $self, $class; } # This function will NOT produce individual digits as "words" sub words_only { my $self = shift; return $self->words_for_phone( @_ ); } sub words_for_phone { my $self = shift; my $out = []; # If this was called from 'words_only', don't produce an individua +l digit my $words_only; { no warnings 'uninitialized'; my @call = caller 1; $words_only = ( $call[3] eq __PACKAGE__ . '::words_only' ); } while ( defined( my $phone = shift ) ) { # Turn the input phone number to digits only $phone =~ tr/0-9//cd; # Every truncation, longest first my @subphones = reverse map { substr $phone, 0, $_ } 1 .. length $phone; # Every truncation that has an associated word my @word_subphones = grep { exists $self->{word_for}{$_} } @subphones; # If there were no words found, # and we don't have to stick to words only, if ( ! @word_subphones && ! $words_only ) { # Get the first digit as a "word" my ( $w, $new_phone ) = ( $phone =~ m{ \A (.) (.*) \z }xms + ); # If this is the last digit, return this as the only produ +ction if ( ! length $new_phone ) { return [ $w ]; } # Make the rest of the number into words (NOT DIGITS) foreach my $more ( @{ $self->words_only( $new_phone ) } ) +{ push @{ $out }, $w . ' ' . $more; } } SUBPHONE: foreach my $n ( @word_subphones ) { # This is the part of the number we didn't translate my $new_phone = substr $phone, length $n; # If there's no more left to translate, don't try if ( ! length $new_phone ) { push @{ $out }, @{ $self->{word_for}{$n} }; next SUBPHONE; } # for every translation we have, foreach my $w ( @{ $self->{word_for}{$n} } ) { # look for translations of the remaining number foreach my $more ( @{ $self->words_for_phone( $new_pho +ne ) } ) { push @{ $out }, $w . ' ' . $more; } } } } return $out; }

The solution itself is absolutely correct. The I/O loop at the beginning might be broken, though. I didn't test that part. Rather, I tested the module with Test::More and wrote the I/O loop as an afterthought. Here are my tests:

package main; use Test::More; my $sample_dict = <<'END_SAMPLE_DICT'; an blau Bo" Boot bo"s da Fee fern Fest fort je jemand mir Mix Mixer Name neu o"d Ort so Tor Torf Wasser END_SAMPLE_DICT ; my @sample_phones = grep { length } split /\s+/, q{ 112 5624-82 4824 0721/608-4067 10/783--5 1078-913-5 381482 04824 }; my $sample_output = <<'END_SAMPLE_OUTPUT'; 5624-82: mir Tor 5624-82: Mix Tor 4824: Torf 4824: fort 4824: Tor 4 10/783--5: neu o"d 5 10/783--5: je bo"s 5 10/783--5: je Bo" da 381482: so 1 Tor 04824: 0 Torf 04824: 0 fort 04824: 0 Tor 4 END_SAMPLE_OUTPUT ; if (0) { my %output_for = map { $_ => [] } @sample_phones; foreach my $output_line ( split /\n/, $sample_output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } my $x = Node689350->new( { dict => [ split /\n/, $sample_dict ] } +); plan 'tests' => scalar @sample_phones; foreach my $phone ( @sample_phones ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } } else { my @dictionary = slurp( 'dictionary.txt' ); my @input = slurp( 'input.txt' ); my @output = slurp( 'output.txt' ); my %output_for = map { $_ => [] } @input; foreach my $output_line ( @output ) { my ( $phone, $word ) = split /: /, $output_line; push @{ $output_for{ $phone } }, $word; } # This appears twice for some reason $output_for{ '2' } = [ '2' ]; plan 'tests' => scalar @input; my $x = Node689350->new( { dict => \@dictionary } ); foreach my $phone ( @input ) { my $phone_words = $x->words_for_phone( $phone ); is_deeply( [ sort @{ $phone_words } ], [ sort @{ $output_for{ $phone } } ], "phone $phone" ); } }

I started with the short disambiguating sample material and then moved on to the main input. I found that '2' as an input appears twice, so I had to "fix" my testing code to account for that (I hardcoded the solution for '2'). I found also that I had one last bug to fix for cases that produce words but end in the digit 0 (that's 7 of the 1000 inputs).

I did this without reading any of the other solutions. I wasn't going to work on this at all until I saw the one from Anonymous Monk was so short. Seeing that, I figured it couldn't be quite as time-consuming as I was thinking.

I wasn't really trying to make it short or especially efficient—just correct. Before I started I had this idea that I would profile it when I was done and make it really fast, but now I'd rather go to bed.

Update: I've fixed the problem that BrowserUK pointed out, and I made a performance tweak based on profiling. I was surprised to discover that more time was spent in new() than in all 7000 calls to words_to_phone() put together. I fixed that by changing "s/(.)/$num_for{ $1 }/g" to "tr/ejnqrwxdsyftamcivbkulopghz/01112223334455666777888999/;" thereby sacrificing maintainability for speed. It's a big speed difference, however. Run time went from over 1 second to about a half a second.


In reply to Re: One for the weekend: challenge by kyle
in thread One for the weekend: challenge by BrowserUk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    [Corion]: Meh. I need to find myself a better "programming" feed than r/programming (which is just HackerNews reposts and advertisements, very little code). Maybe I should select a list of links that I liked on r/programming and then google for ...
    [Corion]: ... an aggregator site that also listed (ideally) all of these links once. And maybe also have an exclude list to blacklist some of the most spammy links that the site may have never mentioned at all
    [marto]: I gave up reading HN at all, and feel much better for it :P
    [Corion]: Now, how to best automate that Google search ... :-)
    [Corion]: marto: Yeah, I'm also short of not reading it at all. I already stopped reading it in the morning because it gave me a foul mood.
    [Corion]: The good posts on HN are the non-computing posts. Most of the other stuff is maybe relevant to you if you are 20 and live in Silicon Valley...
    [Corion]: Two attributes that don't describe me.
    [Corion]: Maybe I should also write a curator for HN or simply not read it anymore, like you do.

    How do I use this? | Other CB clients
    Other Users?
    Others cooling their heels in the Monastery: (8)
    As of 2017-07-24 13:24 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      I came, I saw, I ...
























      Results (354 votes). Check out past polls.