Re: Challenge: "Words" In A String

in reply to Challenge: "Words" In A String

Very rough draft (pseudocode)

sub wordify($string) {
    my $results = [];
    $min_end = length($string)
    for $pos (0 .. $min_end) {
        for $len (1 .. length($string)) {
            $word = substr($string, $pos, $len)
            if exists($dictionary{$word}) {
                $min_end = $pos+$len if $pos+$len < $min_end
                $prefix = substr($string, 0, $pos) . " " . $word # nee
+ds special treatment for $pos = 0
                $endings = wordify(substr($string, $pos+1))
                push(@$results, map("$prefix $_", @$endings))
            }
        }
    }
    return [$string] unless @$results 
    return $results
}
[download]

Hope I don't get it all wrong, it's quite late here in Europe. I'd definitely use dynamic programming to save the partial results.

Comment on Re: Challenge: "Words" In A String Download Code

Replies are listed 'Best First'.
Re^2: Challenge: "Words" In A String by pjotrik (Friar) on Sep 19, 2008 at 10:21 UTC
All right, here goes the code. I've kept it simple, using trie instead of hash as ikegami does would surely improve performance. #!/usr/bin/perl use warnings; use strict; my %dictionary; open(my $dict_file, '<', '2of12inf.txt') or die ("Can't open dictionar +y: $!"); while (<$dict_file>) { $_ = lc($_); tr/a-z//dc; $dictionary{$_} = 1; } my @partials; while (<DATA>) { $_ = lc($_); tr/a-z//dc; @partials = (); $partials[0] = ['']; my $results = wordify($_); print "$_:\n"; print "$_\n" for @$results; } sub wordify { my ($string) = @_; my $results = []; my $min_end = length($string); for (my $pos = 0; $pos < $min_end; $pos++) { for my $len (1 .. length($string) - $pos) { my $word = substr($string, $pos, $len); if (exists $dictionary{$word}) { $min_end = $pos+$len if $pos+$len < $m +in_end; my $prefix = ''; $prefix = '[' . substr($string, 0, $po +s) . '] ' unless $pos == 0; $prefix .= $word; my $rest = substr($string, $pos + $len +); wordify($rest) unless (defined $partia +ls[length($rest)]); my $endings = $partials[length($rest)] +; push(@$results, map("$prefix $_", @$en +dings)); } } } $results = ["[$string]"] unless @$results; $partials[length($string)] = $results; return $results; } __DATA__ penisland zatxtaz xapenx [download] Giving the result (brackets denote the non-word fragments) `penisland: pen is la [nd] pen is land pen is [l] an [d] pen is [l] and pen island penis la [nd] penis land penis [l] an [d] penis [l] and [p] en is la [nd] [p] en is land [p] en is [l] an [d] [p] en is [l] and [p] en island zatxtaz: [z] at [xtaz] xapenx: [x] ape [nx] [xa] pen [x] [xap] en [x]` [download] and for the `2of4brif` dictionary `penisland: pen is land pen is [l] an [d] pen is [l] and pen island penis land penis [l] an [d] penis [l] and zatxtaz: [z] at [x] ta [z] xapenx: [x] ape [nx] [xa] pen [x]` [download]	[reply] [d/l] [select]
Re^3: Challenge: "Words" In A String by Limbic~Region (Chancellor) on Sep 19, 2008 at 13:46 UTC
pjotrik, I like the way you distinguish between word substrings and non-word substrings. Unfortunately, the order isn't correct. Given the way you return the results and print them, adding an order wouldn't be difficult. I haven't tested this against any large sample data with potential cases but it looks good from eyeballing it :-) Cheers - L~R	[reply]

In Section Seekers of Perl Wisdom