PerlMonks  

Re: Words in Words

by choroba (Abbot)
on Sep 30, 2011 at 19:53 UTC (#928904=note)


in reply to Words in Words

Did I understand points 1) and 2) correctly? This script processes a list of almost 640,000 entries in less than a minute (50 seconds). Adding conditions 3) and 4) should be really easy.

#!/usr/bin/perl
use feature 'say';
use warnings;
use strict;

my $file = '/etc/dictionaries-common/words';
open my $IN, '<', $file or die "Cannot open $file: $!";
my %words;
while (my $word = <$IN>) {
    chomp $word;
    undef $words{$word};    # the hash is used as a set of words
}

for my $word (keys %words) {
    my $length = length $word;
    my %found;    # report each occurrence just once
    for my $pos (0 .. $length - 1) {
        my $skip_itself = ! $pos;    # 1 only at position 0
        for my $len (1 .. $length - $pos - $skip_itself) {
            my $subword = substr($word, $pos, $len);
            next if exists $found{$subword};
            if (exists $words{$subword}) {
                say "$subword in $word";
                undef $found{$subword};
            }
        }
    }
}
Update: if I omit the "just once" condition, the script finishes in 40 seconds on my Mac Mini.


Re^2: Words in Words
by sarchasm (Acolyte) on Sep 30, 2011 at 21:45 UTC

    I may not have clarified the specs as well as I had hoped.

    The result should be a distinct list of the words in the list that are contained within another word in the list, taking into account the exclusions I listed.

    When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word.

    I will say that it is pretty fast, though, and I may be able to use this technique to get what I am looking for.

    Thank you.
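A minimal sketch of that output, assuming "distinct list" means each contained word is reported once, no matter how many longer words contain it. The sub distinct_contained_words and the sample words are hypothetical; the sub reuses the substring loop from the script above but collects hits in one global hash instead of printing every "X in Y" pair. Exclusions 3) and 4) are not spelled out in this thread, so they are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the distinct words in
# @list that occur inside some *other*, longer word of the same list.
# Exclusions 3) and 4) of the original question are NOT applied here.
sub distinct_contained_words {
    my @list = @_;
    my %words;
    @words{@list} = ();     # hash slice: every word becomes a key
    my %contained;          # global set, so each hit is reported once

    for my $word (keys %words) {
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            # At position 0, stop one character short so the word
            # does not match itself.
            my $max_len = $length - $pos - ($pos == 0 ? 1 : 0);
            for my $len (1 .. $max_len) {
                my $subword = substr $word, $pos, $len;
                $contained{$subword} = 1 if exists $words{$subword};
            }
        }
    }
    return sort keys %contained;
}

# Prints cat, pogy and pogy's, one per line.
print "$_\n" for distinct_contained_words(qw(apogy's pogy's pogy cat cats));
```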
      When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word.
      Have you changed the path to the input file? The code should only print words from the list!

        I did change the path, but it looks like the code just splits each word into its own pieces. Here is what I get:

        p in pogy's
        po in pogy's
        pogy in pogy's
        o in pogy's
        og in pogy's
        g in pogy's
        gy in pogy's
        gy's in pogy's
        y in pogy's
        y's in pogy's
        s in pogy's

        The process needs to look for the word "pogy's" within another word in the list, such as "apogy's".
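Searching in the direction described here, one word at a time, can be sketched with Perl's built-in index(). The sub containers_of and the sample list are hypothetical. This is a naive scan: each call makes one pass over the whole list, so doing it for every word takes on the order of n² index() calls. The script above instead does one hash lookup per substring of each word, which is likely why it can handle the 640,000-entry list in under a minute.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every word in @list
# that contains $target somewhere inside it, excluding $target itself.
# index() returns -1 when the substring is not found.
sub containers_of {
    my ($target, @list) = @_;
    return grep { $_ ne $target && index($_, $target) >= 0 } @list;
}

my @list = qw(apogy's pogy's pogy cat);
print "$_\n" for containers_of("pogy's", @list);   # prints apogy's
```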
