Re: Words in Words

by choroba (Chancellor)
on Sep 30, 2011 at 19:53 UTC (#928904)

in reply to Words in Words

Did I understand points 1) and 2) correctly? This script finishes a list of almost 640,000 entries in less than a minute (50 sec). Adding conditions 3) and 4) should be really easy.
#!/usr/bin/perl
use feature 'say';
use warnings;
use strict;

my $file = '/etc/dictionaries-common/words';
open my $IN, '<', $file or die "$!";

my %words;
while (my $word = <$IN>) {
    chomp $word;
    undef $words{$word};
}

for my $word (keys %words) {
    my $length = length $word;
    my %found;    # report each occurrence just once
    for my $pos (0 .. $length - 1) {
        my $skip_itself = ! $pos;
        for my $len (1 .. $length - $pos - $skip_itself) {
            my $subword = substr($word, $pos, $len);
            next if exists $found{$subword};
            if (exists $words{$subword}) {
                say "$subword in $word";
                undef $found{$subword};
            }
        }
    }
}
Update: if I omit the "just once" condition, the script finishes in 40 secs on my Mac Mini.

Re^2: Words in Words
by sarchasm (Acolyte) on Sep 30, 2011 at 21:45 UTC

    I may not have clarified the specs as well as I had hoped.

    The results should be a distinct list of the words in the list that are contained within another word in the list, taking into consideration the exclusions I listed.

    When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word.

    I will say that it is pretty fast, though, and I may be able to use this technique to get what I am looking for.

    Thank you.
      "When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word."
      Have you changed the path to the input file? The code should only print words from the list!

        I did change the path, but it looks like the code is just taking substrings of each word itself. Here is what I get:

        p in pogy's
        po in pogy's
        pogy in pogy's
        o in pogy's
        og in pogy's
        g in pogy's
        gy in pogy's
        gy's in pogy's
        y in pogy's
        y's in pogy's
        s in pogy's

        The process needs to look for the word "pogy's" within another word in the list, like "apogy's".
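One possible way to get the distinct list described above is to keep the same substring lookup but collect the matches in a hash instead of printing every "X in Y" pair. This is only a rough sketch: the sub name contained_words and the toy word list are made up for illustration, and the exclusions from the original question are not applied because they are not spelled out in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, distinct words from @$list that
# occur as a proper substring of at least one other word in the same list.
sub contained_words {
    my ($list) = @_;
    my %words = map { $_ => 1 } @$list;
    my %contained;
    for my $word (keys %words) {
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            for my $len (1 .. $length - $pos) {
                next if $len == $length;    # skip the word itself
                my $subword = substr $word, $pos, $len;
                # Only record substrings that are themselves list entries.
                $contained{$subword} = 1 if exists $words{$subword};
            }
        }
    }
    return sort keys %contained;
}

# Toy list (an assumption); a real run would read the dictionary file instead.
my @list = ("pogy's", "apogy's", 'cat', 'concatenate', 'dog');
print "$_\n" for contained_words(\@list);
```

With this toy list the sketch reports "cat" (found in "concatenate") and "pogy's" (found in "apogy's"), each once. Single letters or fragments would still show up if they are entries in the word list themselves, so the input list decides what counts as a word.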
