PerlMonks  

Re: Words in Words

by choroba (Cardinal)
on Sep 30, 2011 at 19:53 UTC


in reply to Words in Words

Did I understand points 1) and 2) correctly? This script processes a list of almost 640,000 entries in less than a minute (50 seconds). Adding conditions 3) and 4) should be really easy.
#!/usr/bin/perl
use feature 'say';
use warnings;
use strict;

my $file = '/etc/dictionaries-common/words';
open my $IN, '<', $file or die "$!";

# Load the word list into a hash used as a set.
my %words;
while (my $word = <$IN>) {
    chomp $word;
    undef $words{$word};
}

for my $word (keys %words) {
    my $length = length $word;
    my %found;    # report each occurrence just once
    for my $pos (0 .. $length - 1) {
        # At position 0, stop one character short so the word never matches itself.
        my $skip_itself = ! $pos;
        for my $len (1 .. $length - $pos - $skip_itself) {
            my $subword = substr($word, $pos, $len);
            next if exists $found{$subword};
            if (exists $words{$subword}) {
                say "$subword in $word";
                undef $found{$subword};
            }
        }
    }
}
Update: if I omit the "just once" condition, the script finishes in 40 seconds on my Mac Mini.
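To make the loop bounds easier to follow, here is a minimal sketch that only enumerates the candidate substrings of a single word, in the same order as the nested loops in the script. The helper name `proper_substrings` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: list every substring of $word except $word itself,
# using the same position/length loops as the script.
sub proper_substrings {
    my ($word) = @_;
    my $length = length $word;
    my @subs;
    for my $pos (0 .. $length - 1) {
        # Only a substring starting at position 0 can equal the whole word,
        # so the maximum length is reduced by one there.
        my $skip_itself = $pos == 0 ? 1 : 0;
        for my $len (1 .. $length - $pos - $skip_itself) {
            push @subs, substr($word, $pos, $len);
        }
    }
    return @subs;
}

print join(' ', proper_substrings('cat')), "\n";
```

For 'cat' this prints "c ca a at t". The script then keeps only those candidates that exist as keys in %words.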

Re^2: Words in Words
by sarchasm (Acolyte) on Sep 30, 2011 at 21:45 UTC

    I may not have clarified the specs as well as I had hoped.

    The result should be a distinct list of the words in the list that are contained within another word in the list, taking into consideration the exclusions I listed.

    When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word.

    I will say that it is pretty fast though and I may be able to use this technique to get what I am looking for.

    Thank you.
      "When I run this code, it returns words that are not contained in the wordlist. They are just pieces of the word."
      Have you changed the path to the input file? The code should only print words from the list!

        I did change the path, but it looks like the code is just substringing each word against itself. Here is what I get:

        p in pogy's
        po in pogy's
        pogy in pogy's
        o in pogy's
        og in pogy's
        g in pogy's
        gy in pogy's
        gy's in pogy's
        y in pogy's
        y's in pogy's
        s in pogy's

        The process needs to look for the word "pogy's" within another word in the list, like "apogy's".
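One way to read the requirement as restated in this thread (a distinct list of list words that occur inside some other list word) is sketched below. It reuses the substring loops from the script but collects each contained word once instead of printing every pair. The sample word list and the name `contained_words` are assumptions made for illustration, and the exclusions 3) and 4) from the original question are not handled.

```perl
use strict;
use warnings;

# Hypothetical in-memory sample; a real run would read the list from a file.
my @list  = ("pogy's", "apogy's", "cat", "concatenate", "dog");
my %words = map { $_ => 1 } @list;

# Return the sorted, distinct words that appear inside another word of the set.
sub contained_words {
    my ($words) = @_;
    my %contained;
    for my $word (keys %$words) {
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            # Skip the full-length substring so a word is never matched to itself.
            my $skip_itself = $pos == 0 ? 1 : 0;
            for my $len (1 .. $length - $pos - $skip_itself) {
                my $sub = substr($word, $pos, $len);
                $contained{$sub} = 1 if exists $words->{$sub};
            }
        }
    }
    return sort keys %contained;
}

print "$_\n" for contained_words(\%words);
```

With this sample list only "cat" (inside "concatenate") and "pogy's" (inside "apogy's") are printed. Fragments such as "p" or "gy" appear in the output only if they are themselves entries in the word list.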
