<?xml version="1.0" encoding="windows-1252"?>
<node id="928897" title="Re: Words in Words" created="2011-09-30 14:57:58" updated="2011-09-30 14:57:58">
<type id="11">
note</type>
<author id="171588">
BrowserUk</author>
<data>
<field name="doctext">

&lt;p&gt;Try this. I project that it should complete your 410 billion comparisons in a little under 10 hours.

&lt;p&gt;The main attempt at efficiency here is to invoke the regex just once per word, in global mode (/g), against a single large string containing all the words, and have it return every match in one pass. The loop then discards the matches you specifically want to exclude: the word itself, its plural (trailing s) and its possessive (trailing 's).
&lt;code&gt;
#! perl -slw
use strict;

my @words = do{ local @ARGV = 'words.txt'; &lt;&gt; };
chomp @words;

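# Pad with spaces so the first and last words are delimited like the rest.
my $all = ' ' . join( ' ', @words ) . ' ';

my $start = time;
for my $i ( @words ) {

    # \Q..\E matches the word literally; the lookahead leaves the trailing
    # space unconsumed so it can also delimit the next match.
    for my $j ( $all =~ m[ ([^ ]*\Q$i\E[^ ]*)(?= )]g ) {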
        next
            if $j eq $i
            or $j eq "${i}s"
            or $j eq "${i}'s";
#        print "$j contains $i";
    }
}

printf STDERR "Took %d seconds for %d words\n",
    time() - $start, scalar @words;
&lt;/code&gt;


</field>
<field name="root_node">
928877</field>
<field name="parent_node">
928877</field>
</data>
</node>
