PerlMonks  

Re^8: Words in Words

by sarchasm (Acolyte)
on Oct 02, 2011 at 20:28 UTC (#929174)


in reply to Re^7: Words in Words
in thread Words in Words

WOW!

I ran one of the other scripts and it took just under 24 hours to complete and I didn't get the answer I was expecting.

Your script ran in 40 seconds and gave me exactly what I was looking for!

Would you be willing to explain how this works? I get the declaration of the hash, the while loop to load the file (not sure what "undef $words{$word};" does) but the rest is pure magic!

Thank you so much for putting together this solution...I am truly blown away.

I tried to do this using T-SQL when I first encountered the problem but that was taking forever. Then I "tried" to use Perl but had way too many questions to get it to do what I needed. Your solution is awesome!

Thanks again!


Re^9: Words in Words
by choroba (Abbot) on Oct 02, 2011 at 20:55 UTC
    OK. In the while loop, I create a hash whose keys are the words from the list (there is no value, that's why the undef).
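To make the "no value" point concrete, here is a minimal sketch of a hash used as a set. The three-word list is made up for illustration; the real script loads its words from a file.

```perl
use strict;
use warnings;

my %words;
for my $word (qw(cat cats at)) {    # made-up sample list
    undef $words{$word};            # creates the key; the value stays undef
}

# Membership is tested with exists, not by looking at the value.
print exists($words{cat})  ? "cat: on the list\n" : "cat: not on the list\n";
print exists($words{dog})  ? "dog: on the list\n" : "dog: not on the list\n";
print defined($words{cat}) ? "value is defined\n" : "value is undef\n";
```

Since only the keys matter, storing undef avoids giving every entry a dummy value such as 1.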

    Then, I go through the words one by one. For each word, I try all the positions and all possible lengths of its subwords (and I skip the maximal length at position 0, because that would break rule #2). For each subword, I do nothing if it has already been printed out (each word should be reported just once). I do nothing if rules #3 or #4 apply. Otherwise, I check whether the subword is itself on the list of words. If it is, I report it and book it as such. And that's it.
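The walk-through above can be sketched roughly as follows. This is not the original script: the names find_subwords and excluded_by_rules are hypothetical, the word list is made up, and rules #3 and #4 are not spelled out in this part of the thread, so they appear only as a stub that rejects nothing. The sketch also returns the subwords instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rules #3 and #4, which are not given here.
sub excluded_by_rules { my ($word, $subword) = @_; return 0 }

sub find_subwords {
    my ($words) = @_;                       # hash ref: keys are the word list
    my %reported;
    for my $word (sort keys %$words) {
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            for my $len (1 .. $length - $pos) {
                next if $len == $length;    # whole word (only at position 0)
                my $subword = substr $word, $pos, $len;
                next if exists $reported{$subword};           # already reported
                next if excluded_by_rules($word, $subword);   # rules #3, #4
                next unless exists $words->{$subword};        # not on the list
                undef $reported{$subword};                    # book it
            }
        }
    }
    return sort keys %reported;
}

my %words;
undef $words{$_} for qw(cat catalog log at dog);   # made-up sample list
my @subwords = find_subwords(\%words);
print "@subwords\n";                               # prints "at cat log"
```

Each candidate costs one hash lookup, so the work depends on the word lengths rather than on comparing every word with every other word.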

    The basic idea was this: Comparing each word to all other words would take ages. There would be many comparisons of words that are totally incompatible. How can I reduce the number of comparisons? I do not need all the words, I only need those that are possible for the given word.

    As I read the code now, I think it might be optimized a bit further. Instead of caching the reported subwords, you can cache the tested ones (i.e. move the undef three lines up, before the "if"). %reported should be renamed to %checked then.
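A sketch of that variant, under the same caveats: find_subwords_checked is a hypothetical name, the word list is made up, and rules #3 and #4 are left out entirely because they are not given here, so where the mark belongs relative to those checks in the real script is not shown.

```perl
use strict;
use warnings;

sub find_subwords_checked {
    my ($words) = @_;                       # hash ref: keys are the word list
    my (%checked, @found);
    for my $word (sort keys %$words) {
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            for my $len (1 .. $length - $pos) {
                next if $len == $length;    # whole word (only at position 0)
                my $subword = substr $word, $pos, $len;
                next if exists $checked{$subword};   # tested before, hit or miss
                undef $checked{$subword};            # mark before the lookup
                push @found, $subword if exists $words->{$subword};
            }
        }
    }
    return @found;
}

my %words;
undef $words{$_} for qw(cat catalog log at dog);   # made-up sample list
my @found = find_subwords_checked(\%words);
print "@found\n";                                  # prints "at cat log"
```

The difference is that misses are remembered too, so a fragment such as "a" is looked up in the word list only once no matter how many words contain it.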

      Got it!

      The only piece in the code I am wondering about is where you are checking for rules 3 and 4 and use "$subword . q{s}..."

      I know what it is doing but where did the "q" come from?

      Thank you!

        http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators
        Customary  Generic   Meaning          Interpolates
        ''         q{}       Literal          no
        ""         qq{}      Literal          yes
        ``         qx{}      Command          yes*
                   qw{}      Word list        no
        //         m{}       Pattern match    yes*
                   qr{}      Pattern          yes*
                   s{}{}     Substitution     yes*
                   tr{}{}    Transliteration  no (but see below)
                   y{}{}     Transliteration  no (but see below)
        <<EOF                here-doc         yes*

        * unless the delimiter is ''.
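As a small illustration of the first two rows of the table, q{s} is just another way to write 's'. The variable names and the sample word below are made up.

```perl
use strict;
use warnings;

my $subword = 'cat';                  # made-up sample value
my $single  = $subword . 's';         # customary single quotes
my $generic = $subword . q{s};        # generic form, same string
my $interp  = qq{${subword}s};        # qq{} interpolates like double quotes

print "all three are '$generic'\n"
    if $single eq $generic and $generic eq $interp;
print q{$subword stays literal inside q{}}, "\n";   # nested braces must balance
```

The generic forms are mostly a readability choice: they let you pick delimiters that do not clash with quote characters inside the string.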
