
Re^8: Words in Words

by sarchasm (Acolyte)
on Oct 02, 2011 at 20:28 UTC (#929174)

in reply to Re^7: Words in Words
in thread Words in Words


I ran one of the other scripts. It took just under 24 hours to complete, and I didn't get the answer I was expecting.

Your script ran in 40 seconds and gave me exactly what I was looking for!

Would you be willing to explain how this works? I get the declaration of the hash, the while loop to load the file (not sure what "undef $words{$word};" does) but the rest is pure magic!

Thank you so much for putting together this solution... I am truly blown away.

I tried to do this using T-SQL when I first encountered the problem, but that was taking forever. Then I "tried" to use Perl but had way too many questions to get it to do what I needed. Your solution is awesome!

Thanks again!

Re^9: Words in Words
by choroba (Chancellor) on Oct 02, 2011 at 20:55 UTC
    OK. In the while loop, I create a hash whose keys are the words from the list (there is no value, that's why the undef).
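A minimal sketch of the hash-as-set idiom described here. The in-memory word list is a made-up stand-in for the real file; only `%words` and `$word` come from the thread.

```perl
use strict;
use warnings;

my %words;

# Hypothetical word list standing in for the lines read from the file.
for my $word (qw(cat cats scat at)) {
    undef $words{$word};    # creates the key; its value stays undef
}

# exists tests for the key itself, so it is true even though the value is undef.
print exists $words{cat}  ? "cat is on the list\n"   : "cat is missing\n";
print defined $words{cat} ? "cat has a value\n"      : "cat has no value\n";
print join(q{ }, sort keys %words), "\n";
```

Because only the keys matter, membership is always tested with `exists`, never with `defined` or plain truth.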

    Then, I go through the words one by one. For each word, I try all the positions and all possible lengths of its subwords (and I skip the maximal length at position 0, because that would break rule #2). For each subword, I do nothing if it has already been printed out (each word should be reported just once). I do nothing if rules #3 or #4 apply. Otherwise, I check whether the subword is itself on the list of words. If it is, I report it and record it as reported. And that's it.
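The loop described above might look roughly like the sketch below. It is not the original script: the word list is invented, and rules #3 and #4 are not spelled out in this part of the thread, so they appear only as a placeholder comment. The names `%words`, `%reported` and `$subword` follow the discussion; `@report`, `$pos` and `$len` are assumptions.

```perl
use strict;
use warnings;

# Hypothetical word list; the real script loads it from a file.
my %words;
undef $words{$_} for qw(a at cat cats scat);

my %reported;    # subwords already reported, so each appears only once
my @report;      # reported subwords in the order they were found

for my $word (sort keys %words) {
    my $length = length $word;
    for my $pos (0 .. $length - 1) {
        for my $len (1 .. $length - $pos) {
            # Skip the whole word itself (position 0, maximal length).
            next if $pos == 0 && $len == $length;

            my $subword = substr $word, $pos, $len;
            next if exists $reported{$subword};

            # Rules #3 and #4 from the thread would be checked here (omitted).

            # One hash lookup replaces a comparison against every other word.
            next unless exists $words{$subword};

            undef $reported{$subword};
            push @report, $subword;
        }
    }
}

print "$_\n" for @report;
```

The work per word depends only on the number of its substrings, not on the size of the word list, which is why this approach avoids the all-pairs comparison.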

    The basic idea was this: Comparing each word to all other words would take ages. There would be many comparisons of words that are totally incompatible. How can I reduce the number of comparisons? I do not need all the words, I only need those that are possible for the given word.

    As I read the code now, I think it might be optimized a bit further. Instead of caching the reported subwords, you can cache the tested ones (i.e. move the undef three lines up, before the "if"). %reported should be renamed to %checked then.

      Got it!

      The only piece in the code I am wondering about is where you are checking for rules 3 and 4 and use "$subword . q{s}..."

      I know what it is doing but where did the "q" come from?

      Thank you!
        Customary  Generic   Meaning           Interpolates
        ''         q{}       Literal           no
        ""         qq{}      Literal           yes
        ``         qx{}      Command           yes*
                   qw{}      Word list         no
        //         m{}       Pattern match     yes*
                   qr{}      Pattern           yes*
                   s{}{}     Substitution      yes*
                   tr{}{}    Transliteration   no (but see below)
                   y{}{}     Transliteration   no (but see below)
        <<EOF                here-doc          yes*

        * unless the delimiter is ''.
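A short illustration of the first rows of that table, assuming a sample value for `$subword`. The other variable names are made up for the example.

```perl
use strict;
use warnings;

my $subword = 'cat';

# q{} is the generic form of single quotes: no interpolation.
my $single  = $subword . 's';
my $generic = $subword . q{s};

# qq{} is the generic form of double quotes: variables are interpolated.
my $interpolated = qq{${subword}s};

# Inside q{} the same text stays literal.
my $literal = q{${subword}s};

# qw{} builds a list of whitespace-separated words without quotes or commas.
my @list = qw{cat cats scat};

print "$single $generic $interpolated\n";
print "$literal\n";
print scalar(@list), " words\n";
```

So `$subword . q{s}` is just another way of writing `$subword . 's'`; the choice between them is a matter of style.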
