Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^3: Words in Words

by BrowserUk (Pope)
on Sep 30, 2011 at 21:53 UTC ( #928934=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Words in Words
in thread Words in Words

Sorry, but only comparing words with those that come after it alphabetically doesn't work.

For example, the work 'call' appears in all these words that precede it alphabetically:

abiogenically abiotically academically acerbically achromatically acoustically acrobatically acronymically acrostically actinically adiabatically adrenergically aerobically aerodynamically aeronautically aesthetically agonistically agronomically alchemically alcoholically algebraically algorithmically allegorically allopatrically allosterically allotypically alogically alphabetically altruistically amitotically anacoluthically anaerobically anagogically analogically analytically anaphorically anarchically anatomically anchoritically anecdotically anemically anesthetically angelically animatronically anisotropically anodically antibiotically antically antidromically antigenically antiseptically antithetically aoristically apathetically aperiodically aphetically aphoristically apically apocalyptically apodictically apolitically apologetically apomictically apoplectically aposematically apotropaically aquatically archaically arctically arithmetically aromatically arthritically artistically ascetically aseptically asthmatically astrologically astronautically astronomically astrophysically asymmetrically asymptotically asyndetically atavistically atheistically athletically atmospherically atomically atomistically atypically authentically autistically autocratically autographically automatically autonomically autotrophically axenically axiologically axiomatically ballistically barbarically barometrically basically bathetically bathymetrically beatifically biblically biochemically biogenetically biographically biologically biomechanically birdcall birdcalls bolometrically bombastically botanically buccally bucolically caecally call

And the word 'the' appears in 1300 words that precede it in my 178,000 word dictionary.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^3: Words in Words
Download Code
Re^4: Words in Words
by sarchasm (Acolyte) on Sep 30, 2011 at 22:03 UTC

    This process only needs to identify 1 word in the list that meets the criteria resulting in a unique list of the words that were contained within other words.

    I suppose there would still be a problem if this process doesn't look at the words that preceed each word because something could be missed.

    I have this code working and it sure is fast at producing a list of words. It just needs to return 1 word instead of all of the words that match and ensure that it searches the entire list.

    Thank you!
      This process only needs to identify 1 word in the list that meets the criteria resulting in a unique list of the words that were contained within other words.

      Then add a single line to my solution above and it should reduce the time taken by roughly a factor of 10. Ie. A projected 1 hour:

      #! perl -slw use strict; my @words = do{ local @ARGV = 'words.txt'; <> }; chomp @words; my $all = join ' ', @words; my $start = time; my $n = 0; for my $i ( @words ) { for my $j ( $all =~ m[ ([^ ]*$i[^ ]*) ]g ) { next if $j eq $i or $j eq "${i}s" or $j eq "${i}'s"; print "$j contains $i"; last; ## Added } } printf STDERR "Took %d seconds for %d words\n", time() - $start, scalar @words;

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Words in Words
by Lotus1 (Chaplain) on Sep 30, 2011 at 22:43 UTC
    Sorry, but only comparing words with those that come after it alphabetically doesn't work.

    True, but in the solution I posted I sort by length after the alphabetical sort. The alphabetical sort is probably overkill but needed for this particular solution.

    chomp (my @words = sort { length($a) <=> length($b) } sort <DATA>);

      It looks like both solutions will work!

      One thing I just realized from your post about sorting is that you only need to look at words that are longer than the current word (which you are sortof doing). This means that as the program runs, it actually becomes faster at finding the results.

      I ran each program for 1 minute and BrowserUk's code produced 320 records. Lotus1's code produced 150. Even though your code appears to run slower I imagine performance will improve the longer the process runs because it will have fewer records to look through each time.

      I will let the programs run over the weekend to see what I get.

      Thank you all for your help. I learned a lot from your examples and suggestions!

        Another tweak should improve performance again:

        #! perl -slw use strict; my @words = do{ local @ARGV = 'words.txt'; <> }; chomp @words; my $all = join ' ', @words; my $start = time; for my $i ( @words ) { while( $all =~ m[ ([^ ]*$i[^ ]*) ]g ) { my $j = $1; next if $j eq $i or $j eq "${i}s" or $j eq "${i}'s"; print "$j contains $i"; last; ## Added } } printf STDERR "Took %d seconds for %d words\n", time() - $start, scalar @words;

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        Update: Evidently this is a step too far as it produces the wrong results. It could (probably) be fixed, but it will never beat choroba's solution below.

        My final offering. Combining Lotus1's sort by length with my big-string approach and this really flies, beating my previous best by an order of magnitude:

        Ignore!

        #! perl -slw use strict; my @words = sort{ length($a) <=> length($b) } do{ local @ARGV = 'words.txt'; <> }; chomp @words; my $start = time; my $all = join ' ', @words; study $all; my @offsets; for my $l ( 1 .. 20 ) { push @offsets, $all =~ m[ ([^ ]{$l}) ] ? $-[0] : $offsets[-1]; } for my $i ( @words ) { while( substr( $all, $offsets[ length( $i ) +1 ] ) =~ m[ ([^ ]*$i[^ ]*) ]g ) { my $j = $1; next if $j eq $i or $j eq "${i}s" or $j eq "${i}'s"; print $i; last; } } printf STDERR "Took %d seconds for %d words\n", time() - $start, scalar @words;

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://928934]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (9)
As of 2014-07-11 12:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (224 votes), past polls