Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: improving the efficiency of a script

by johngg (Canon)
on Jun 19, 2006 at 22:44 UTC ( [id://556301]=note: print w/replies, xml ) Need Help??


in reply to Re: improving the efficiency of a script
in thread improving the efficiency of a script

It doesn't necessarily require two passes.

It seems reasonable to assume that a dictionary file is sorted case-insensitively (and simple to make it so if it isn't already). Then you can read the file building up a list of words beginning with letter 'a'. As soon as you come across the first word beginning with 'b', make your random selection of 100 words beginning with 'a'and then discard the word list and re-initialise it with the 'b' word just read. Repeat until you've gone through the alphabet.

That way the job is done with a single pass but you don't have to keep the entire dictionary in memory, just one letter's worth.

use strict; use warnings; my $howMany = shift || 100; my $done = 0; my $rxValidWord = qr{^([A-Za-z])[-a-z]+}; my $totalWordCt = 0; my $validWordCt = 0; my %letterCts = (); my @letters = (q{a} .. q{z}); my $currentLetter = q{}; my $rxCurrent = getNextLetter(); my @wordCache = (); my $dictFile = q{../Web2}; open my $dictFH, q{<}, $dictFile or die qq{open: $dictFile: $!\n}; while(<$dictFH>) { $totalWordCt ++; next if $done; next unless /$rxValidWord/; my $initLetter = $1; $validWordCt ++; chomp; if($initLetter =~ $rxCurrent) { push @wordCache, $_; } else { reportForLetter(); } } reportForLetter() if $currentLetter; close $dictFH or die qq{close: $dictFile: $!\n}; print qq{\n}, qq{Total words in dictionary - $totalWordCt\n}, qq{Valid words processed - $validWordCt\n}; sub generateSlice { my ($total, $howMany) = @_; my $rlPool = [0 .. ($total - 1)]; return $rlPool if $total <= $howMany; my $rlSlice = []; for (1 .. $howMany) { push @$rlSlice, splice @$rlPool, int rand scalar @$rlPool, 1; } return [sort {$a <=> $b} @$rlSlice]; } sub getNextLetter { if(@letters) { $currentLetter = shift @letters; my $charClass = q{[} . $currentLetter . uc $currentLetter . q{]}; return qr{$charClass}; } else { $currentLetter = q{}; $done ++; return 0; } } sub reportForLetter { my $savedWord = $_; print qq{\n}, qq{Found @{[scalar @wordCache]} words }, qq{for letter $currentLetter\n}; my $rlSlice = generateSlice(scalar @wordCache, $howMany); my @randomWords = @wordCache[@$rlSlice]; print qq{$howMany words selected at random\n}; print qq{ $_\n} for @randomWords; @wordCache = ($savedWord); $rxCurrent = getNextLetter(); }

When run to look for 5 random word it produces this

Found 17059 words for letter a 5 words selected at random accede acrimoniously Agade amenable antibasilican Found 11025 words for letter b 5 words selected at random banuyo basichromatin beclothe Beethovenish bilamellate Found 19859 words for letter c 5 words selected at random celebratedness centrical connectival contraflexure copepodan Found 10841 words for letter d 5 words selected at random deuteroscopy didymous digit disinfector dressership Found 8714 words for letter e 5 words selected at random electrometallurgy eremic erethistic evident exhortatively Found 6852 words for letter f 5 words selected at random fermorite fetal flinthearted foder forwarding Found 6834 words for letter g 5 words selected at random geranial goldentop gonimous gude guttiferous Found 8982 words for letter h 5 words selected at random haffle hierophant homeotypical homologate hydrocyanate Found 8785 words for letter i 5 words selected at random ignobleness incommensurateness inestimability infundibular isotonic Found 1569 words for letter j 5 words selected at random jacketwise jeddock Jesse jiffle jilter Found 2218 words for letter k 5 words selected at random kefir kneelingly kongoni kotwalee kyphosis Found 6231 words for letter l 5 words selected at random lablab ladder leeches loreal lyopomatous Found 12529 words for letter m 5 words selected at random marquisal mnemonical moribundity murgeon murine Found 6746 words for letter n 5 words selected at random nastika nominee nondefinition nonemploying Normanist Found 7833 words for letter o 5 words selected at random officious oligarch ostiate overaccuracy overlanguaged Found 24409 words for letter p 5 words selected at random palustral Pherophatta precausation presubstitution probaseball Found 1150 words for letter q 5 words selected at random quadrual queery quidditative quinaldinic quira Found 9594 words for letter r 5 words selected at random racemate ringleaderless risper roadite rotundness Found 25029 words for letter s 5 words selected at random scrapman silvertop stepmotherly sticheron subexcite Found 12907 words for letter t 5 words selected at random tailward terzina thrimble throughganging translatable Found 16385 words for letter u 5 words selected at random underfortify underminister uninjurious uniunguiculate upcloser Found 3409 words for letter v 5 words selected at random velveted victual vilely vivaciously volcanist Found 3925 words for letter w 5 words selected at random whelklike winningness wireworking wiseacre wusp Found 383 words for letter x 5 words selected at random xanthine xanthopsin Xenomi xerotherm xyst Found 669 words for letter y 5 words selected at random ya yardkeep yearnfulness yentnite ypsiliform Found 947 words for letter z 5 words selected at random Zalophus zedoary zelatrice zincotype Zoophagineae Total words in dictionary - 234936 Valid words processed - 234884

Cheers,

JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://556301]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2024-06-19 07:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.