Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re: improving the efficiency of a script

by Zaxo (Archbishop)
on Jun 18, 2006 at 16:48 UTC ( [id://556124]=note: print w/replies, xml ) Need Help??


in reply to improving the efficiency of a script

If this is going to be used a lot, I'd either stuff the dictionary file into a database, or else construct an index to the offsets and sizes of initial letter sections of the dictionary file.

Assuming the dictionary file is alphabetically sorted, you don't need to slurp the whole file into an array. That is a large chunk of memory for a million words. Allocations that size will slow you painfully if you are driven into swap.

Try just building an array with the a's, shuffling, and taking the first hundred elements. Then discard the a's and replace with the b's, all in a while loop that only reads one line at a time.

You don't need a loop to pick the first hundred elements of an array. A slice will do,

@array[0..99]
and is much faster.

After Compline,
Zaxo

Replies are listed 'Best First'.
Re^2: improving the efficiency of a script
by Limbic~Region (Chancellor) on Jun 19, 2006 at 00:03 UTC
    Zaxo,
    This is very similar to the idea I had. I discovered that compiling the offsets using DBM::Deep was extremely slow, but was fast for subsequent runs. This also has the advantage of not requiring the dictionary file to be sorted.
    #!/usr/bin/perl use strict; use warnings; use DBM::Deep; open(my $dict, '<', 'words.raw') or die "Unable to open 'words.raw' fo +r reading: $!"; my $db = DBM::Deep->new("offsets.db"); build_db($db, $dict) if ! scalar keys %$db; for my $char ('a' .. 'z') { for (1 .. 100) { print get_rand_word($db, $char, $dict); } } sub build_db { my ($db, $dict) = @_; my $pos = tell $dict; while ( <$dict> ) { my $char = substr($_, 0, 1); push @{$db->{$char}}, $pos; $pos = tell $dict; } } sub get_rand_word { my ($db, $char, $dict) = @_; my $offset = $db->{$char}[rand @{$db->{$char}}]; seek $dict, $offset, 0; my $word = <$dict>; return $word; }
    Other options include Storable and DBD::SQLite if a real RDBMS isn't available.

    Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://556124]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-06-16 21:14 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.