PerlMonks  

Re: Estimating Vocabulary

by belg4mit (Prior)
on Mar 27, 2002 at 07:21 UTC ( #154607=note )


in reply to Estimating Vocabulary

Well, here's an alternative. It's a complete waste of cycles, since it makes one full pass over the file for every word returned; OTOH it is not bounded by the size of the dictionary. As is, it can also return duplicates, yada yada yada.
my(@lines, $line);
open(FILE, shift) || die;
until( scalar @lines == $ARGV[0] ){
    # rewind the file and reset the line counter $.
    seek(FILE, 0, $. = 0);
    # perlfaq5 trick: keep line N with probability 1/N
    rand($.) < 1 && ($line = $_) while <FILE>;
    push(@lines, $line);
}
print @lines, "wc -l could have told you this is $. words\n";
It's based on "How do I select a random line from a file?" in perlfaq5. I'd be interested in seeing if anybody has a better means of extending this algorithm to report multiple entries.
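
One possible way to extend the perlfaq5 trick to several lines in a single pass is reservoir sampling (Algorithm R). What follows is a minimal sketch rather than anything posted in this thread; `sample_lines` is a hypothetical helper name, and if the file has fewer lines than requested it simply returns all of them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reservoir sampling (Algorithm R): pick $k distinct lines from a
# filehandle in one pass, without knowing the line count up front.
# sample_lines() is a hypothetical helper, not code from the thread.
sub sample_lines {
    my ($fh, $k) = @_;
    my @reservoir;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@reservoir < $k) {
            push @reservoir, $line;        # fill the reservoir first
        }
        else {
            my $j = int rand $seen;        # uniform integer in 0 .. $seen-1
            $reservoir[$j] = $line if $j < $k;
        }
    }
    return ($seen, @reservoir);
}

# Usage: perl sample.pl /usr/share/dict/words 10
if (@ARGV == 2) {
    my ($file, $k) = @ARGV;
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    my ($total, @picked) = sample_lines($fh, $k);
    close $fh;
    print @picked, "read $total lines\n";
}
```

Unlike the loop above, this reads the file once and cannot return the same line twice, at the cost of holding the requested number of lines in memory.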

--
perl -pe "s/\b;([st])/'\1/mg"

Replies are listed 'Best First'.
Re: Re: Estimating Vocabulary
by I0 (Priest) on Mar 28, 2002 at 00:41 UTC
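    my(@lines, $line);
    open(FILE, shift) || die;
    # first pass: count the lines; $line holds the total
    1 while <FILE>;
    $line=$.;
    # rewind the file and reset the line counter $.
    seek(FILE, 0, $. = 0);
    # second pass: take a line when rand(lines left) < lines still wanted
    rand($line-$.) < $ARGV[0]-@lines && push(@lines,$_) while <FILE>;
    print @lines, "wc -l could have told you this is $. words\n";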
      UPDATE: Excellent!

      WAS: That does not appear to work; I ask for one line and get 13-18 lines... It is also heavily weighted towards the Zs.

      --
      perl -pe "s/\b;([st])/'\1/mg"

        Are you sure? It's working for me with only a small bias towards the Zs

        Update: Apparently, the observed bias was mostly an artifact of small sample size
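
One way to tell real bias from small-sample noise is to tally how often each position gets picked over many trials. The sketch below does this in memory for the textbook form of selection sampling (Knuth's Algorithm S), which counts the current item among those remaining; it is not the exact expression posted above. The names `pick_positions` and `tally`, and the sizes used, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Selection sampling (Knuth's Algorithm S) over positions 0 .. $n-1:
# take position $i with probability
#   (picks still needed) / (positions remaining, counting $i itself).
# pick_positions() is a hypothetical helper, not code from the thread.
sub pick_positions {
    my ($n, $k) = @_;
    my @picked;
    for my $i (0 .. $n - 1) {
        last if @picked == $k;
        push @picked, $i if rand($n - $i) < $k - @picked;
    }
    return @picked;
}

# Count how often each position is chosen over $trials runs.
sub tally {
    my ($n, $k, $trials) = @_;
    my @count = (0) x $n;
    for (1 .. $trials) {
        $count[$_]++ for pick_positions($n, $k);
    }
    return @count;
}

# Compare the first and last positions at a small and a large trial count.
my ($n, $k) = (26, 3);
for my $trials (100, 100_000) {
    my @count = tally($n, $k, $trials);
    printf "%7d trials: first=%.3f last=%.3f (ideal %.3f)\n",
        $trials, $count[0] / $trials, $count[-1] / $trials, $k / $n;
}
```

With few trials the first and last frequencies can sit well away from the ideal value just by chance; with many trials they should settle close to it if the sampler is unbiased.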
