Re: Opening random files (with bias) based on File::Stat information.

by xdg (Monsignor)
on Mar 19, 2006 at 14:05 UTC

in reply to Opening random files (with bias) based on File::Stat information.

For the bias, something you might consider is this:

  • Sort files from newest to oldest
  • Generate a random number between 0 and 1
  • Invert that number through a cumulative distribution function (CDF) bounded on [0, 1], i.e. find the x for which F(x) equals your random number
  • Scale the inverse to the length of your list
  • Pick a file using the scaled inverse as the index
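Here is a minimal sketch of those steps end to end. It is not xdg's code. It assumes the files come from a single directory named on the command line, and it uses the core File::stat module (lowercase s) for modification times. It also swaps in a deliberately simple placeholder distribution, F(x) = sqrt(x), whose inverse is u**2. The helper names pick_biased and inv_cdf are hypothetical.

```perl
use strict;
use warnings;
use File::stat;    # core module: stat() returns an object with ->mtime etc.

# Hypothetical placeholder distribution: CDF F(x) = sqrt(x) on [0, 1],
# so the inverse is F^-1(u) = u**2. Its density piles up near 0.
sub inv_cdf {
    my ($u) = @_;
    return $u**2;
}

# Hypothetical helper: pick one file, biased towards recently modified ones.
sub pick_biased {
    my @files = @_;
    return unless @files;

    # Step 1: sort newest to oldest (largest mtime first); skip unstat-able files.
    my @sorted = map  { $_->[0] }
                 sort { $b->[1] <=> $a->[1] }
                 map  { my $st = stat($_); $st ? [ $_, $st->mtime ] : () } @files;
    return unless @sorted;

    # Step 2: uniform random number in [0, 1).
    my $u = rand();

    # Steps 3-5: invert the CDF, scale to the list length, use as an index.
    # inv_cdf maps [0, 1) into [0, 1), so the index stays within 0 .. $#sorted.
    my $index = int( inv_cdf($u) * @sorted );
    return $sorted[$index];
}

# Hypothetical usage: directory from the command line, defaulting to '.'.
my $dir = shift // '.';
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = grep { -f } map { "$dir/$_" } readdir $dh;
closedir $dh;
my $choice = pick_biased(@files) // 'no files found';
print "$choice\n";
```

With this placeholder, the newest file is picked roughly 32% of the time in a 10-file list, because sqrt(0.1) ≈ 0.316. Replacing inv_cdf with a tunable inverse such as the Kumaraswamy one gives you control over the shape.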

If you pick a distribution that is weighted towards 0, you'll wind up picking newer files more often. Note: this isn't technically weighting by time -- it's biasing towards certain array slots, irrespective of whether those slots are close in access time or far apart. However, that may be sufficient for your particular application.

A good distribution for this may be the Kumaraswamy, which is bounded between 0 and 1 and whose CDF has a closed form that is easy to invert. By changing the two shape parameters, you'll get different shapes, including ones that bias towards 0. (You'll have to try graphing some PDFs and see what you like.)
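For reference, these are the standard Kumaraswamy density and CDF for shape parameters a, b > 0. Setting u = F(x) and solving for x gives the expression computed by invK in the example, with $Ka = a and $Kb = b. The last line is the standard mode formula, valid for a, b ≥ 1 and not both equal to 1.

```latex
\begin{aligned}
f(x; a, b) &= a\,b\,x^{a-1}\,(1 - x^{a})^{b-1}, \qquad 0 \le x \le 1,\; a, b > 0 \\
F(x; a, b) &= 1 - (1 - x^{a})^{b} \\
u = F(x) \;\Rightarrow\; x &= \bigl(1 - (1 - u)^{1/b}\bigr)^{1/a} \\
\operatorname{mode} &= \left(\frac{a - 1}{a b - 1}\right)^{1/a}
\end{aligned}
```

With a = 1.5 and b = 6, the mode is (0.5/8)^(2/3) ≈ 0.16. Picks therefore cluster around the first sixth of the list rather than at slot 0 itself.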

Here's an example of how it could be used to bias in the way I described:

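    use strict;
    use warnings;

    my $param_a = 1.5;    # Kumaraswamy shape a
    my $param_b = 6;      # Kumaraswamy shape b
    my @array   = ( 1 .. 100 );

    # Inverse Kumaraswamy CDF: maps $F in [0, 1) to [0, 1)
    sub invK {
        my ( $F, $Ka, $Kb ) = @_;
        return ( 1 - ( 1 - $F )**( 1 / $Kb ) )**( 1 / $Ka );
    }

    for ( 1 .. 20 ) {
        # scale to the list length; int() gives an index in 0 .. $#array
        my $pick = int( invK( rand(), $param_a, $param_b ) * @array );
        print "$pick\n";
    }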

A test run gave this: 7 11 12 26 18 27 10 30 6 3 28 2 35 7 29 40 26 15 3 44
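A single run of 20 picks is a noisy way to judge a shape. Here is a minimal sketch of a larger check, assuming the same parameters (a = 1.5, b = 6) and a 100-element list. It tallies 10,000 picks into ten buckets of ten slots and prints each bucket's observed share next to the share predicted by the Kumaraswamy CDF. The helper cdfK is a hypothetical name introduced here.

```perl
use strict;
use warnings;

my ( $param_a, $param_b ) = ( 1.5, 6 );    # same parameters as the example
my $n_slots = 100;                         # length of the list being indexed
my $draws   = 10_000;

# Inverse Kumaraswamy CDF, same as invK in the example.
sub invK {
    my ( $F, $Ka, $Kb ) = @_;
    return ( 1 - ( 1 - $F )**( 1 / $Kb ) )**( 1 / $Ka );
}

# Kumaraswamy CDF, used for the expected share of each bucket.
sub cdfK {
    my ( $x, $Ka, $Kb ) = @_;
    return 1 - ( 1 - $x**$Ka )**$Kb;
}

# Tally the picked indices (0 .. 99) into ten buckets of ten slots each.
my @observed = (0) x 10;
for ( 1 .. $draws ) {
    my $pick = int( invK( rand(), $param_a, $param_b ) * $n_slots );
    $observed[ int( $pick / 10 ) ]++;
}

# A pick lands in slots 10k .. 10k+9 when x falls in [k/10, (k+1)/10).
for my $k ( 0 .. 9 ) {
    my $expected = cdfK( ( $k + 1 ) / 10, $param_a, $param_b )
                 - cdfK( $k / 10, $param_a, $param_b );
    printf "slots %2d-%2d: observed %5.3f  expected %5.3f\n",
        10 * $k, 10 * $k + 9, $observed[$k] / $draws, $expected;
}
```

With these parameters, the CDF predicts about 93% of picks in the first half of the list (F(0.5) ≈ 0.927). That is consistent with the test run above, where no pick exceeded 44.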


Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Node Type: note [id://537744]
