in reply to Frequency Analysis Of A Subset Of A File
This will print a pretty good approximation to a randomly distributed 10% of the lines in any file, regardless of its size. Each line is kept independently with probability 0.1, so the exact count varies from run to run:
C:\test>wc -l 986831-01.dat
268 986831-01.dat

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
33

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
26

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
32

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
24
Once you have randomly selected X% of the lines in the file, you only need randomly select X% of the characters (pairs/triples) in each of those lines to satisfy your overall goal.
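The two steps could be combined in one script along the following lines. This is only a rough sketch, not a tested solution to the original problem: the helper names `sample_lines` and `sample_ngrams`, the command-line arguments, and the choice to sample overlapping windows of N characters are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep each line of the filehandle independently
# with probability $p (the same test as the one-liner above).
sub sample_lines {
    my ($fh, $p) = @_;
    my @kept;
    while (my $line = <$fh>) {
        chomp $line;
        push @kept, $line if rand() < $p;
    }
    return @kept;
}

# Hypothetical helper: keep each overlapping $n-character window of $line
# independently with probability $p (n=1 chars, n=2 pairs, n=3 triples).
sub sample_ngrams {
    my ($line, $n, $p) = @_;
    my @kept;
    for my $i (0 .. length($line) - $n) {
        push @kept, substr($line, $i, $n) if rand() < $p;
    }
    return @kept;
}

# Assumed usage: perl sample.pl FILE [FRACTION] [N]
if (@ARGV) {
    my ($file, $p, $n) = @ARGV;
    $p //= 0.1;    # fraction of lines, and of n-grams within each kept line
    $n //= 1;      # n-gram length

    open my $fh, '<', $file or die "Cannot open $file: $!";
    my %freq;
    for my $line (sample_lines($fh, $p)) {
        $freq{$_}++ for sample_ngrams($line, $n, $p);
    }
    close $fh;

    # Most frequent first; ties broken alphabetically.
    printf "%s\t%d\n", $_, $freq{$_}
        for sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
}
```

As with the one-liner, the file is read one line at a time, so memory use depends on the size of the sample rather than the size of the file.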