Re: Get random unique lines from file

by BrowserUk (Patriarch)
on Aug 17, 2012 at 22:28 UTC


in reply to Get random unique lines from file

This script takes the name of the file and the number of sequences you want and outputs that number of randomly selected sequences.

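It is simple and fast, and it should handle input files of any size with minimal memory usage, because it keeps only one file offset per sequence in memory rather than the sequences themselves: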

    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];

    # Read '>'-delimited records, so each read returns one FASTA entry.
    $/ = '>';
    open FASTA, '<:raw', $ARGV[0] or die $!;

    # Remember only the file offset at which each record starts.
    my @seqPosns;
    push @seqPosns, tell( FASTA ) while <FASTA>;
    pop @seqPosns;    # the last offset recorded is end-of-file, not a record start

    @seqPosns = shuffle @seqPosns;

    # The range is inclusive, so 0 .. N picks N + 1 records
    # (three for an argument of 2, as in the run below).
    for ( @seqPosns[ 0 .. $ARGV[ 1 ] // 10 ] ) {
        seek FASTA, $_, 0;
        my $seq = <FASTA>;
        chomp( $seq );    # remove the trailing '>' separator
        chop( $seq );     # remove the final newline
        print '>', $seq;
    }
    close FASTA;

    __END__
    C:\test>988096.pl C:/dell/test/LCS/bioMan.fasta 2
    >af418682
    TTCCACAACTTTCCACCAAGCTCTACAAGATCCCAGAGTCAGGGGCCTGTATTTTCC
    TGGGTCTTTTGGGCTTTGCCGCTCCATTTACACAATGTGGTTATCCTGCATTAATGC
    ACTTCTTTCCTTCAGTACGAGATCTCCTAGATACCGCCTCAGCTCTATATCGGGAAG
    TCAAACAATCCAGATTGGGACTTCAACCCCATCAAGGACCACTGGCCACAAGCCAAC
    >ab033557
    CTCCACGACATTCCACCAAGCTCTGCTAGATCCCAGAGTGAGGGGCCTTTACTTTCC
    TGGGTCTTTTGGGCTTTGCTGCCCCTTTTACACAATGTGGCTATCCTGCCTTAATGC
    ACTTCTTTCCTTCCATTCGAGATCTTCTCGACACCGCCTCTGCTCTGTATCGGGAGG
    TCAAACAATCCAGATTGGGACTTCAACCCCAACAAGGATCAATGGCCAGAAGCAAAT
    >x97850
    CTCCACAACTTTCCTCCAAACTCTTCAAGATTCCAGAGTCAGGGCCCTGTACCTTCC
    TGGGTCTTTTGGGGTTTGCCGCCCCTTTCACGCAATGTGGATATCCTGCTTTAATGC
    ACTTTTTTCCTTCTATTCGAGATCTCCTCGACACCGCCTCTGCTCTGTATCGGGAGG
    TCAGAAAATCCAGATTGGGACCTCAACCCGCACAAGGACAACTGGCCGGACGCCAAC
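If you want exactly the requested number of records, the same index-then-seek idea can be wrapped in a function. What follows is only a sketch under a few assumptions, not a drop-in replacement: the name sample_fasta and its two parameters are made up for illustration, it uses a lexical filehandle, and it records an offset only when a read actually ended on a '>' so that the end-of-file position never gets into the list. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Hypothetical helper (not part of the script above): return up to
# $count randomly chosen FASTA records from $path, one string per record.
sub sample_fasta {
    my ( $path, $count ) = @_;
    $count //= 10;                       # same default as the script above

    local $/ = '>';                      # read one '>'-delimited chunk at a time
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # A record starts right after every '>' that ends a chunk.
    # The final chunk ends at end-of-file instead, so it adds no offset.
    my @starts;
    while ( defined( my $chunk = <$fh> ) ) {
        push @starts, tell( $fh ) if $chunk =~ />\z/;
    }

    # Shuffle the offsets and keep at most $count of them.
    my @picked = shuffle @starts;
    $#picked = $count - 1 if @picked > $count;

    my @records;
    for my $pos ( @picked ) {
        seek $fh, $pos, 0 or die "seek failed: $!";
        my $rec = <$fh>;
        next unless defined $rec;        # e.g. a stray '>' at the very end of the file
        chomp $rec;                      # drop the trailing '>' separator, if any
        $rec =~ s/\r?\n\z//;             # drop the final line ending
        push @records, ">$rec";
    }
    close $fh;
    return @records;
}
```

It could then be called as print "$_\n" for sample_fasta( $ARGV[0], $ARGV[1] ); which prints at most the requested number of records, or all of them if the file holds fewer.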

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^2: Get random unique lines from file
by Marshall (Canon) on Aug 17, 2012 at 23:26 UTC
    I like your solution, and ours are similar. You know more about the FASTA format than I do and were able to "decode" the OP's intent better than I could. I like it!

    I do admit confusion about this "// 10": I don't understand why it would be necessary, or what its purpose is.

    for ( @seqPosns[ 0 .. $ARGV[ 1 ] // 10 ] ) { ...blah....}

      It simply means that if I forget/omit the second parameter (the number of output records to produce), I get 10 by default.
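To make that concrete, here is a small sketch of how // (defined-or, available from Perl 5.10) behaves. The helper name requested_count is made up for the example; everything else is core Perl. It also shows that // binds more tightly than .., so the range in the slice runs from 0 up to and including the count.

```perl
use strict;
use warnings;

# Hypothetical helper: the requested count, or 10 when none was given.
sub requested_count {
    my ( $arg ) = @_;
    return $arg // 10;    # the right-hand side is used only if $arg is undef
}

# '//' tests definedness, not truth, so an explicit 0 is kept,
# whereas '||' would replace it with the default.
my $with_defined_or = 0 // 10;    # 0
my $with_logical_or = 0 || 10;    # 10

# '//' binds more tightly than '..', so the slice range in the script
# parses as 0 .. ( $ARGV[1] // 10 ), which is an inclusive range.
my @indices = ( 0 .. requested_count( 2 ) );    # (0, 1, 2)

print "default: ", requested_count( undef ), "\n";    # prints "default: 10"
print "indices: @indices\n";                          # prints "indices: 0 1 2"
```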


