PerlMonks  

How to write a perl script to grab the first 100 sequences to a 227,000 sequence file?

by radnorr (Initiate)
on Jul 26, 2012 at 21:04 UTC (#983938)
radnorr has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am trying to write a script that grabs a small sample (~100) of sequences from a FASTA file containing over 200,000 sequences and writes them to an output file. I have tried tools like split and sed, but they are too inefficient for this and produce unnecessary output files. Any pointers would help. Thank you.

Re: How to write a perl script to grab the first 100 sequences to a 227,000 sequence file?
by Anonymous Monk on Jul 26, 2012 at 21:39 UTC
Re: How to write a perl script to grab the first 100 sequences to a 227,000 sequence file?
by BrowserUk (Pope) on Jul 26, 2012 at 21:39 UTC

    Change the 99 to 1 less than the number of sequences you want:

    perl -e"BEGIN{$/=qq[\n>]}" -ple"$. == 99 && last;" yourfile.fasta >newfile.fasta

      That's nice, but every > except the first one is missing...
        I meant that every sequence in a FASTA file starts with a >. The code above works, but somehow it doesn't put the > back, except on the first one...
Re: How to write a perl script to grab the first 100 sequences to a 227,000 sequence file?
by pvaldes (Chaplain) on Jul 26, 2012 at 21:44 UTC

    To grab the first 200 lines (each sequence having at least two lines), it's a simple "head -n 200 myfile" problem or, if you want to be more perlish, "while (<$my_file>) { print if $. <= 200 }", but I suggest you take a look at BioPerl in any case.

Re: How to write a perl script to grab the first 100 sequences to a 227,000 sequence file?
by rnaeye (Pilgrim) on Jul 29, 2012 at 03:31 UTC
    Here is a one-liner:
    perl -ne 'print if $. <= 100' inputfile
