Deconvoluting FastQ files
by snakebites (Initiate)
on Aug 06, 2012 at 06:35 UTC
snakebites has asked for the
wisdom of the Perl Monks concerning the following question:
I have a series of records (4 lines per record) in FastQ format from DNA sequencing projects. The data actually come from 3 different replicates that are "barcoded". The barcode is the first 9 letters of the second line of each record. In the example below:
There are 4 records and the first 9 letters for each record are the following:
Thus, the general form of the barcode is the following:
NNNXXXXNN, where N can be any of the letters A, C, T or G, and XXXX is either TTGT, GGTT or ACCT (one for each library). As you can see, record 1 and record 4 belong to the same replicate.
I would like to separate the records that I have (currently in a single file) into 3 separate files based on whether XXXX is TTGT, GGTT or ACCT. Currently, the main file is about 34GB in size.
I wonder if anyone can suggest how I can go about doing this using Perl and BioPerl. I am a complete newbie in Perl programming.
Thank you very much for your time and for reading my post.
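
One possible way to approach the split described above is sketched here in plain Perl (no BioPerl), reading the file four lines at a time so that memory use does not grow with the 34GB input. It assumes every record is exactly 4 lines and that the barcode has the NNNXXXXNN form given in the post. The sub names `barcode_tag` and `split_fastq`, the output file naming, and the extra "unmatched" file for records with none of the three tags are illustrative assumptions, not part of the original question. The `//` operator requires Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three library tags named in the post; they sit at positions 4-7
# of the 9-letter barcode NNNXXXXNN.
my @TAGS = qw(TTGT GGTT ACCT);

# Return the library tag for a sequence line, or undef when the first
# nine letters do not match NNNXXXXNN with one of the known tags.
sub barcode_tag {
    my ($seq) = @_;
    return $1 if $seq =~ /^[ACGT]{3}(TTGT|GGTT|ACCT)[ACGT]{2}/;
    return undef;
}

# Stream the input one 4-line record at a time and copy each record
# to the output file for its tag. Returns a hashref of record counts.
sub split_fastq {
    my ($in_path, $out_prefix) = @_;    # hypothetical argument names

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    # One output handle per tag, plus one for unmatched records
    # (the "unmatched" file is an assumption, not in the post).
    my %out;
    for my $tag (@TAGS, 'unmatched') {
        my $path = "$out_prefix.$tag.fastq";
        open $out{$tag}, '>', $path or die "Cannot open $path: $!";
    }

    my %count;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated record ending at line $.\n" unless defined $qual;

        my $tag = barcode_tag($seq) // 'unmatched';
        print { $out{$tag} } $header, $seq, $plus, $qual;
        $count{$tag}++;
    }

    close $in;
    close $_ or die "Close failed: $!" for values %out;
    return \%count;
}

# Run only when called with two arguments: input file and output prefix.
split_fastq(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, say `split_fastq.pl`, it could be run as `perl split_fastq.pl reads.fastq replicate`, which would write `replicate.TTGT.fastq`, `replicate.GGTT.fastq`, `replicate.ACCT.fastq` and `replicate.unmatched.fastq`. If the sequencer can emit a literal N for an uncalled base inside the barcode, the character class in the regex would need to be widened accordingly.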