Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^3: Deconvolutinng FastQ files

by BrowserUk (Patriarch)
on Aug 07, 2012 at 14:24 UTC ( [id://986003]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Deconvolutinng FastQ files
in thread Deconvolutinng FastQ files

Line 120 of your data file contains a record where the key field (characters 4 through 7) are not one of "TTGT" "GGTT" "ACCT"

To handle that, try this modified version:

#! perl -sw use strict; my %outFHs = map { open my $fh, '>', "$_.fastQ" or die $!; $_ => $fh; } qw[ TTGT GGTT ACCT other ]; until( eof() ) { my @lines = map scalar <>, 1 .. 4; my $barcode = substr $lines[1], 0, 9; my $tag = substr $barcode, 3, 4; print { $outFHs{ $tag } // $outFGs{ other } } @lines; } __END__ usage: thisScript theBigfile.fastQ ## outputs to TTGT.fastQ GGTT.fastQ ACCT.fastQ ## Unrecognised records are put into "other.fastQ"

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^4: Deconvolutinng FastQ files
by snakebites (Initiate) on Aug 07, 2012 at 14:49 UTC
    I see. That makes sense since the machine that spits the results makes about 4% errors reading the first few letters just by manually inspecting my files, so sometimes TTGT might come out as TGGT or GGTT might come out as GGTA which are neither of the three barcodes I am after. I guess with the new script you posted I need to specify 'other' with all possible combination of the four letters that might be in my fastq to make sure that the script doesn't stall. Thank you again browseruk.
      so sometimes TTGT might come out as TGGT or GGTT might come out as GGTA which are neither of the three barcodes I am after.

      If you wanted to add those other barcodes to the list, they'd each get their own file.

      I guess with the new script you posted I need to specify 'other' with all possible combination of the four letters that might be in my fastq to make sure that the script doesn't stall.

      If you are happy for all the "others" to go inti a single file called 'other.fastQ', use the new version as is.

      Come to that, if you wish to simply ignore them, use this version:

      #! perl -sw use strict; my %outFHs = map { open my $fh, '>', "$_.fastQ" or die $!; $_ => $fh; } qw[ TTGT GGTT ACCT ]; until( eof() ) { my @lines = map scalar <>, 1 .. 4; my $barcode = substr $lines[1], 0, 9; my $tag = substr $barcode, 3, 4; next unless exists $outFHs{ $tag }; print { $outFHs{ $tag } } @lines; } __END__ usage: thisScript theBigfile.fastQ ## outputs to TTGT.fastQ GGTT.fastQ ACCT.fastQ ## Unrecognised records are ignored

      If there are a high proportion of other records, that could speed things up substantially.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

Re^4: Deconvolutinng FastQ files
by snakebites (Initiate) on Aug 08, 2012 at 02:45 UTC
    I have just realised there was a simple typo in the code ver2.0, but ver 2 and ver 3 worked really well :O. You are an amazing perlmonk! Browseruk, we will be happy to acknowledge your contribution with the script in future publications for this work. I will PM you.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://986003]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (4)
As of 2024-04-16 14:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found