PerlMonks  

Re: Deconvolutinng FastQ files

by BrowserUk (Pope)
on Aug 06, 2012 at 08:25 UTC (#985630)


in reply to Deconvolutinng FastQ files

Try this. It should take around 40 minutes for 34GB:

#! perl -sw

use strict;

my %outFHs = map {
    open my $fh, '>', "$_.fastQ" or die $!;
    $_ => $fh;
} qw[ TTGT GGTT ACCT ];

until( eof() ) {
    my @lines = map scalar <>, 1 .. 4;
    my $barcode = substr $lines[1], 0, 9;
    my $tag = substr $barcode, 3, 4;
    print { $outFHs{ $tag } } @lines;
}
__END__
usage: thisScript theBigfile.fastQ
## outputs to TTGT.fastQ GGTT.fastQ ACCT.fastQ
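To see what the two substr calls pull out of each record, here is a minimal sketch on a made-up read. The helper name tag_of and the sample record are illustrative assumptions, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the 4-letter tag the same way the script does.
sub tag_of {
    my ($sequence_line) = @_;
    my $barcode = substr $sequence_line, 0, 9;   # first 9 bases of the read
    return substr $barcode, 3, 4;                # bases 4 through 7 (1-based)
}

# A FASTQ record is four lines: header, sequence, '+' separator, qualities.
# This record is invented for the example.
my @record = (
    "\@read1\n",
    "ACGTTGTACGGATC\n",
    "+\n",
    "IIIIIIIIIIIIII\n",
);

print tag_of( $record[1] ), "\n";   # prints TTGT
```

The tag is then used directly as the hash key that selects the output filehandle, which is why a tag outside the three listed barcodes yields an undefined handle.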

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: Deconvolutinng FastQ files
by snakebites (Initiate) on Aug 07, 2012 at 14:02 UTC
    Thank you browseruk. Wow.. I didn't realise you could sort out the whole thing using less than 2 dozen lines of perl script :O. I am so glad I asked. However, when testing the script out with my fastq files I got three files with a few records in each, but then the script stopped and I got the following error message:

    Use of uninitialized value within %outFHs in ref-to-glob cast at Sort.pl line 14, <> line 120.

    Can't use string ("") as a symbol ref while "strict refs" in use at Sort.pl line 14, <> line 120.

    How can I debug this? I guess this has to do with my records? Thank you.

      The record ending at line 120 of your data file (its sequence is on line 118) has a key field (characters 4 through 7) that is not one of "TTGT", "GGTT" or "ACCT".

      To handle that, try this modified version:

      #! perl -sw

      use strict;

      my %outFHs = map {
          open my $fh, '>', "$_.fastQ" or die $!;
          $_ => $fh;
      } qw[ TTGT GGTT ACCT other ];

      until( eof() ) {
          my @lines = map scalar <>, 1 .. 4;
          my $barcode = substr $lines[1], 0, 9;
          my $tag = substr $barcode, 3, 4;
          ## Any tag without its own filehandle falls through to "other" (// needs perl 5.10+)
          print { $outFHs{ $tag } // $outFHs{ other } } @lines;
      }
      __END__
      usage: thisScript theBigfile.fastQ
      ## outputs to TTGT.fastQ GGTT.fastQ ACCT.fastQ
      ## Unrecognised records are put into "other.fastQ"
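The fallback in the print line relies on the defined-or operator: a tag with no entry in the hash yields undef, so the "other" handle is used instead. A minimal sketch of that lookup, with file names standing in for the filehandles (the helper file_for and the %bucket hash are illustrative assumptions):

```perl
use strict;
use warnings;
use 5.010;    # the // (defined-or) operator and say need Perl 5.10 or later

# Stand-in for %outFHs: plain file names instead of open filehandles.
my %bucket = map { $_ => "$_.fastQ" } qw[ TTGT GGTT ACCT other ];

# Hypothetical helper: name of the file a given tag would be written to.
sub file_for {
    my ($tag) = @_;
    return $bucket{ $tag } // $bucket{ other };
}

say file_for( 'GGTT' );   # prints GGTT.fastQ
say file_for( 'TGGT' );   # prints other.fastQ (misread tag, not in the hash)
say file_for( 'GGTA' );   # prints other.fastQ
```

Because every unknown key takes the same fallback, the misread variants do not have to be listed anywhere.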


        I see. That makes sense: judging from a manual inspection of my files, the machine that produces the results makes about 4% errors when reading the first few letters, so sometimes TTGT might come out as TGGT, or GGTT as GGTA, and neither of those is one of the three barcodes I am after. I guess with the new script you posted I need to specify 'other' with all possible combinations of the four letters that might be in my fastq to make sure that the script doesn't stall. Thank you again browseruk.
        I have just realised there was a simple typo in the code ver2.0, but ver 2 and ver 3 worked really well :O. You are an amazing perlmonk! Browseruk, we will be happy to acknowledge your contribution with the script in future publications for this work. I will PM you.
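One way to check how many reads carry an unexpected tag is to tally the tags before splitting. This is only a sketch under the same assumption as the scripts above (tag at characters 4 through 7 of the sequence line); the helper tally_tags and the sample sequences are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each 4-letter tag occurs
# in a list of sequence lines.
sub tally_tags {
    my @sequences = @_;
    my %count;
    $count{ substr( $_, 3, 4 ) }++ for @sequences;
    return \%count;
}

# Invented sequences: two TTGT reads, one GGTT read, one misread TGGT.
my $counts = tally_tags( qw[ AAATTGTCC AAATGGTCC AAAGGTTCC AAATTGTGG ] );
printf "%s %d\n", $_, $counts->{ $_ } for sort keys %$counts;
```

On a real file the sequence lines would be every second line of each four-line record.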
