PerlMonks  

Re: Deconvolutinng FastQ files

by Kenosis (Priest)
on Aug 06, 2012 at 19:49 UTC


in reply to Deconvolutinng FastQ files

Here's an option that uses Tie::File to bind the fastq records' file to an array for processing. The files will automatically close when my %FHs falls out of scope. My thanks to BrowserUk for the elegant file handle/hash routine.

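use Modern::Perl;
use Tie::File;

{
    # Bind the FASTQ file to an array, one line per element. autochomp => 0
    # keeps each line's trailing newline so records print back out intact.
    tie my @fastqLines, 'Tie::File', 'records.fastq',
        recsep => "\n", autochomp => 0
        or die $!;

    # One output filehandle per barcode, keyed by the barcode.
    my %FHs = map { open my $fh, '>', "$_.fastq" or die $!; $_ => $fh }
        qw[ TTGT GGTT ACCT ];

    # Each record is four lines; the barcode is characters 4-7 of line two.
    for ( my $i = 0 ; $i < scalar @fastqLines ; $i += 4 ) {
        $fastqLines[ $i + 1 ] =~ /^...(.{4})/
            and print { $FHs{$1} } @fastqLines[ $i .. $i + 3 ];
    }

    untie @fastqLines;
}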

Update: Don't try this at home, as the OP's 34G file is much too large for Tie::File to efficiently handle.
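
The point about the output files closing when %FHs falls out of scope can be checked with a small sketch. It assumes a throwaway directory from File::Temp and made-up record text; only the filehandle/hash idiom comes from the post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory so the sketch does not touch real data.
my $dir      = tempdir( CLEANUP => 1 );
my @barcodes = qw(TTGT GGTT ACCT);

{
    # One lexical write handle per barcode, keyed by the barcode itself.
    my %FHs = map {
        open my $fh, '>', "$dir/$_.fastq" or die "Cannot open $_.fastq: $!";
        $_ => $fh;
    } @barcodes;

    print { $FHs{$_} } "record for $_\n" for @barcodes;
}    # %FHs goes out of scope here, so each handle is flushed and closed.

# Reading the files back shows that the buffered output reached disk.
my %seen;
for my $barcode (@barcodes) {
    open my $in, '<', "$dir/$barcode.fastq" or die $!;
    chomp( $seen{$barcode} = <$in> );
}
print "$_ => $seen{$_}\n" for @barcodes;
```

No explicit close is needed because the handles are lexical: once the last reference to each one disappears with the hash, Perl closes it.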

Re^2: Deconvolutinng FastQ files
by BrowserUk (Patriarch) on Aug 06, 2012 at 19:57 UTC

    I hate to be the bearer of bad tidings here, but if you'd ever tried using Tie::File on a 34GB file, you'd never be suggesting it to others. It would take weeks to complete.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Oh, this is sad. I just had the script work on a 1G+ file, and extrapolated the time to about 17 hours of processing for a 34G file. It's not weeks, but it's also not practical. I didn't realize that Tie::File was so inefficient with large files.

      Good call to point this out, BrowserUk! Will place an "Update:" in the posting.

        :D Thank you for the help. Although, it will probably take me a while to understand what you guys are talking about.
Re^2: Deconvolutinng FastQ files
by BrowserUk (Patriarch) on Aug 06, 2012 at 20:55 UTC

    Just to make the point about the inappropriateness of Tie::File for this:

    1. Using normal IO on a 400MB fastQ file takes 20 seconds:

      And uses 1.3 MB of RAM; performs 96e3 reads and 97e3 writes; and consumes 42 billion clock cycles.

      Giving a projected runtime for the OP's 34GB file of 30.8 minutes.

    2. Using Tie::File on that same 400MB file takes 1340 seconds:

      And uses 601 MB of RAM; performs 10000e3 reads and 95e3 writes; and consumes 3,182 billion clock cycles.

      Giving a projected runtime for the OP's 34GB file of 34 hours!

    Tie::File has its uses. This is not one of them.
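
A minimal sketch of the plain-I/O approach, for comparison. This is not the code that produced the timings above; it assumes four-line records and reuses the barcode regex from the Tie::File version. The demultiplex name, the sample reads and the in-memory filehandles are illustrative only.

```perl
use strict;
use warnings;

# Copy four-line FASTQ records from $in_fh to the handle in %$out_for that
# matches the record's barcode (characters 4-7 of the sequence line).
# Returns the number of records written.
sub demultiplex {
    my ( $in_fh, $out_for ) = @_;
    my $written = 0;
    until ( eof($in_fh) ) {
        # Only the current record is ever held in memory.
        my @record = map { scalar <$in_fh> } 1 .. 4;
        last if grep { !defined $_ } @record;    # truncated final record
        my ($barcode) = $record[1] =~ /^...(.{4})/ or next;
        my $fh = $out_for->{$barcode} or next;    # unknown barcode: skip
        print {$fh} @record;
        $written++;
    }
    return $written;
}

# Made-up sample data; in-memory filehandles keep the sketch self-contained.
my $fastq = join '', map { "$_\n" }
    '@r1', 'AAATTGTCCCC', '+', 'IIIIIIIIIII',
    '@r2', 'AAAGGTTCCCC', '+', 'IIIIIIIIIII',
    '@r3', 'AAACCCCCCCC', '+', 'IIIIIIIIIII';

my %buffer = map { $_ => '' } qw( TTGT GGTT ACCT );
my %out_for;
for my $barcode ( keys %buffer ) {
    open my $fh, '>', \$buffer{$barcode} or die $!;
    $out_for{$barcode} = $fh;
}

open my $in, '<', \$fastq or die $!;
my $written = demultiplex( $in, \%out_for );
close $_ for $in, values %out_for;

print "$written records written\n";
print "$_: ", length( $buffer{$_} ), " bytes\n" for sort keys %buffer;
```

For a real run, $in would be opened on the FASTQ file itself and each output handle on "$barcode.fastq". Memory use stays flat because nothing beyond the current record is retained.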



      I'll remember:

      my %mindHash = ( 'Tie::File + Large File' => '"Run Away!" (King Arthur, Monty Python)' );
        my %mindHash = ( 'Tie::File + Large File' => '"Run Away!" (King Arthur, Monty Python)' );

        Oh noes! Kenosis has deleted all of his memory except for the fact that Tie::File and large files are to be avoided.

        $mindHash{'assign to entire hash'}='only if you want to delete all existing data';

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
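
The difference the reply is joking about can be seen in a short sketch. The starting keys here are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical prior contents of the hash.
my %mindHash = ( perl => 'fun', regex => 'handy' );

# Element assignment adds (or updates) one key and leaves the rest alone.
$mindHash{'Tie::File + Large File'} = 'Run Away!';
my $count_after_element = keys %mindHash;

# List assignment to the whole hash replaces every existing pair.
%mindHash = ( 'Tie::File + Large File' => 'Run Away!' );
my $count_after_list = keys %mindHash;

print "keys after element assignment: $count_after_element\n";
print "keys after list assignment:    $count_after_list\n";
```

Declaring the hash afresh with my, as in the quoted line, has the same effect as the list assignment: the new hash holds only the pairs given.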
