http://www.perlmonks.org?node_id=985624


in reply to Deconvolutinng FastQ files

If production is your goal, frozenwithjoy's suggestion may be just what you need.

If this is (as it appears) schoolwork (homework?), then you need to understand that this is NOT 'code-a-matic.'

We'll be pleased to help you learn; you need merely show that you've made a good faith effort to solve your problem. In this case, that means: post your code and tell us how it fails, or post an algorithm (or pseudo-code) where you can't work out the syntax.

You've outlined a fairly ambitious project for a 'complete newbie in perl,' so -- in case you're stuck on which of Perl's capabilities will help you here, consider

My suspicion is that working out an appropriate set of regular expressions (there's a broad hint in the word "set" and a part of one of many possible solutions next) will be your biggest challenge, so...

    my (@rep1,@rep2,@rep3);
    my $prefix  = qr/[ACTG]{3}/;
    my $rep1    = qr/TTGT/;
    my $rep2    = qr/GGTT/;
    my $rep3    = qr/ACCT/;
    my $postfix = qr/[ACTG]{2}/;

    while (my $line = <DATA>) {
        if ($line =~ /^$prefix $rep1 $postfix/x ) {
            push @rep1, $line;   # ignoring, for regex instruction,
                                 # the need to push your cached line, etc...
        } elsif ($line =~ /^$prefix $rep2 $postfix/x ) {
        ....
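One way to keep a "set" of patterns manageable is to store them in a hash and loop over it, rather than writing one elsif per barcode. What follows is only a sketch under assumptions: the patterns are the ones from the snippet above, but the hash layout, the classify_read() helper and the sample sequences are made up for illustration.

```perl
use strict;
use warnings;

# Patterns taken from the snippet above; the hash layout and the
# classify_read() helper are illustrative assumptions.
my $prefix  = qr/[ACTG]{3}/;
my $postfix = qr/[ACTG]{2}/;
my %barcode = (
    rep1 => qr/TTGT/,
    rep2 => qr/GGTT/,
    rep3 => qr/ACCT/,
);

# Return the name of the first barcode whose pattern matches at the
# start of the sequence, or undef when none does.
sub classify_read {
    my ($seq) = @_;
    for my $name (sort keys %barcode) {
        my $re = $barcode{$name};
        return $name if $seq =~ /^$prefix $re $postfix/x;
    }
    return;
}

# Made-up sequences, purely for demonstration.
for my $seq (qw(AAATTGTCCGGGG CCCGGTTAAGGGG AAAAAAAAAAAAA)) {
    my $hit = classify_read($seq);
    print "$seq => ", (defined $hit ? $hit : 'unassigned'), "\n";
}
```

Adding a fourth barcode then means adding one hash entry; the matching loop stays the same.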

There may be a way around this line-by-line approach. If you can absolutely count on "+" as the entire content of the third line of each record, you could use that fact as part of an approach to reading your "main file" record-by-record -- but that would be an additional complexity. Your addendum does, however, suggest an approach.
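For what it's worth, reading record-by-record might look something like the sketch below. It assumes every record is exactly four lines (header, sequence, a line beginning with "+", quality) with no wrapped sequences; the next_record() helper and the two sample records are invented for the example and are not from the original question.

```perl
use strict;
use warnings;

# Read one four-line record (header, sequence, "+" separator, quality)
# from a filehandle. Returns a hash ref, or nothing at end of input.
# Assumes every record is exactly four lines with no wrapping.
sub next_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        return unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    die "Malformed record near '$lines[0]'\n"
        unless $lines[0] =~ /^\@/ && $lines[2] =~ /^\+/;
    return { header => $lines[0], seq => $lines[1], qual => $lines[3] };
}

# Made-up two-record input, read from an in-memory filehandle.
my $data = join "\n",
    '@read1', 'AAATTGTCCGGGG', '+', 'IIIIIIIIIIIII',
    '@read2', 'CCCGGTTAAGGGG', '+', 'IIIIIIIIIIIII', '';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

while (my $rec = next_record($fh)) {
    print "$rec->{header}\t$rec->{seq}\n";
}
close $fh;
```

Counting lines in fours, rather than looking for a leading "@", sidesteps the fact that a quality line may itself begin with "@".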

So, my suggestion is -- try this, if you're working on homework... and come back when you get stuck, with code and details about the shortcomings of that code.

And BTW, welcome to the Monastery.

Re^2: Deconvolutinng FastQ files
by snakebites (Initiate) on Aug 07, 2012 at 13:40 UTC
    I was hoping to get an idea where I should focus my reading about perl, but obviously I am not expecting a code-o-matic solution. It's not quite a homework. I am more interested in the biological question rather than the coding part which I know I'm not very good at.
      That's fine... and good for you. ++ The hope that that was your aim was my reason for including the code snippet and the links to a few relevant docs.