in reply to Deconvolutinng FastQ files
If this is (as it appears) schoolwork (homework?), then you need to understand that this is NOT 'code-a-matic.'
We'll be pleased to help you learn; you need merely show that you've made a good faith effort to solve your problem. In this case, that means: post your code and tell us how it fails, or post an algorithm (or pseudo-code) where you can't work out the syntax.
You've outlined a fairly ambitious project for a 'complete newbie in perl,' so -- in case you're stuck on which of Perl's capabilities will help you here -- consider:
- Do you know how to get the data from the "main file" into a script? If not, perldoc -f while will be helpful. Hint: given the size of your data file, you'll probably want to do so, line by line.
- Consider pushing each line into a temporary cache as it's read; then test to see if it satisfies some criterion for being line 2. If not, read the next line, and see if it's line 2. If so, test its first 9 chars with regexen (lotsa reading here: perldoc perlretut and company) and, if those match the characteristics for replicate 1, 2 or 3, stash the cached line, the line with the match and the next two lines into the appropriate array, say, @rep1, @rep2 or @rep3. (Hint: set a flag when you find any match for any target replicate, and use it and the ++ increment operator, documented in perldoc perlop, to know when you've pushed all four lines of the record.)
- wash, rinse, repeat...
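To make the cache/flag/counter idea concrete, here is a minimal sketch, not a finished solution. It assumes every record is exactly four lines with the sequence on line 2; the name collect_records and the $wanted callback are hypothetical, invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of every four-line record whose
# sequence line (line 2 of the record) satisfies $wanted, a code reference.
# Assumption: strictly four lines per record, no blank lines between them.
sub collect_records {
    my ($fh, $wanted) = @_;
    my @kept;           # lines of all matching records, in file order
    my $cached;         # the previous line, i.e. a candidate header
    my $pending = 0;    # flag and counter: lines still owed to a matched record
    my $n       = 0;    # running line number, bumped with ++

    while (my $line = <$fh>) {
        $n++;
        if ($pending) {
            # lines 3 and 4 of a record that already matched
            push @kept, $line;
            $pending--;
        }
        elsif ($n % 4 == 2 and $wanted->($line)) {
            # line 2 of a record, and it matched: keep header and sequence
            push @kept, $cached, $line;
            $pending = 2;    # two more lines belong to this record
        }
        $cached = $line;
    }
    return @kept;
}
```

You'd call it with an open filehandle and a test of your own choosing, e.g. collect_records($fh, sub { $_[0] =~ /^[ACTG]{3}TTGT/ }), once per replicate (rewinding the file in between) or with a smarter callback. Working out that test is still your job.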
My suspicion is that working out an appropriate set of regular expressions (there's a broad hint in the word "set" and a part of one of many possible solutions next) will be your biggest challenge, so...
    my (@rep1, @rep2, @rep3);
    my $prefix  = qr/[ACTG]{3}/;
    my $rep1    = qr/TTGT/;
    my $rep2    = qr/GGTT/;
    my $rep3    = qr/ACCT/;
    my $postfix = qr/[ACTG]{2}/;

    while (my $line = <DATA>) {
        if ($line =~ /^$prefix $rep1 $postfix/x ) {
            push @rep1, $line;   # ignoring, for regex instruction,
                                 # the need to push your cached line, etc...
        } elsif ($line =~ /^$prefix $rep2 $postfix/x ) {
            ....
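One possible reading of the "set" hint, offered as a sketch rather than the answer: keep the tags in a hash and build one anchored pattern per replicate, so the if/elsif chain collapses into a loop. The tag values are the ones shown above; the name which_replicate and the hash layout are my own assumptions.

```perl
use strict;
use warnings;

my $prefix  = qr/[ACTG]{3}/;    # three leading bases
my $postfix = qr/[ACTG]{2}/;    # two trailing bases

# Tag for each replicate (values taken from the snippet above).
my %tag = (rep1 => 'TTGT', rep2 => 'GGTT', rep3 => 'ACCT');

# One anchored, precompiled pattern per replicate.
my %pattern;
for my $name (keys %tag) {
    my $t = $tag{$name};
    $pattern{$name} = qr/^$prefix$t$postfix/;
}

# Hypothetical helper: name of the replicate whose pattern matches the
# first 9 characters of $seq, or undef (in scalar context) if none does.
sub which_replicate {
    my ($seq) = @_;
    for my $name (sort keys %pattern) {
        return $name if $seq =~ $pattern{$name};
    }
    return;
}
```

With something like this, the name that comes back can select the array to push onto (a hash of arrays keyed by replicate would do), and adding a fourth replicate means adding one hash entry instead of another elsif.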
There may be a way around this line-by-line approach. If you can absolutely count on "+" as the entire content of the third line of each record, you could use that fact as part of an approach to reading your "main file" record-by-record -- but that would be an additional complexity. Your addendum does, however, suggest an approach.
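For the curious, a rough sketch of what record-by-record reading could look like if "+" really is the whole of every third line: set the input record separator $/ to "\n+\n". It shows where the extra complexity comes from, since each chunk after the first begins with the previous record's quality line. The name read_records is hypothetical, and the sketch assumes no quality line consists solely of "+".

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of array refs, one per record,
# each holding [ header, sequence, '+', quality ] without newlines.
# Assumption: line 3 of every record is exactly "+".
sub read_records {
    my ($fh) = @_;
    local $/ = "\n+\n";    # a "line" now ends at the lone "+" line
    my (@records, @pending);

    while (my $chunk = <$fh>) {
        chomp $chunk;                      # strips a trailing "\n+\n" if present
        my @lines = split /\n/, $chunk;
        if (@pending) {
            # first line of this chunk is the quality of the pending record
            my $qual = shift @lines;
            push @records, [ @pending, '+', $qual ];
            @pending = ();
        }
        # what remains (if anything) is the next header and sequence
        @pending = @lines if @lines == 2;
    }
    return @records;
}
```

Whether this beats plain line-by-line reading is debatable; it saves the line counter but costs you the carry-over bookkeeping in @pending.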
So, my suggestion is -- try this, if you're working on homework... and come back when you get stuck, with code, and details about the shortcomings of that code.
And BTW, welcome to the Monastery.
Re^2: Deconvolutinng FastQ files
by snakebites (Initiate) on Aug 07, 2012 at 13:40 UTC
by ww (Archbishop) on Aug 07, 2012 at 13:50 UTC