I have a quick question about how Perl behaves when reading from STDIN, specifically in one particular piece of code.
Here is the code:
cat input_file.fq | perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt
input_file.fq is a FASTQ file, a standard format for storing biological sequence data.
Each record is four lines. The first line begins with '@' and is the ID for the next three lines. The second line is the sequence, a string of letters, typically A, T, C and G. The third line begins with '+' and just repeats the ID (or is a bare '+'), and the fourth line holds the quality scores, one character per base. Then the pattern repeats for the next four-line record.
Basically, the code above prints the length of the sequence (ATCG) line of every record, which is great, but I don't understand the behavior of the $s=<>;<>;<>; part.
Could anyone explain what it's doing, and how it knows to look only at the correct line (line numbers 2, 6, 10, 14, 18, etc.)? I've played around with this on different file formats and can't figure it out. Any advice would be greatly appreciated.