command line perl reading from STDIN

perlhappy has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a quick question about how Perl behaves when reading from STDIN, specifically about one particular piece of code.

Here is the code:

cat input_file.fq | perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt

The format for the input_file.fq is a FASTQ format file. This is standard for storing biological data.

e.g.

@HWI-EAS283_0004_FC:1:1:1321:1118#0/1
TTGCTCAGCAGGTTCAACTGCAGGTTGCCCAGGACTTTAC
+HWI-EAS283_0004_FC:1:1:1321:1118#0/1
gg/fgag_ffgcfgeffafSKd\adfRffff]fa[fffaf
@HWI-EAS283_0004_FC:1:1:1399:1117#0/1
CTTGACGATTCCCCGCAGGCTGTTCCCGCGGGCCGCAATG
+HWI-EAS283_0004_FC:1:1:1399:1117#0/1

Every line beginning with '@' is the ID for the next 3 lines. The second line is the sequence, a string of letters, typically A, T, C and G. The line beginning with '+' just repeats the ID, and the fourth line is the last line of the segment. The pattern then repeats for each new 4-line segment.

Basically, the above code gets the length of the sequence (ATCG) line for every segment, which is great, but I don't understand the behaviour of the $s=<>;<>;<>; part of the code.

Could anyone explain what it's doing, and how it knows to look only at the correct line (which will be line number 2, 6, 10, 14, 18 etc.)? I've played around with this on different file formats and can't figure it out.

Any advice would be greatly appreciated

Re: command line perl reading from STDIN by choroba (Cardinal) on Jan 22, 2013 at 17:32 UTC
`<>` is a nicer name for readline. In scalar context, which is the case in your code, it reads one line from the input. If you do not assign the returned value to a variable, the line is read and discarded. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: command line perl reading from STDIN by perlhappy (Novice) on Jan 22, 2013 at 17:54 UTC
`cat input_file.fq | perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt`

OK, but how would you describe, step by step, what is going on in the above? From what I understood, I thought it would either assign $s on every line, OR skip 2 lines, assign the third line to $s, and repeat.
Re^3: command line perl reading from STDIN by choroba (Cardinal) on Jan 22, 2013 at 18:02 UTC
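The fourth read (which in fact happens first) is done by `-n`: it wraps your code in an implicit `while (<>) { ... }` loop, so the first line of each group is consumed by the loop itself. See perlrun - how to execute the Perl interpreter. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ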
Re^4: command line perl reading from STDIN by perlhappy (Novice) on Jan 22, 2013 at 18:14 UTC
Re^3: command line perl reading from STDIN by perlhappy (Novice) on Jan 22, 2013 at 18:12 UTC
Actually, I think I've just worked it out in my head. It took a bit of thinking, but this is what I think it is doing:

- the first line gets read, but because that read is unassigned (it comes before the $s), it is skipped;
- it then reads the second line, assigns it to $s and does whatever;
- it then reads the third line, but because <> is unassigned, it is skipped;
- it reads the fourth line and the same happens;
- it restarts with the 5th line, but again, because that read is unassigned before the $s, it is skipped;
- it reads the 6th line, assigns it to $s and does whatever again;
- this continues until the end of the file.

If this is incorrect, let me know. Otherwise I hope this helps anyone else who might look at it. Thanks for your help, choroba.
Re^4: command line perl reading from STDIN by Anonymous Monk on Jan 23, 2013 at 23:32 UTC
Re: command line perl reading from STDIN by talexb (Chancellor) on Jan 22, 2013 at 17:56 UTC
    cat input_file.fq | \
    perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt

Just a stylistic note, but this can be restated as the following:

    perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' \
    <input_file.fq >output.txt

Unless I'm dumping the contents of a file to the console, I don't use `cat` that much .. `head`, `tail` and `less` are handier.

Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re^2: command line perl reading from STDIN by perlhappy (Novice) on Jan 31, 2013 at 16:43 UTC
Yeah, typically I do the same. However, in this case, although both pieces of code do the same thing, there is a massive run time difference. The file that I'm using is 107,259,832 lines long, and the other 63 files I have are between 100 million and 200 million lines long. When running the original command (using cat and piping to perl) and piping the output to just 'wc -l', it took about a minute. With the change to your command structure it has taken about 15 mins so far and is still running. I'm not entirely sure why this is the case (probably something to do with how perl handles files vs STDIN), but I thought it is something you should be aware of, especially if anyone is working with files in the order of 100s of millions of lines long, and if this is typically the case.
Re^3: command line perl reading from STDIN by choroba (Cardinal) on Jan 31, 2013 at 17:25 UTC
This is strange. I cannot replicate the behaviour with files of millions of lines. Are you sure there are no other factors involved? لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^4: command line perl reading from STDIN by perlhappy (Novice) on Jan 31, 2013 at 18:06 UTC
Re^5: command line perl reading from STDIN by BrowserUk (Patriarch) on Jan 31, 2013 at 18:24 UTC
Re^5: command line perl reading from STDIN (strace) by tye (Sage) on Jan 31, 2013 at 19:11 UTC
Re: command line perl reading from STDIN by AnomalousMonk (Archbishop) on Jan 23, 2013 at 00:48 UTC
Sometimes it's useful to let Perl tell you what it thinks about the code you give it to execute (e.g., "where does the fourth line read come from?"):

    >perl -MO=Deparse -ne "$s=<>;<>;<>;chomp($s);print length($s).\"\n\"; "
    LINE: while (defined($_ = <ARGV>)) {
        $s = <ARGV>;
        <ARGV>;
        <ARGV>;
        chomp $s;
        print length($s) . "\n";
    }
    -e syntax OK

Use `-MO=Deparse,-p` for even gorier details. See B::Deparse and O.
Re^2: command line perl reading from STDIN by perlhappy (Novice) on Jan 31, 2013 at 16:25 UTC
Thanks. That's really great... It really helps with understanding what's actually going on.
