Don't ask to ask, just ask PerlMonks

### Re: command line perl reading from STDIN

by talexb (Canon)
on Jan 22, 2013 at 17:56 UTC (#1014745)

in reply to command line perl reading from STDIN

    cat input_file.fq | \
    perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt

Just a stylistic note, but this can be restated as the following:

    perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' \
    <input_file.fq >output.txt

Unless I'm dumping the contents of a file to the console, I don't use cat that much; head, tail and less are handier.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: command line perl reading from STDIN
by perlhappy (Novice) on Jan 31, 2013 at 16:43 UTC
Yeah, typically I do the same. However, in this case, although both pieces of code do the same thing, there is a massive run time difference.

The file that I'm using is 107,259,832 lines long, and the other 63 files I have are between 100 million and 200 million lines long. When running the original command (using cat and piping this to perl) and piping the output to just 'wc -l', it took about a minute. With the change to your command structure, it has taken about 15 minutes so far and is still running.

I'm not entirely sure why this is the case (probably to do with how perl handles files vs STDIN), but I thought it was something you should be aware of, especially for anyone who typically works with files on the order of hundreds of millions of lines long.
This is strange. I cannot replicate the behaviour with files of millions of lines. Are you sure there are no other factors involved?
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
    >time perl -ne '$s=<>;<>;<>; chomp$s; print "$s\n";' < A_1_1.fq | wc -l
    26814958

    real 52m27.757s
    user 1m11.780s
    sys 0m29.310s

    >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
    26814958

    real 0m59.659s
    user 0m36.582s
    sys 0m4.108s

The files are 26814958 x 4 lines long.

These are the commands that I used and the time statistics. Clearly a massive difference. Not really sure why. This is perl, v5.10.0 built for x86_64-linux-thread-multi

If you've any suggestions for me to check out on the system that I'm using let me know. This is the OS: SUSE Enterprise Linux SP2 64bit
