
Re: command line perl reading from STDIN

by talexb (Canon)
on Jan 22, 2013 at 17:56 UTC

in reply to command line perl reading from STDIN

    cat input_file.fq | \
        perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' > output.txt

Just a stylistic note, but this can be restated as the following:

    perl -ne '$s=<>;<>;<>;chomp($s);print length($s)."\n";' \
        <input_file.fq >output.txt

Unless I'm dumping the contents of a file to the console, I don't use cat much; head, tail and less are handier.
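The one-liner appears to assume four-line FASTQ records (header, sequence, '+' separator, quality): -n reads the header into $_, $s=<> grabs the sequence, and the two bare <> calls skip the last two lines. Below is a minimal expanded sketch of the same logic, assuming well-formed four-line records. The script name seq_lengths.pl and the sub print_seq_lengths are hypothetical, not from the thread.

```perl
#!/usr/bin/perl
# Hypothetical expanded form of the one-liner. Assumes 4-line FASTQ
# records: header, sequence, '+' separator, quality.
use strict;
use warnings;

# Read 4-line records from $in; print the length of each sequence line to $out.
sub print_seq_lengths {
    my ($in, $out) = @_;
    while (my $header = <$in>) {      # line 1: '@' header (ignored)
        my $seq = <$in>;              # line 2: the sequence
        last unless defined $seq;     # truncated record at EOF: stop
        my $sep  = <$in>;             # line 3: '+' separator (ignored)
        my $qual = <$in>;             # line 4: quality string (ignored)
        chomp $seq;
        print {$out} length($seq), "\n";
    }
}

# Run as a script: read STDIN, write STDOUT.
print_seq_lengths(\*STDIN, \*STDOUT) unless caller;
```

Usage, matching the redirected form above: perl seq_lengths.pl <input_file.fq >output.txt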

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: command line perl reading from STDIN
by perlhappy (Novice) on Jan 31, 2013 at 16:43 UTC
    Yeah, typically I do the same. In this case, however, although both commands do the same thing, there is a massive difference in run time.

    The file that I'm using is 107,259,832 lines long, and the other 63 files I have are between 100 million and 200 million lines long. Running the original command (utilising cat and piping this to perl) and piping the output to just 'wc -l' took about a minute. With the change to your command structure it has taken about 15 mins so far and is still running.

    I'm not entirely sure why this is the case (probably something to do with how perl handles files vs STDIN), but I thought you should be aware of it, especially if anyone is working with files hundreds of millions of lines long and this turns out to be typical.
      This is strange. I cannot replicate the behaviour with files of millions of lines. Are you sure there are no other factors involved?
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l

        real 52m27.757s
        user 1m11.780s
        sys 0m29.310s

        >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l

        real 0m59.659s
        user 0m36.582s
        sys 0m4.108s

        The files are 26,814,958 x 4 = 107,259,832 lines long.

        These are the commands I used, with their timing statistics. There is clearly a massive difference, and I'm not really sure why. The perl version is v5.10.0, built for x86_64-linux-thread-multi.

        If you have any suggestions for things to check on the system I'm using, let me know. The OS is SUSE Enterprise Linux SP2, 64-bit.
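In the redirected run, real time (52m) far exceeds user + sys time (under two minutes combined), so that process spent most of its time waiting rather than computing. One possible way to narrow this down, offered as a hypothetical diagnostic and not something from the thread, is to time how fast raw bytes arrive on STDIN when read in large blocks with sysread, bypassing Perl's line-by-line reading. The script name read_speed.pl, the sub count_stream, and the 1 MiB block size are illustrative assumptions.

```perl
#!/usr/bin/perl
# Hypothetical diagnostic: measure raw read throughput on STDIN using
# large sysread blocks, independent of Perl's line-reading code.
use strict;
use warnings;
use Time::HiRes qw(time);

# Read $fh to EOF in $blocksize chunks; return (byte count, newline count).
sub count_stream {
    my ($fh, $blocksize) = @_;
    my ($bytes, $lines) = (0, 0);
    my $buf;
    while (1) {
        my $n = sysread($fh, $buf, $blocksize);
        die "read error: $!" unless defined $n;
        last if $n == 0;                  # end of file
        $bytes += $n;
        $lines += ($buf =~ tr/\n//);      # newlines in this block
    }
    return ($bytes, $lines);
}

unless (caller) {
    my $t0 = time;
    my ($bytes, $lines) = count_stream(\*STDIN, 1 << 20);   # 1 MiB blocks (assumed size)
    my $elapsed = time - $t0;
    printf STDERR "%d bytes, %d lines in %.1f s (%.1f MB/s)\n",
        $bytes, $lines, $elapsed, $elapsed > 0 ? $bytes / $elapsed / 1e6 : 0;
}
```

Compare perl read_speed.pl < A_1_1.fq with cat A_1_1.fq | perl read_speed.pl. If the redirected run is also slow here, the cause is more likely in the filesystem or storage layer than in perl's line reading; if both are fast, the line-reading path is the more likely suspect. This is a rough heuristic, not a diagnosis.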
