PerlMonks  

Re^2: command line perl reading from STDIN

by perlhappy (Novice)
on Jan 31, 2013 at 16:43 UTC (#1016357)


in reply to Re: command line perl reading from STDIN
in thread command line perl reading from STDIN

Yeah, typically I do the same. In this case, however, although both pieces of code do the same thing, there is a massive run-time difference.

The file I'm using is 107,259,832 lines long, and my other 63 files are between 100 million and 200 million lines long. Running the original command (utilising cat and piping to perl), with its output piped to just 'wc -l', took about a minute. With your command structure it has taken about 15 minutes so far and is still running.

I'm not entirely sure why this is the case (probably something to do with how perl handles files vs STDIN), but I thought it was something you should be aware of, especially if anyone is working with files hundreds of millions of lines long, and if this turns out to be typical.


Re^3: command line perl reading from STDIN
by choroba (Abbot) on Jan 31, 2013 at 17:25 UTC
    This is strange. I cannot replicate the behaviour with files of millions of lines. Are you sure there are no other factors involved?
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l
      26814958

      real 52m27.757s
      user 1m11.780s
      sys 0m29.310s

      >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
      26814958

      real 0m59.659s
      user 0m36.582s
      sys 0m4.108s

      The files are 26814958 x 4 lines long.

      These are the commands I used and their timing statistics. Clearly a massive difference, and I'm not really sure why. This is perl, v5.10.0 built for x86_64-linux-thread-multi.

      If you have any suggestions for things to check on the system I'm using, let me know. The OS is SUSE Enterprise Linux SP2, 64-bit.
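One detail in these timings may be worth spelling out: in the slow run, real time is 52m27.757s but user + sys is only about 1m41s, so the pipeline was on a CPU for a small fraction of the wall-clock time and spent the rest waiting (most likely on I/O, although the timings alone cannot say on what). The hypothetical helper below computes that share from the `time` output converted to seconds. Assuming the shell was bash, its `time` keyword reports the whole pipeline, so `cat` and `wc` are included in user and sys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cpu_share REAL USER SYS (all in seconds) prints
# (user + sys) / real as a percentage, i.e. how much of the wall-clock
# time the timed pipeline actually spent running on a CPU.
cpu_share() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.1f%%\n", 100 * (u + s) / r }'
}

# Timings from the post, converted to seconds (52m27.757s = 3147.757 s).
cpu_share 3147.757 71.780 29.310   # perl ... < A_1_1.fq  -> 3.2%
cpu_share 59.659 36.582 4.108      # cat A_1_1.fq | perl  -> 68.2%
```

So the redirect run spent roughly 3% of its wall-clock time on the CPU versus about two thirds for the `cat` run, which points at waiting rather than extra computation; a tool such as strace would be needed to see what it was waiting for.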

        What do you get if you let perl do the open rather than having the shell redirect?

        time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' A_1_1.fq | wc -l

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        That is a dramatic difference and worth investigating further. It might lead to making Perl input processing 60 times faster in some cases.

        I'd probably run strace on those cases and see what Perl is doing differently.

        - tye        
