PerlMonks  

Re^4: command line perl reading from STDIN

by perlhappy (Novice)
on Jan 31, 2013 at 18:06 UTC ( [id://1016370]=note )


in reply to Re^3: command line perl reading from STDIN
in thread command line perl reading from STDIN

>time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l
26814958

real 52m27.757s
user 1m11.780s
sys 0m29.310s

>time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
26814958

real 0m59.659s
user 0m36.582s
sys 0m4.108s

The files are 26814958 x 4 lines long.

These are the commands that I used and their timing statistics. Clearly there is a massive difference, and I'm not really sure why. This is perl, v5.10.0 built for x86_64-linux-thread-multi.

If you've any suggestions for me to check out on the system that I'm using let me know. This is the OS: SUSE Enterprise Linux SP2 64bit

Re^5: command line perl reading from STDIN
by BrowserUk (Patriarch) on Jan 31, 2013 at 18:24 UTC

    What do you get if you let perl do the open rather than have the shell redirect:

    time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' A_1_1.fq | wc -l

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Both commands produce exactly the same output, with exactly the same number of lines. The results are identical, but the time performance is really different. Weird.
        but the time performance is really different.

        Could I trouble you to post that timing data?


Re^5: command line perl reading from STDIN (strace)
by tye (Sage) on Jan 31, 2013 at 19:11 UTC

    That is a dramatic difference and is worth investigating further. Making Perl input processing 60 times faster in some cases might be a result.

    I'd probably run strace on those cases and see what Perl is doing differently.

    - tye        
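
One way the strace comparison could be set up is sketched below. The helper names (`trace_redirect`, `trace_pipe`, `count_reads`) and the log file names are hypothetical, not from the thread; the sketch assumes `strace` is installed and that tracing is permitted on the machine. It only uses the `-e trace=` and `-o` options of strace, and counts `read(` lines in the resulting log.

```shell
# The one-liner under investigation, taken from the original post.
extract='$s=<>;<>;<>; chomp $s; print "$s\n";'

# trace_redirect FILE LOG: shell redirects FILE to perl's stdin.
# -e trace=read,write limits the log to those two syscalls,
# -o LOG writes the trace to a file instead of stderr.
trace_redirect() {
  strace -e trace=read,write -o "$2" perl -ne "$extract" < "$1" > /dev/null
}

# trace_pipe FILE LOG: cat feeds perl through a pipe; only perl is traced.
trace_pipe() {
  cat "$1" | strace -e trace=read,write -o "$2" perl -ne "$extract" > /dev/null
}

# count_reads LOG: number of read() calls recorded in a trace log.
count_reads() {
  grep -c '^read(' "$1"
}

# Example use (assumes A_1_1.fq is in the current directory):
#   trace_redirect A_1_1.fq redirect.trace
#   trace_pipe     A_1_1.fq pipe.trace
#   count_reads redirect.trace
#   count_reads pipe.trace
```

Comparing the two counts, and the buffer sizes shown in each `read(` line, would show whether perl is issuing its reads differently in the two cases.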

      OK, here is a probable reason for the difference.
      It's not actually as huge a difference in time anymore. Let me explain: I'm working on a server that lots of other people also use, so it gets more jobs to run at some times than at others. I reviewed some of the job submission logs, and the server was extremely busy yesterday when I was doing the checks. I ran the two commands again today and used strace to track what's going on.

      cat A_1_1.fq

      write(1, "12555:2368#0/1\nffafWgggWgagcggff"..., 1048576) = 1048576
      read(3, "0004_FC:1:76:5896:2982#0/1\naQXaa"..., 1048576) = 1048576
      write(1, "0004_FC:1:76:5896:2982#0/1\naQXaa"..., 1048576) = 1048576
      read(3, "\nhhhhhcghghhhhhhhhfhhhhhgfghhhhh"..., 1048576) = 1048576
      write(1, "\nhhhhhcghghhhhhhhhfhhhhhgfghhhhh"..., 1048576) = 1048576

      perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";'

      read(0, "ehhhfhhh]\n@HWI-EAS283_0004_FC:1:"..., 4096) = 4096
      read(0, "_0004_FC:1:52:6965:11034#0/1\nggg"..., 4096) = 4096
      read(0, "0004_FC:1:52:10518:11036#0/1\nhhg"..., 4096) = 4096
      read(0, "1:52:14559:11038#0/1\nffffffdfdf["..., 4096) = 4096
      write(1, "GCCCCCAGAGCANCGTCTCTGGGGGCAGCCAG"..., 4096) = 4096
      read(0, "fgggfcbfffcffdcdfcfaffffaa^fff"..., 4096) = 4096



      >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l

      read(3, "-EAS283_0004_FC:1:21:15451:12331"..., 4096) = 4096
      read(3, ":18706:12324#0/1\ncaYYcaaaVTaaZ"..., 4096) = 4096
      read(3, "hhhfgahhhhh\n@HWI-EAS283_0004_FC:"..., 4096) = 4096
      read(3, "ATCCTCCAGGCGATTCAACGCCTTGGTTCTCT"..., 4096) = 4096
      write(1, "TTTCTGTTCACTCTCAACTTCTCCTTCCAGTT"..., 4096) = 4096
      read(3, "8646:12340#0/1\ngegcgaKaaaffff_gg"..., 4096) = 4096

      There is still a time difference, but it is nowhere near as large as before (see below). The first command (cat piped into perl) is, however, consistently faster.

      >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
      26814958

      real 0m41.711s
      user 0m38.406s
      sys 0m5.096s


      >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l
      26814958

      real 3m39.382s
      user 0m52.811s
      sys 0m23.169s

