PerlMonks  

Re^3: command line perl reading from STDIN

by choroba (Abbot)
on Jan 31, 2013 at 17:25 UTC


in reply to Re^2: command line perl reading from STDIN
in thread command line perl reading from STDIN

This is strange. I cannot replicate the behaviour with files of millions of lines. Are you sure there are no other factors involved?

لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^4: command line perl reading from STDIN
by perlhappy (Novice) on Jan 31, 2013 at 18:06 UTC
    >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l
    26814958

    real 52m27.757s
    user 1m11.780s
    sys 0m29.310s

    >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
    26814958

    real 0m59.659s
    user 0m36.582s
    sys 0m4.108s

    The files are 26814958 x 4 lines long.
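For reference, this is what the one-liner does with four-line FASTQ records: `-n` wraps the code in `while (<>) { ... }`, so the loop condition consumes line 1 (the `@` header), `$s` gets line 2 (the sequence), and the two bare `<>` calls discard lines 3 and 4. That is why exactly 26814958 lines come out. A minimal sketch of the same logic written out as a subroutine, assuming well-formed records; `sequences_from` and the two demo records are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: the logic of
#   perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";'
# written out. Assumes well-formed 4-line FASTQ records.
sub sequences_from {
    my ($fh) = @_;
    my @seqs;
    while (defined(my $header = <$fh>)) {   # line 1: '@' header (what -n puts in $_)
        my $seq = <$fh>;                    # line 2: the sequence ($s in the one-liner)
        <$fh>;                              # line 3: '+' separator, discarded
        <$fh>;                              # line 4: quality string, discarded
        chomp $seq;
        push @seqs, $seq;
    }
    return @seqs;
}

# Demo on an in-memory file holding two made-up records.
my $fastq = "\@r1\nACGT\n+\nIIII\n\@r2\nGGCC\n+\nHHHH\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
print "$_\n" for sequences_from($fh);   # prints ACGT, then GGCC
```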

    These are the commands I used and their timings. The difference is massive (about 52 minutes of wall-clock time with the shell redirect versus about one minute through cat), and I'm not really sure why. This is perl, v5.10.0 built for x86_64-linux-thread-multi.

    If you have any suggestions for things to check on the system I'm using, let me know. The OS is SUSE Enterprise Linux SP2, 64-bit.

      What do you get if you let perl do the open rather than have the shell redirect:

      time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' A_1_1.fq | wc -l
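One way to compare all three set-ups side by side (shell redirect, `cat` pipe, and a file name argument that `<>` opens itself) is a small timing harness. This is only a sketch: the helper names `variants` and `time_command` are hypothetical, and it assumes a POSIX shell with `cat` and `wc` and paths without spaces or quotes. On a busy machine each variant is worth running more than once.

```perl
use strict;
use warnings;
use Time::HiRes ();

# The one-liner from the thread, kept in single quotes for the shell.
my $one_liner = q{-ne '$s=<>;<>;<>; chomp $s; print "$s\n";'};

# Hypothetical helper: the three ways of feeding FILE to the one-liner.
# $^X is the perl running this script; assumes a POSIX sh with cat and wc,
# and no spaces or quotes in $^X or the file name.
sub variants {
    my ($file) = @_;
    return (
        [ redirect => "$^X $one_liner < '$file' | wc -l" ],
        [ pipe     => "cat '$file' | $^X $one_liner | wc -l" ],
        [ argument => "$^X $one_liner '$file' | wc -l" ],
    );
}

# Hypothetical helper: wall-clock seconds taken by one shell command.
sub time_command {
    my ($cmd) = @_;
    my $start = Time::HiRes::time();
    system($cmd) == 0 or die "Command failed ($?): $cmd\n";
    return Time::HiRes::time() - $start;
}

# Time each variant on the file named on the command line.
if (@ARGV) {
    for my $pair (variants($ARGV[0])) {
        my ($name, $cmd) = @$pair;
        printf "%-8s %.3f s\n", $name, time_command($cmd);
    }
}
```

Run it as e.g. `perl time_variants.pl A_1_1.fq` (the script name is arbitrary); each `wc -l` count is printed just before its timing line.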

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Both commands give exactly the same output and exactly the same number of lines, but the run times are really different. Weird.

      That is a dramatic difference and is worth investigating further. Making Perl input processing 60 times faster in some cases might be the result.

      I'd probably run strace on those cases and see what Perl is doing differently.
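Before reading strace output it can help to confirm what file descriptor 0 actually is in each case. A hypothetical diagnostic (the name `describe_handle` is made up) that applies Perl's `stat` and file-test operators to `STDIN`:

```perl
use strict;
use warnings;

# Hypothetical diagnostic: report what kind of thing a handle is
# attached to, using stat() and file tests on the "_" stat buffer.
sub describe_handle {
    my ($fh) = @_;
    my @st = stat $fh or return 'not stat-able (closed?)';
    my $kind = -p _ ? 'pipe (FIFO)'
             : -f _ ? 'plain file'
             : -S _ ? 'socket'
             : -c _ ? 'character device (e.g. a terminal)'
             :        'other';
    # Field 11 of stat() is the preferred I/O block size, if the OS reports one.
    my $blksize = (defined $st[11] && length $st[11]) ? $st[11] : 'unknown';
    return "$kind, preferred I/O block size $blksize";
}

print 'STDIN: ', describe_handle(\*STDIN), "\n";
```

Run it as `perl stdin_kind.pl < A_1_1.fq` and as `cat A_1_1.fq | perl stdin_kind.pl`; the first should report a plain file and the second a pipe.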

      - tye        

        OK, I think I've found a probable reason for the difference.
        It's actually not as huge a difference in time any more. Let me explain: I'm working on a server shared by lots of other people, so at some times it has many more jobs to run than at others. I reviewed some of the job submission logs, and the server was extremely busy yesterday when I was doing the checks. I ran the two commands again today and used strace to track what's going on.

        cat A_1_1.fq

        write(1, "12555:2368#0/1\nffafWgggWgagcggff"..., 1048576) = 1048576
        read(3, "0004_FC:1:76:5896:2982#0/1\naQXaa"..., 1048576) = 1048576
        write(1, "0004_FC:1:76:5896:2982#0/1\naQXaa"..., 1048576) = 1048576
        read(3, "\nhhhhhcghghhhhhhhhfhhhhhgfghhhhh"..., 1048576) = 1048576
        write(1, "\nhhhhhcghghhhhhhhhfhhhhhgfghhhhh"..., 1048576) = 1048576

        perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";'

        read(0, "ehhhfhhh]\n@HWI-EAS283_0004_FC:1:"..., 4096) = 4096
        read(0, "_0004_FC:1:52:6965:11034#0/1\nggg"..., 4096) = 4096
        read(0, "0004_FC:1:52:10518:11036#0/1\nhhg"..., 4096) = 4096
        read(0, "1:52:14559:11038#0/1\nffffffdfdf["..., 4096) = 4096
        write(1, "GCCCCCAGAGCANCGTCTCTGGGGGCAGCCAG"..., 4096) = 4096
        read(0, "fgggfcbfffcffdcdfcfaffffaa^fff"..., 4096) = 4096



        >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l

        read(3, "-EAS283_0004_FC:1:21:15451:12331"..., 4096) = 4096
        read(3, ":18706:12324#0/1\ncaYYcaaaVTaaZ"..., 4096) = 4096
        read(3, "hhhfgahhhhh\n@HWI-EAS283_0004_FC:"..., 4096) = 4096
        read(3, "ATCCTCCAGGCGATTCAACGCCTTGGTTCTCT"..., 4096) = 4096
        write(1, "TTTCTGTTCACTCTCAACTTCTCCTTCCAGTT"..., 4096) = 4096
        read(3, "8646:12340#0/1\ngegcgaKaaaffff_gg"..., 4096) = 4096

        There is still a time difference, but nowhere near as large as before (see below). The first is, however, consistently faster.

        >time cat A_1_1.fq | perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' | wc -l
        26814958

        real 0m41.711s
        user 0m38.406s
        sys 0m5.096s


        >time perl -ne '$s=<>;<>;<>; chomp $s; print "$s\n";' < A_1_1.fq | wc -l
        26814958

        real 3m39.382s
        user 0m52.811s
        sys 0m23.169s
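The strace excerpts show `cat` moving data in 1048576-byte (1 MiB) `read`/`write` calls, while perl reads 4096 bytes at a time. To test whether the smaller reads matter, one experiment (not something established in this thread) is to bypass line-buffered `<>` and pull large chunks with `sysread`, splitting lines by hand. `print_sequences_chunked` and its default 1 MiB chunk size are assumptions chosen to mirror `cat`; whether this is any faster on a given system is untested.

```perl
use strict;
use warnings;

# Experimental sketch: read in large sysread() chunks (default 1 MiB, like
# cat in the strace output) and print line 2 of every 4-line record.
# print_sequences_chunked is a hypothetical name; its speed is untested.
sub print_sequences_chunked {
    my ($in, $out, $chunk) = @_;
    $chunk ||= 1024 * 1024;
    my $buf     = '';
    my $line_no = 0;
    while (1) {
        # Append up to $chunk bytes after any leftover partial line.
        my $n = sysread($in, $buf, $chunk, length $buf);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # end of input
        # Emit every complete line currently in the buffer.
        while ((my $nl = index($buf, "\n")) >= 0) {
            my $line = substr($buf, 0, $nl + 1, '');  # also removes it from $buf
            print {$out} $line if $line_no % 4 == 1;  # the sequence line
            $line_no++;
        }
    }
    # Input that does not end in "\n" leaves one final partial line.
    print {$out} "$buf\n" if length $buf && $line_no % 4 == 1;
}

# Process any file names given on the command line.
for my $path (@ARGV) {
    open my $in, '<:raw', $path or die "Cannot open '$path': $!";
    print_sequences_chunked($in, \*STDOUT);
    close $in;
}
```

For the redirect and pipe cases the same sub can be called as `print_sequences_chunked(\*STDIN, \*STDOUT)`; it should not be mixed with `<STDIN>` reads on the same handle, since `sysread` bypasses Perl's buffer.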

