
Re^4: selecting columns from a tab-separated-values file

by ibm1620 (Scribe)
on Jan 23, 2013 at 20:25 UTC (#1015017)

in reply to Re^3: selecting columns from a tab-separated-values file
in thread selecting columns from a tab-separated-values file

I'm back at work and have tested the two-process solution. It took 60 seconds to pass 10M (M = million) records. I then moved the logic for splitting and joining the records out of obuf and into ibuf (thus eliminating obuf) and ran the same test, which took 62 seconds. (In both cases the output went to /dev/null.)

I reran the tests sending output to an actual file in the same directory as the input, and obtained exactly the same runtimes.

In ALL cases I observed the process doing the split/join pegging its CPU at 100%.

So I have to conclude that disk I/O is negligible for this program, in my environment.
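
For readers following along, the single-process variant (split and join done in the reading loop) might look roughly like the sketch below. This is not the code that was timed above: the names select_columns and filter_handle, the sample data, and the column indexes are all hypothetical, and the in-memory filehandles only stand in for the real input and output files.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Keep only the requested (zero-based) columns of one tab-separated record.
sub select_columns {
    my ($line, @cols) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @fields = split /\t/, $line, -1;
    # Columns missing from a short record come back as empty strings.
    return join "\t", map { $_ // '' } @fields[@cols];
}

# Copy the selected columns from one filehandle to another.
# Returns the number of records processed.
sub filter_handle {
    my ($in, $out, @cols) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        print {$out} select_columns($line, @cols), "\n";
        $count++;
    }
    return $count;
}

# Small in-memory demonstration (hypothetical data and column choice).
# A real run would pass \*STDIN and \*STDOUT instead.
my $input = "a\tb\tc\td\n1\t2\t3\t4\n";
open my $in,  '<', \$input     or die "cannot open input: $!";
open my $out, '>', \my $output or die "cannot open output: $!";

my $start = time;
my $count = filter_handle($in, $out, 0, 2);
close $out;

printf STDERR "%d records in %.6f s\n", $count, time - $start;
```

Timing the loop this way against a file and against /dev/null is one way to check whether the run is CPU-bound, as the measurements above suggest it is.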

Re^5: selecting columns from a tab-separated-values file
by BrowserUk (Pope) on Jan 23, 2013 at 23:09 UTC
    It took 60 seconds to pass 10M (M=million) records.

    Hm. 10e6 in a minute suggests a total time for 1e9 of well under 2 hours.

    This post mentions a time of 5 hours. What changed?
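
The estimate in this reply is straight proportional scaling of the measured rate (10e6 records in 60 seconds) up to 1e9 records. As a quick sketch, with extrapolate_seconds being a hypothetical helper name and the assumption being that throughput stays constant over the whole run:

```perl
use strict;
use warnings;

# Scale a measured sample (records, seconds) up to a larger record count,
# assuming the processing rate stays constant.
sub extrapolate_seconds {
    my ($sample_records, $sample_seconds, $total_records) = @_;
    return $total_records * $sample_seconds / $sample_records;
}

my $seconds = extrapolate_seconds(10e6, 60, 1e9);

# Prints: 6000 s = 100.0 min = 1.67 h
printf "%.0f s = %.1f min = %.2f h\n",
    $seconds, $seconds / 60, $seconds / 3600;
```

That is about 1 hour 40 minutes, which is where "well under 2 hours" comes from.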

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The number of fields being extracted, mainly. My example of three fields was just a simplified illustration of my question.
