PerlMonks  

Re: Performance Question

by talexb (Chancellor)
on May 08, 2002 at 13:49 UTC [id://165029]


in reply to Performance Question

That's a tough question to answer without knowing a few more variables.

  • Is it OK if you run the machine to the rails -- or do you have to share processing HP with other users (human or nobodies)?
  • Is this a one-off -- or are you going to have to do this weekly/monthly?

Jumping ahead to a solution, I would probably slice the monster file into pieces (lots of ways to do that), then process a couple of pieces in parallel. The way I would test that would be to take a 1G slice of the file, pretend that's the big file, and try various piece counts.
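One way to slice the file without cutting a line in half is to pick roughly equal byte offsets and then nudge each one forward to just past the next newline. A minimal sketch in C, assuming the file is opened in binary mode and that each worker is handed a [start, end) byte range; `split_offsets` is a hypothetical helper name, and launching the workers themselves is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fill offsets[0..n] with byte positions that cut
 * the file into n pieces, each ending on a newline, so no line is split
 * between two workers. offsets[0] = 0 and offsets[n] = file size.
 * Returns 0 on success, -1 on an I/O error. */
static int split_offsets(FILE *fp, int n, long offsets[])
{
    assert(fp != NULL && n > 0);
    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    long size = ftell(fp);
    if (size < 0)
        return -1;

    offsets[0] = 0;
    for (int i = 1; i < n; i++) {
        long guess = size / n * i;          /* naive equal split */
        if (guess < offsets[i - 1])
            guess = offsets[i - 1];         /* keep offsets non-decreasing */
        if (fseek(fp, guess, SEEK_SET) != 0)
            return -1;
        /* advance to just past the next newline */
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
        offsets[i] = (c == EOF) ? size : ftell(fp);
    }
    offsets[n] = size;
    return 0;
}
```

Worker i would then seek to offsets[i] and stop at offsets[i+1]. Where `long` is 32 bits, a file this large needs the POSIX `fseeko`/`ftello` pair (with `off_t`) instead of `fseek`/`ftell`.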

Failing that, write a program in C (something I've done many times) to suck the file in 64K chunks at a time (or whatever chunk size your system can manage), then process the lines individually. The processed lines go into a 64K output buffer, and when it fills up, you write it to the output file. Piece of cake. :) You should get great performance doing it in C, better than with Perl.
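The read-in-chunks, buffer-the-output loop described above might look roughly like this. A sketch under these assumptions: every line fits in one 64K chunk, the per-line work (`transform_line`, a hypothetical placeholder that just uppercases ASCII) keeps each line the same length, and both streams are already open. stdio already buffers on its own; the explicit buffers here just mirror the design in the post.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CHUNK (64 * 1024)   /* 64K, as in the post; tune for your system */

/* Output buffer that is written out only when full (or at the end). */
struct outbuf {
    FILE *fp;
    size_t len;
    char data[CHUNK];
};

static int out_flush(struct outbuf *ob)
{
    if (ob->len > 0 && fwrite(ob->data, 1, ob->len, ob->fp) != ob->len)
        return -1;
    ob->len = 0;
    return 0;
}

static int out_write(struct outbuf *ob, const char *s, size_t n)
{
    while (n > 0) {
        if (ob->len == CHUNK && out_flush(ob) != 0)
            return -1;              /* buffer full: write it out first */
        size_t room = CHUNK - ob->len;
        size_t k = n < room ? n : room;
        memcpy(ob->data + ob->len, s, k);
        ob->len += k;
        s += k;
        n -= k;
    }
    return 0;
}

/* Hypothetical per-line work: uppercase ASCII in place (same length). */
static void transform_line(char *line, size_t len)
{
    for (size_t i = 0; i < len; i++)
        line[i] = (char)toupper((unsigned char)line[i]);
}

/* Read `in` CHUNK bytes at a time, transform each complete line and send
 * it through the output buffer. Returns 0 on success, -1 on an I/O error
 * or a line longer than CHUNK. Not reentrant (uses static buffers). */
static int process_stream(FILE *in, FILE *out)
{
    static char inbuf[CHUNK];
    static struct outbuf ob;
    size_t have = 0;                /* bytes currently held in inbuf */
    size_t got;

    assert(in != NULL && out != NULL);
    ob.fp = out;
    ob.len = 0;
    while ((got = fread(inbuf + have, 1, CHUNK - have, in)) > 0) {
        have += got;
        size_t start = 0;
        char *nl;
        while ((nl = memchr(inbuf + start, '\n', have - start)) != NULL) {
            size_t len = (size_t)(nl - (inbuf + start)) + 1; /* with '\n' */
            transform_line(inbuf + start, len - 1);
            if (out_write(&ob, inbuf + start, len) != 0)
                return -1;
            start += len;
        }
        /* carry the incomplete last line over to the next read */
        memmove(inbuf, inbuf + start, have - start);
        have -= start;
        if (have == CHUNK)
            return -1;              /* one line fills the whole chunk */
    }
    if (ferror(in))
        return -1;
    if (have > 0) {                 /* last line had no trailing newline */
        transform_line(inbuf, have);
        if (out_write(&ob, inbuf, have) != 0)
            return -1;
    }
    return out_flush(&ob);
}
```

The partial line at the end of each chunk is the fiddly part: it is moved to the front of the input buffer and completed by the next read. A transform that changes line length would write into a separate scratch buffer instead of working in place.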

--t. alex

"Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny

Re: Re: Performance Question
by BUU (Prior) on May 08, 2002 at 13:57 UTC
    Would you really get a sizable performance increase by using C instead of Perl to manipulate/print text? (honestly wondering)
      Depends on how good a C programmer you are. If you're reasonably good, yes. Probably a factor of two to four if the transforms are simple. Possibly more, depending on the IO subsystem. (It doesn't matter if your C program could run 50 times faster than the Perl one if you've already maxed out your IO channel going twice as fast; you'll just twiddle your thumbs more.)
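A rough way to put that parenthetical in numbers, assuming CPU work and IO overlap (so elapsed time is set by whichever is slower), that the Perl run is CPU-bound and takes time T_P, and that the IO channel saturates at twice Perl's speed:

```latex
T \approx \max\left(T_{\text{cpu}},\ T_{\text{io}}\right)

T_C \approx \max\left(\frac{T_P}{50},\ \frac{T_P}{2}\right) = \frac{T_P}{2},
\qquad \frac{T_P}{T_C} \approx 2
```

So under those assumptions, C code that is 50 times faster at the CPU work still buys only the 2x that the IO channel allows.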

      On the other hand, it may take 5-10 times as long to write and debug the program, and maintaining and debugging it will be a major pain relative to Perl.

      A valid question. My guess is yes, but that's based on tuning the custom C program for the system it runs on. It also depends on whether this is a one-time job or a weekly/monthly thing, as my initial post said. For a one-time thing, definitely go Perl. For a weekly job, it's worth the investment to write a really well-tuned, optimized C program.
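One concrete knob for that kind of per-system tuning is the I/O chunk size. A sketch assuming a POSIX system: ask `fstat` for the file's preferred block size (`st_blksize`) and round a requested buffer size up to a multiple of it. `pick_chunk_size` is a hypothetical name, and whether this helps on a given box is something to measure rather than assume.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: round `want` bytes up to a multiple of the
 * preferred I/O block size reported for `fp` (st_blksize from POSIX
 * fstat). Falls back to `want` unchanged if fstat fails or reports
 * no preference. */
static size_t pick_chunk_size(FILE *fp, size_t want)
{
    struct stat st;
    size_t blk = 0;

    assert(fp != NULL && want > 0);
    if (fstat(fileno(fp), &st) == 0 && st.st_blksize > 0)
        blk = (size_t)st.st_blksize;
    if (blk == 0)
        return want;
    return (want + blk - 1) / blk * blk;
}
```

The result can be handed to standard `setvbuf`, e.g. `setvbuf(fp, NULL, _IOFBF, pick_chunk_size(fp, 64 * 1024))`, which must be called before any other I/O on that stream.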

      --t. alex

      "Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny
