http://www.perlmonks.org?node_id=556901


in reply to creating large xml files

"Obv reasons"? I'm not sure I follow. In a day and age where Java seems all too popular (I currently have one Java app running that is using 1116m virtual and 657m resident), I don't exactly follow why ~80-100m in memory should be a concern.

Especially since the guys who wrote your OS, whether that's Windows, Linux, or BSD (Mac), or pretty much any other modern OS, have already solved the problem of using a hard disk as if it were RAM. So if you really do run out of memory and start swapping, it can often still be faster than trying to be sneaky yourself. Usually, the OS will swap out some other process first while yours runs, which means you'll get to stay all in memory.

I suppose my suggestion is to start with what works and worry about the optimisations later. You may not actually need them. Do it all in memory, since that's probably far simpler, and optimise later if it turns out you have to.

Replies are listed 'Best First'.
Re^2: creating large xml files
by Jaap (Curate) on Jun 22, 2006 at 13:50 UTC
    XML in Perl (using XML::Parser, for instance) tends to get blown up significantly when it is stored as a hash of hashes or in some other not-so-memory-efficient structure.
    My guess is that ftumsh is worried about that.
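    For what it's worth, a streaming parse sidesteps that blow-up, since you never hold the whole tree at once. A rough sketch with XML::Twig, where the <record> and <name> element names are just invented for the example:

        use XML::Twig;

        # Handle one <record> at a time and purge it, so the full
        # document tree is never kept in memory.
        my $twig = XML::Twig->new(
            twig_handlers => {
                record => sub {
                    my ($t, $record) = @_;
                    print $record->first_child_text('name'), "\n";   # do the real work here
                    $t->purge;                                        # free everything parsed so far
                },
            },
        );
        $twig->parsefile('big.xml');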
Re^2: creating large xml files
by exussum0 (Vicar) on Jun 22, 2006 at 14:35 UTC
    "Optimise it later." Preoptimization is evil, but there's no reason not to set a technical requirement to try have /some/ sanity. If his machine has a gig of ran and he needs to run in parallel under load, it's a fine technical requirement. What if he's running on a low-memory device?
      You could be right. He could need to keep memory requirements down for any of the reasons you've suggested. But doesn't the simple fact that we've started playing "guess why he needs to keep the memory requirements down" mean that it's not "for obv reasons"?

      (Don't mind me... Morning came too early today and I'm in a weird mood...)

        No, of course it's obvious. I deduce he's trying to do SOAP services to Yahoo and Google, through a TI-85 calculator, running mod_perl, in hopes of getting 500 users per hour through. Kidding.

        I got no more than you do. :)

Re^2: creating large xml files
by Anonymous Monk on Jun 22, 2006 at 17:55 UTC
    Optimise it later

    That's the academic answer, not the practical one. It assumes your time is worth nothing, or that optimization is easy. Neither is true.

    Ever try to get management buy-in for a total re-write of an app that wasn't written with performance designed in from the start? It's painful.

    Performance, like security, needs to be built in from the start. If it isn't, you can pray for the so-called "80%-20% Co-incidence" to save you, or you can re-write it from scratch using tighter algorithms and faster data structures. Total re-writes cost a lot of time and money; partial fixes tend to end up as cheap hacks.

    Unless you have no clue as to what you're writing, just do it right the first time, so you don't have to do it over later. Remember, if your code gets too slow (and yes, I've seen this happen), it may actually become too slow to properly refactor. If a comparison run takes several days, small, incremental changes become very expensive.

    If the app is fast and tight, making it better is cheaper, because the cost of testing is cheaper and the cost of refactoring is cheaper. For a one-off script this doesn't matter, but for a large-scale project, performance is more critical than stuffy academics realize. In business, time is money.

      On the contrary. It's immensely practical. You're assuming your time is worth less than an extra GB of RAM. Say skipping all the optimisations saves you 4 hours of development time plus about 50% of that (another 2 hours) in debugging, and 1GB of RAM costs $120: that's 6 hours saved against $120, so you'd need to be paid $20/hour or less for spending the time on such an optimisation to pay off.

      In actuality, most programs will take much longer than that to write optimised - even from the ground up - especially in areas where you're unsure of the optimisation required. And RAM, CPU speed, and disk speed are all getting cheaper, not more expensive.

      As I've said before, it's not performance that matters, but responsiveness. If you get it responsive without wasting time on unneeded optimisations, why spend time/money on it to get it "faster"? You're right that time is money - you gotta take into account the programmer's time/money, too. I don't know about you, but I haven't made under $20/hour since I left university. It's cheaper to buy the stick of RAM and move on to the next business problem to be solved.

        You can't always just throw hardware at a problem. Who says there's a free slot on the board for another Gig of RAM? If not, you may need a new machine. Depending on where you work, that may not be a simple issue. It's almost certainly not going to impact just you and your time; it's going to take up the time of a whole host of other people.

        If there's a new machine, someone in management has to approve funds for it. That means they have to defend that purchase to the shareholders, which means they have to write up a defense of the proposal. Someone in IT has to research the machine to buy, and the reasons for that particular choice. Someone has to cut a P.O. for the system, and someone in shipping and receiving has to get the machine and send it to the appropriate location. Someone in accounting has to record the transaction. Some sysadmin then has to set the machine up and clone the old OS. If the old OS can't address the memory space of the new machine, a new OS may need to be installed. If so, then some or all of your development tools (including Perl XS modules) may need to be recompiled for the new architecture. Everyone in production support will have to be trained on what's installed on the new machine, and how to maintain it.

        And then you can install your app, compile it, and see if it runs any faster.

        To do all that, you'll need to get buy-in from your own management (to approve your purchase), possibly Sr. management/Finance, depending on the costs involved, Production Support (to approve training staff on the new machine), and the Systems Manager (to approve setting up a new machine in the server room -- may or may not be the guy in charge of Production Support), and you'll chew up some of their time talking to each one. Any one of them may be able to veto your purchase approval. It's now turned political. It's not so simple now, is it?

        Suddenly, you end up embroiled in a maze of office politics, and what was a simple technical matter has become a major social issue, which it tends to become when management hears the magic buzzword: "hardware". There's something about the word that business types instinctively don't like: it smacks of permanent investment in a quickly depreciating asset, and they don't like that combination. Getting hardware out of management can be like pulling teeth.

        So, by all means, if you can throw hardware at the problem, and pull it off, go for it. Most of the time, most places, you'll find that you can't, or if you can, it's certainly a lot more involved than just waltzing into the server room, powering down a production box, and slotting in a new stick of RAM, then going on your merry way.

Re^2: creating large xml files
by ftumsh (Scribe) on Jun 27, 2006 at 14:06 UTC
    The OS is Linux. By "obv reasons" I mean that I have only a certain amount of RAM. To cut down on its usage for parsing XML I use XML::Twig. Parsing a 50MB XML file takes > 1GB of memory. It's a multi-tasking environment, so if 100 files land at the same time it won't be long before the machine grinds to a halt. So when writing 50MB of XML I would want to do it a chunk at a time. I could roll my own, but then I'd have to handle all the encoding and what have you. I'll have a look at SAX.
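    For the writing side, something like XML::Writer takes care of the escaping and character encoding while letting you emit the document a chunk at a time, so memory stays flat. A rough sketch, with the record structure, data source, and file name invented for illustration:

        use XML::Writer;

        open my $fh, '>', 'out.xml' or die "out.xml: $!";
        my $writer = XML::Writer->new(
            OUTPUT      => $fh,
            ENCODING    => 'utf-8',    # writer sets up the encoding on the handle
            DATA_MODE   => 1,          # newlines between elements
            DATA_INDENT => 2,
        );

        $writer->xmlDecl('UTF-8');
        $writer->startTag('records');
        while (my $row = get_next_row()) {                  # hypothetical data source
            $writer->startTag('record', id => $row->{id});
            $writer->dataElement(name => $row->{name});     # content is escaped for you
            $writer->endTag('record');
        }
        $writer->endTag('records');
        $writer->end();
        close $fh;

    Each record goes straight to the filehandle as it's written, so the whole 50MB document never sits in memory.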