http://www.perlmonks.org?node_id=637871

bcrowell2 has asked for the wisdom of the Perl Monks concerning the following question:

O Monks -

I have a large, existing Perl CGI application that I designed so that every time it runs, it has to parse an XML file. Years later, you guessed it: the XML file is getting pretty big, and the overhead for parsing is getting to be annoying. I'm thinking that I might be able to boost performance by using a binary representation of my XML file. (Well, I could also rewrite the app from scratch, but I don't want to do that :-) There seem to be quite a few methods out there for representing XML in binary, of which Fast Infoset seems to be one of the most standard. Unfortunately, I can only find Java and C/C++ implementations, and only the Java one seems to be open source. Does anyone know of a Perl implementation of Fast Infoset, or some similar binary representation of XML?

TIA!

Re: perl implementation of Fast Infoset, or other binary XML representations?
by misc (Friar) on Sep 09, 2007 at 00:13 UTC
Re: perl implementation of Fast Infoset, or other binary XML representations?
by throop (Chaplain) on Sep 09, 2007 at 04:20 UTC
    Oddly, I was staring at roughly the same problem yesterday.

    XML::Simple and some of the other XML modules convert the XML to a hash of hashes (HoH). How much time would you save by reading the XML once, saving the HoH to a file with Data::Dumper, and just reading it back in that form?

    throop
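    A minimal sketch of the parse-once-then-cache idea, assuming the parsed data is a plain tree of hashes, arrays and strings (no objects or code refs). `load_cached` is a hypothetical helper, not part of any module: it takes the XML file name, a cache file name, and a code ref that does the real parsing (for example `sub { XML::Simple::XMLin($_[0]) }`), and re-parses only when the XML file is newer than the cache. The cache is Perl source written by Data::Dumper and read back with `do`.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Spec;

# Hypothetical helper: return the parsed data for $xml_file, re-parsing only
# when the XML is newer than $cache_file. $parse is a code ref that turns a
# file name into a data structure, e.g. sub { XML::Simple::XMLin($_[0]) }.
sub load_cached {
    my ($xml_file, $cache_file, $parse) = @_;

    # Reuse the cache if it exists and is at least as new as the XML source.
    if (-e $cache_file && (stat $cache_file)[9] >= (stat $xml_file)[9]) {
        # `do` runs the dumped Perl code and returns its last value. An
        # absolute path keeps `do` from searching @INC for the file.
        my $path   = File::Spec->rel2abs($cache_file);
        my $cached = do $path;
        return $cached if ref $cached;    # on failure, fall through and re-parse
    }

    my $data = $parse->($xml_file);

    # Dump as Perl source ("$VAR1 = {...};") to a temp file, then rename it
    # into place so a concurrent CGI request never reads a half-written cache.
    local $Data::Dumper::Indent   = 1;    # compact but still readable
    local $Data::Dumper::Sortkeys = 1;    # stable output between runs
    my $tmp = "$cache_file.$$";
    open my $fh, '>', $tmp or die "Can't write $tmp: $!";
    print {$fh} Dumper($data);
    close $fh or die "Can't close $tmp: $!";
    rename $tmp, $cache_file or die "Can't rename $tmp: $!";

    return $data;
}
```

    Passing the parser in as a code ref keeps the caching logic independent of which XML module does the parsing. Storable's `nstore`/`retrieve` could replace Dumper and `do` here if a faster binary cache format is wanted.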

      Thanks, misc and throop -- that's very helpful. I think the Data::Dumper technique might do the job for me.
        One caveat about XML::Simple: it doesn't pretend to preserve sub-element order, so if your XML is order-sensitive, you might want to check out other XML modules (or preferably make your code not care).

        It can also take a little time to work out how it decides between attributes and sub-elements, which matters if you read XML in and expect the XML you write back out to look the same. From what you describe, this shouldn't cause you any problems.
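        A small illustration of both caveats, using XML::Simple's `XMLin` with its default options; the sample document is made up for the example. Because `<b>` sits between two `<a>` elements and `id` is an attribute, the resulting hash shows what is lost: the repeated `<a>` elements are merged into one array reference, so the position of `<b>` relative to them is gone, and the attribute `id` becomes an ordinary hash key next to the child elements.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# <b> sits between the two <a> elements, and id is an attribute.
my $xml = '<r id="x"><a>1</a><b>2</b><a>3</a></r>';

# With default options, XMLin folds away the root element, merges the
# repeated <a> elements into one array ref, and stores the attribute and
# the child elements side by side as ordinary hash keys.
my $data = XMLin($xml);

# The hash records neither that <b> came between the two <a>s nor which
# keys were attributes and which were sub-elements.
for my $key (sort keys %$data) {
    my $v = $data->{$key};
    print "$key => ", (ref $v ? "[@$v]" : $v), "\n";
}
```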

        All in all, XML::Simple is well named. It's good for quick work with XML data when the requirements are simple.

        For an anecdotal case, I wrote a configuration manager for an existing project in another language (ActionScript). That project initially cared about the order of elements, which I didn't realize until I had written most of the configuration manager. I considered dropping XML::Simple and rewriting my code to use something else. Since I had the source to the other project and a license to modify it, I decided to look into why the order mattered. It turned out that a simple three-line change to the existing project made it accept the sub-elements in question in any order. That was much easier than rewriting the entire configuration manager, but it means I've now forked the original project.

        I submitted the change upstream, but I'm not sure the developers have done anything with the submission. It's closed source (my employers at the time paid for a source license for that version), so I'm not sure I'll ever know.

        If I had realized up front that XML::Simple was going to require this workaround, I might have written the configuration manager using another module. Still, a three-line change isn't difficult to patch into newer versions of the project it configures, so I'd probably still rather have "fixed" the main project than worked around what I consider to be its brokenness.