XML::XPath memory usage

by Jaap (Curate)
on Dec 31, 2002 at 11:45 UTC

Jaap has asked for the wisdom of the Perl Monks concerning the following question:

Wise Monks,

Using XML::XPath on a 2 MB XML file generates a process that uses 90 MB of memory.

This is due to the fact that XML::XPath is built on top of XML::Parser, which builds a big nested hash structure of the entire XML file.
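For readers unfamiliar with the module, here is a minimal sketch of the kind of query under discussion. The sample document and the `/catalog/book/title` path are made up for illustration; the point is only that the whole document is parsed into an in-memory tree before any query runs, which is where the memory goes.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document; a real one would be read with filename => ...
my $xml = '<catalog><book><title>One</title></book>'
        . '<book><title>Two</title></book></catalog>';

# The entire document is held in memory as a node tree.
my $xp = XML::XPath->new(xml => $xml);

# In list context findnodes returns the matching nodes.
my @titles = map { $_->string_value } $xp->findnodes('/catalog/book/title');

print "$_\n" for @titles;
```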

Would it be a good idea to build an XPath module that works directly on the XML?

It could first build an index of 1st level nodes, 2nd level nodes etc. Would this be a good path to follow?

Replies are listed 'Best First'.
Re: XML::XPath memory usage
by davorg (Chancellor) on Dec 31, 2002 at 11:57 UTC

    All XML parsers work in one of two ways. They are either tree based or stream based. A tree-based parser will always read in all of your document and will therefore have a large memory footprint for a decent-sized XML document[1]. Stream-based parsers look at the document one token at a time and therefore have far smaller memory requirements.

    XML::Parser can be used in both modes. From what you're saying, it seems that XML::XPath uses XML::Parser in tree mode. I'm having difficulty thinking how you could build an XPath processor using a stream-based parser. It's probably possible - but I think it would be very hard work. I don't know of any that currently exist.
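To make the two modes concrete, here is a rough sketch of XML::Parser used both ways on the same input. The sample document and the `item` element name are assumptions for illustration only. In stream mode the handlers keep nothing but a counter; in tree mode (`Style => 'Tree'`) the whole document comes back as one nested array structure.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document.
my $xml = '<list><item>a</item><item>b</item><item>c</item></list>';

# Stream mode: a handler fires for each start tag as it is parsed.
# Only what the handler chooses to keep (one counter) stays in memory.
my $count  = 0;
my $stream = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attrs) = @_;
            $count++ if $element eq 'item';
        },
    },
);
$stream->parse($xml);
print "stream mode saw $count item elements\n";

# Tree mode: parse returns the complete document as a nested structure
# of the form [ root_tag, [ {attributes}, child_tag, [...], ... ] ].
my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
print "tree mode root element: $tree->[0]\n";
```

The stream version cannot look backwards or sideways in the document, which is why arbitrary XPath expressions are hard to evaluate that way.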

    If you don't like XML::XPath's memory footprint, have you thought about switching to an alternative (i.e. stream-based) approach?

    [1] Let me pre-empt mirod's reply and point out that XML::Twig gives you the ability to build smaller trees from part of your XML document.
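A minimal sketch of that XML::Twig approach, assuming a hypothetical document made of repeated `book` elements: the handler is called once each `book` subtree is complete, pulls out what it needs, and then purges what has been parsed so far, so only one small tree is in memory at a time.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document.
my $xml = '<catalog><book><title>One</title></book>'
        . '<book><title>Two</title></book></catalog>';

my @titles;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called when the closing tag of each <book> has been parsed.
        book => sub {
            my ($t, $book) = @_;
            push @titles, $book->first_child_text('title');
            $t->purge;    # release everything parsed up to this point
        },
    },
);
$twig->parse($xml);

print "$_\n" for @titles;
```

For a large file, `parsefile` would be used in place of `parse`; the handler logic stays the same.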

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      If you don't like XML::XPath's memory footprint, have you thought about switching to an alternative (i.e. stream-based) approach?

      I have. But none currently exist (as you also say), so I thought about maybe starting to write one myself (ahum).

      I could build it on top of a SAX or Twig module as you point out. I wonder which would be best.
Re: XML::XPath memory usage
by gjb (Vicar) on Dec 31, 2002 at 13:13 UTC

    I think you're going to pay quite a high price performance-wise. The huge memory consumption at least gives you speed. It's the usual speed/memory trade-off you're facing here, and I'm not sure what performance you'd be left with using a streaming approach.

    Just my 2 cents, -gjb-

    Update: of course this depends on the number of queries you want to run against an XML file. A few queries would lend themselves better to a streaming approach; many queries would benefit from a tree approach.

Re: XML::XPath memory usage
by Matts (Deacon) on Jan 01, 2003 at 10:40 UTC
    For streaming XPath you want XML::Filter::Dispatcher. It's SAX-based and supports a fairly large subset of XPath (it can't support all of it, as that's impossible with a streamed approach). It does what you want.
