PerlMonks  

Re: How to Parse Huge XML Files ?

by jsegal (Friar)
on May 31, 2006 at 17:46 UTC ( [id://552880]=note )


in reply to How to Parse Huge XML Files ?

Without seeing your code, it is impossible to know precisely what is going on, but don't forget that memory use also depends on what else you are doing and how you are processing the file. For example, if you are building an in-memory data structure based on the file contents, you could run out of memory even when processing the file as SAX events!

Are you processing the file/events sequentially, or building up some other structure in memory? Obviously, with a large file, you are better off if you keep only a small amount of "processing data" in memory, too.
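
To make the point concrete, here is a minimal sketch of a handler that keeps only the record currently being parsed and drops it as soon as it is finished. The package name RecordTally and the element name "record" are assumptions for illustration, not taken from your code. The handler is a plain class here so the sketch has no dependencies of its own; in practice you would probably subclass XML::SAX::Base. The parser hookup at the bottom assumes XML::SAX is installed.

```perl
use strict;
use warnings;

# Hypothetical SAX handler: tallies <record> elements and keeps only the
# record currently being parsed in memory. Element names are assumptions.
package RecordTally;

sub new {
    my ($class) = @_;
    return bless { count => 0, current => undef, field => undef }, $class;
}

sub start_document {
    # STDERR is unbuffered, so this appears as soon as parsing begins.
    print STDERR "Here we go ...\n";
}

sub start_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'record') {
        $self->{current} = {};            # fresh hash for this record only
    }
    elsif ($self->{current}) {
        $self->{field} = $el->{Name};     # a child field of the open record
        $self->{current}{ $el->{Name} } = '';
    }
}

sub characters {
    my ($self, $data) = @_;
    # Text may arrive in several chunks, so append rather than assign.
    $self->{current}{ $self->{field} } .= $data->{Data}
        if $self->{current} && defined $self->{field};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'record') {
        $self->{count}++;                 # process the finished record here
        $self->{current} = undef;         # then drop it so memory stays flat
    }
    $self->{field} = undef;
}

package main;

# Only parse when a file name is given; XML::SAX must be installed for this.
if (@ARGV) {
    require XML::SAX::ParserFactory;
    my $handler = RecordTally->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->parse_uri($ARGV[0]);
    print "Processed $handler->{count} records\n";
}
```

If end_element pushed each finished record onto a growing array or hash instead of discarding it, memory use would grow with the file size, SAX or not.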

All the best,

--JAS

Re^2: How to Parse Huge XML Files ?
by Marsel (Sexton) on Jun 01, 2006 at 04:21 UTC
    Thanks for all these answers. Here is my code. And you were right! I forgot to undef the hash structure that holds the data! But it still doesn't work. When I launch it, it fills my memory and swap (6 GB in total).
    It doesn't even print the first method's message, "Here we go ..............", which is printed in response to the start_document event.
    The main code is here: Thanks for the advice; I'll have a look at XML::Twig also. Yours sincerely, Julien

    Edited by planetscape - added readmore tags

      Hmm. If your initial status message isn't getting printed out, I'd double check that you are running what you think you are running. (I find the debugger invaluable in cases like this -- I happen to like running it from within (x)emacs). Sometimes a module doesn't do what you think it is going to do, and sometimes you aren't even running the code you think you are running!

      I know I've been burned by editing a file in one directory, but actually running a version in another directory -- when putting in debugging print statements, I've learned to vary what I output, so I instantly have a positive control that I am running the version of the file I should be -- if the output is "foo" but I just added "baz", I instantly know something is amiss, and don't try to debug the wrong thing...
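
      A rough sketch of that kind of positive control, assuming nothing about your script: the helper name run_banner and the tag 'debug-baz' are made up for illustration. It reports which file is actually running and when it was last modified, and you change the tag each time you edit.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: builds a one-line banner naming the script that is
# really executing, so a stale copy in another directory is easy to spot.
sub run_banner {
    my ($tag) = @_;
    my $path  = abs_path($0) // $0;          # fall back to $0 if unresolvable
    my $mtime = (stat $path)[9];             # undef if the file cannot be stat'ed
    my $when  = defined $mtime ? scalar localtime $mtime : 'unknown';
    return "[$tag] running $path (last modified $when)";
}

# Change the tag whenever you edit the file. STDERR is unbuffered by default,
# so the line shows up immediately even if the program later stalls.
print STDERR run_banner('debug-baz'), "\n";
```

      If the tag or the path in the output is not what you just typed, you are debugging the wrong copy.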

      All that being said, this may not be your problem, but it might give you some clues as to what is going on....

      Good luck,


      --JAS
