
Re^4: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

by nan (Novice)
on May 17, 2005 at 14:21 UTC


in reply to Re^3: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000
in thread Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

Hi tlm,

Thank you so much for your informative and helpful advice. So far your code runs very well, but if you don't mind, I'd like to ask you some questions about it:

1) You set twig_handlers for two key elements and pointed them at the corresponding subroutines. What I don't understand is this: since you didn't pass any parameters to &topic and &extpage, what is the meaning of:
my ( $twig, $topic ) = @_;

2) About %links, the hash you created:
Does the following code mean that you add an entry for each child link, keyed by its att('r:resource')?
$links{ $_->att('r:resource') } = $_ for $topic->children('link');

3) About your two subroutines: as I understand it, you first walk through the whole XML document to find the first <Topic/> node, and if it has a <link/> child, you save the link information to the hash and then examine the <ExternalPage/> that follows it. If it doesn't have a <link/> child, you reset the hash to empty. Here is my question: if the hash is empty, will you examine <ExternalPage/> as well? I really don't know how these two subroutines connect with each other, or what their run order is. Is it one <Topic/> and then all <ExternalPage/>, or all <Topic/> first and then all <ExternalPage/>?

Thanks again for your time!

Re^5: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000
by tlm (Prior) on May 18, 2005 at 03:09 UTC

    What you're missing is a grasp of "event-driven" programming. It's a distinct style of programming, just as OOP is (though they are not mutually exclusive; XML::Twig is both OO and event-driven). An event-driven parser is a good example of this programming model. (It is also the norm in GUI programming.) Such a parser has a core functionality (namely, parsing text according to some syntax), but the programmer can customize it by "registering" subroutines with the parser, to be associated with specific parsing events (e.g. finding a closing tag). The parser will then invoke these pre-registered subroutines, with a pre-specified set of arguments, at the appropriate times during the parsing. These subroutines one "registers" with the parser are called "callbacks" or "handlers" [1].
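
    To make the idea concrete, here is a minimal sketch of an event-driven "parser" in plain Perl. It is not how XML::Twig is implemented; the package ToyParser and its methods on_close and parse are made up for this illustration. It only shows the shape of the model: register callbacks first, then let the parser call them with a fixed argument list.

```perl
use strict;
use warnings;

# Toy "parser" that only recognises closing tags such as </Topic>.
# ToyParser, on_close and parse are hypothetical names for this sketch.
package ToyParser;

sub new { my ($class) = @_; return bless { handlers => {} }, $class }

# Register a callback for the event "closing tag </$name> seen".
sub on_close {
    my ( $self, $name, $callback ) = @_;
    $self->{handlers}{$name} = $callback;
    return $self;
}

# Scan the text; each time a closing tag is found, invoke its handler
# (if one was registered) with two arguments: the parser and the tag name.
sub parse {
    my ( $self, $text ) = @_;
    while ( $text =~ m{</(\w+)>}g ) {
        my $name    = $1;
        my $handler = $self->{handlers}{$name} or next;
        $handler->( $self, $name );
    }
    return;
}

package main;

my @seen;    # filled in by the callbacks, not by the main program
my $parser = ToyParser->new;
$parser->on_close( Topic        => sub { my ( $p, $tag ) = @_; push @seen, "topic handler ($tag)" } );
$parser->on_close( ExternalPage => sub { my ( $p, $tag ) = @_; push @seen, "extpage handler ($tag)" } );

# The callbacks run only now, from inside parse().
$parser->parse('<Topic></Topic><ExternalPage></ExternalPage><Other></Other><Topic></Topic>');
print "$_\n" for @seen;
```

    Note that the main program never calls the two anonymous subs itself; the only thing it calls is parse, and no handler was registered for the Other tag, so that event is silently skipped.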

    The subs topic and extpage are two such handlers. They get invoked by the parser whenever it finishes parsing a Topic or ExternalPage section. They each receive two arguments from the parser: the XML::Twig object and the XML element that the parser just finished parsing. The line my ( $twig, $topic ) = @_; simply unpacks those two arguments into named variables. (This answers your first question.)

    These two subroutines run separately from each other; in other words, neither of them calls the other one. This rules out direct communication between the two subs. One way around this is for them to communicate through shared variables (i.e. %links). In this case indirect communication is necessary since extpage cannot backtrack over the XML to see what links, if any, were found by topic. In the code I wrote only the keys of %links are used; saving the actual link objects as the values corresponding to these keys is just there for some potential future use. The code would work just as well if those values were all 1, say.
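
    A stripped-down sketch of that arrangement follows. It is not the code from my earlier post: the sample XML is invented, and the about attribute on ExternalPage, the @matched array, and the purge call are assumptions made for this illustration. The hash values are all 1 here, to show that only the keys matter.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data, loosely modelled on the structure discussed here.
my $xml = <<'XML';
<RDF xmlns:r="http://www.w3.org/TR/RDF/">
  <Topic r:id="Top/A">
    <link r:resource="http://example.com/one"/>
    <link r:resource="http://example.com/two"/>
  </Topic>
  <ExternalPage about="http://example.com/one"/>
  <ExternalPage about="http://example.com/other"/>
</RDF>
XML

my %links;      # shared state: written by topic(), read by extpage()
my @matched;    # pages whose URL was listed by the most recent Topic

sub topic {
    my ( $twig, $topic ) = @_;
    %links = ();    # forget the links of the previous Topic
    $links{ $_->att('r:resource') } = 1 for $topic->children('link');
}

sub extpage {
    my ( $twig, $page ) = @_;
    my $url = $page->att('about');
    push @matched, $url if exists $links{$url};
    $twig->purge;    # release what has been parsed so far
}

XML::Twig->new(
    twig_handlers => { Topic => \&topic, ExternalPage => \&extpage },
)->parse($xml);

print "$_\n" for @matched;
```

    If a Topic has no link children, %links is simply left empty, and extpage still runs for every ExternalPage that follows; it just finds nothing to match.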

    Note that these two subroutines run multiple times during the parsing operation. This is a key point. It is not the case that all the calls to topic happen first, and then all the calls to extpage. The calls to the two subs are interleaved, following the order in which the corresponding elements end in the document.

    ...I really don't know how these two subroutines connect with each other, or what their run order is.

    The parser takes care of invoking the subroutines at the right time during the parsing; in this case, they get invoked once the parser finishes parsing a Topic or ExternalPage section, respectively. This all happens as the result of the call to $twig->parsefile( './sample.xml'); it is this call that sets off the whole sequence of events that ultimately cause the handlers to be invoked by the parser.
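
    One way to see the run order for yourself is to have each handler log its own invocation. This is a small sketch with made-up data (the id and about attribute values are placeholders), and it uses parse on an in-memory string where the real script uses parsefile on a file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: two Topic elements with ExternalPage elements between them.
my $xml = <<'XML';
<RDF>
  <Topic id="t1"/>
  <ExternalPage about="p1"/>
  <ExternalPage about="p2"/>
  <Topic id="t2"/>
  <ExternalPage about="p3"/>
</RDF>
XML

my @calls;    # one entry per handler invocation, in the order they happen
my $twig = XML::Twig->new(
    twig_handlers => {
        Topic        => sub { my ( $t, $elt ) = @_; push @calls, 'topic(' . $elt->att('id') . ')' },
        ExternalPage => sub { my ( $t, $elt ) = @_; push @calls, 'extpage(' . $elt->att('about') . ')' },
    },
);

# Up to this point no handler has run; @calls is still empty.
$twig->parse($xml);

print join( ' ', @calls ), "\n";
```

    With this input the log should come out in document order: topic(t1), extpage(p1), extpage(p2), topic(t2), extpage(p3).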

    [1] Sometimes they are also called "hooks", although I have also seen the term "hook" used to refer to the places in the source code for the parser (or whatever) where the callbacks are invoked. You can think of these "hooks" as places provided by the author of the parser where the programmer using the parser can "hang" custom code.

    Update: The first chapter of Higher-Order Perl (HOP) has a nice discussion of callbacks.

    the lowliest monk
