http://www.perlmonks.org?node_id=458057


in reply to Re^4: Memory errors while processing 2GB XML file with XML:Twig on Windows 2000
in thread Memory errors while processing 2GB XML file with XML:Twig on Windows 2000

What you're missing is a grasp of "event-driven" programming. It's a distinct style of programming, just as OOP is (though the two are not mutually exclusive; XML::Twig is both OO and event-driven). An event-driven parser is a good example of this programming model. (It is also the norm in GUI programming.) Such a parser has a core functionality (namely, parsing text according to some syntax), but the programmer can customize it by "registering" subroutines with the parser, each associated with a specific parsing event (e.g. finding a closing tag). The parser then invokes these pre-registered subroutines, with a pre-specified set of arguments, at the appropriate times during the parsing. The subroutines that one registers with the parser are called "callbacks" or "handlers" [1].
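To make the registration idea concrete, here is a minimal sketch in plain Perl. ToyParser is a made-up class, not part of XML::Twig or any other module, and it is not a real XML parser: it only looks for closing tags and calls whatever handler was registered under that tag name. The argument list passed to the handlers (the parser object and the tag name) is likewise just an assumption for the sake of illustration.

```perl
use strict;
use warnings;

# Hypothetical toy "parser": just enough to show handler registration.
package ToyParser;

sub new {
    my ($class, %args) = @_;
    # The caller registers its callbacks here, keyed by tag name.
    return bless { handlers => $args{handlers} || {} }, $class;
}

sub parse {
    my ($self, $text) = @_;
    # The "event" in this toy is: a closing tag such as </Topic> was found.
    while ($text =~ m{</(\w+)>}g) {
        my $tag = $1;
        my $handler = $self->{handlers}{$tag} or next;   # nothing registered: skip
        $handler->($self, $tag);   # the parser, not your code, makes the call
    }
    return;
}

package main;

my @seen;   # filled in by the handlers as they are invoked

my $parser = ToyParser->new(
    handlers => {
        Topic        => sub { my ($p, $tag) = @_; push @seen, "closed $tag" },
        ExternalPage => sub { my ($p, $tag) = @_; push @seen, "closed $tag" },
    },
);

$parser->parse('<Topic></Topic><ExternalPage></ExternalPage><Other></Other><Topic></Topic>');
print "$_\n" for @seen;
```

Note that the main program never calls the two anonymous subs itself. It only hands them over and then calls parse; the parser decides when they run, and no handler fires for Other because none was registered for it.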

The subs topic and extpage are two such handlers. They get invoked by the parser whenever it finishes parsing a Topic or ExternalPage section. They each receive two arguments from the parser: the XML::Twig object and the XML element that the parser just finished parsing. (This answers your first question.)

These two subroutines run separately from each other; in other words, neither of them calls the other one. This rules out direct communication between the two subs. One way around this is for them to communicate through a shared variable (here, %links). In this case indirect communication is necessary, since extpage cannot backtrack over the XML to see what links, if any, were found by topic. In the code I wrote, only the keys of %links are used; saving the actual link objects as the values corresponding to these keys is just there for some potential future use. The code would work just as well if those values were all 1, say.

Note that each of these two subroutines runs many times during the parsing operation. This is a key point: it is not the case that all the calls to topic happen first, followed by all the calls to extpage. Instead, the calls are interleaved, following the order in which the Topic and ExternalPage sections appear in the file.
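The interleaving and the role of the shared hash can be seen in a small self-contained sketch. The sample XML below is invented for illustration (the element layout and the attribute names id, resource and about are assumptions, not the real file's format), and the handler bodies are a simplified stand-in rather than the code from earlier in the thread. The XML::Twig calls used are new with twig_handlers, parse, att, children and purge.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; element and attribute names are assumptions.
my $xml = <<'XML';
<RDF>
  <Topic id="Top/Arts">
    <link resource="http://example.com/a"/>
    <link resource="http://example.com/b"/>
  </Topic>
  <ExternalPage about="http://example.com/a"><Title>A</Title></ExternalPage>
  <ExternalPage about="http://example.com/z"><Title>Z</Title></ExternalPage>
  <Topic id="Top/Science">
    <link resource="http://example.com/c"/>
  </Topic>
  <ExternalPage about="http://example.com/c"><Title>C</Title></ExternalPage>
</RDF>
XML

my %links;   # shared state: written by topic(), read by extpage()
my @log;     # records the order in which the parser invokes the handlers

sub topic {
    my ($twig, $elt) = @_;   # the two arguments supplied by the parser
    push @log, 'topic ' . $elt->att('id');
    # Only the keys matter here, so the values are simply 1.
    $links{ $_->att('resource') } = 1 for $elt->children('link');
    $twig->purge;            # release what has been parsed so far
}

sub extpage {
    my ($twig, $elt) = @_;
    my $url = $elt->att('about');
    my $status = exists $links{$url} ? 'linked' : 'not linked';
    push @log, "extpage $url ($status)";
    $twig->purge;
}

my $twig = XML::Twig->new(
    twig_handlers => { Topic => \&topic, ExternalPage => \&extpage },
);
$twig->parse($xml);   # this one call drives every handler invocation

print "$_\n" for @log;
```

With this sample input the log should read topic, extpage, extpage, topic, extpage: the handlers fire in document order, and extpage can tell which URLs topic has already recorded only because both subs see the same %links.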

"...coz I'm really doesn't know how these two subroutine connect with each other or what is the run order of them?"

The parser takes care of invoking the subroutines at the right time during the parsing; in this case, they get invoked once the parser finishes parsing a Topic or ExternalPage section, respectively. This all happens as the result of the call to $twig->parsefile('./sample.xml'); it is this call that sets off the whole sequence of events that ultimately causes the handlers to be invoked by the parser.

[1] Sometimes they are also called "hooks", although I have also seen the term "hook" used to refer to the places in the source code for the parser (or whatever) where the callbacks are invoked. You can think of these "hooks" as places provided by the author of the parser where the programmer using the parser can "hang" custom code.

Update: The first chapter of Higher-Order Perl (HOP) has a nice discussion of callbacks.

the lowliest monk
