Re^3: Convert XML To Perl Data Structures Using XML::Twig

My bad, I did not notice who had asked the original question. If I had paid attention I would have assumed you knew what a closure was!

I still don't understand very well what your problem is though. Is it that each message is a different XML "document"? In this case the ever helpful FAQ has something to say about it: Q22: I need to process XML documents. The problem is that they are several of them, so the parser dies after the first one, with a message telling me that there is junk after the end of the document. Is there any way I could trick the parser into believing they are all part of a single document? If that's not the problem, then either post an example of the data, and an example of what it is you do with the data you generate for each message... or live happily ever after with the solution you have ;--)

Re^4: Convert XML To Perl Data Structures Using XML::Twig
by Limbic~Region (Chancellor) on May 25, 2011 at 14:43 UTC

mirod

Mock up of the log file that I am working with:

2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>
2011-04-28 13:45:12 DEBUG [Populate::List:31] <Message><Tag attribute="value">Answer</Tag></Message>

In other words, a standard Log4J log where each log entry is an XML document. I am parsing the log with code similar to this:

while (<$fh>) {
    chomp;
    my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
}
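As a quick check of that split against the first mock-up line (a minimal sketch; the hard-coded $line stands in for a line read from $fh): the limit of 5 matters because the XML itself contains a space, and without the limit it would be broken into extra fields.

    use strict;
    use warnings;

    # Stand-in for one line read from the log file
    my $line = '2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>';

    # Limit of 5: the first four whitespace-separated fields are split off,
    # and everything after them stays in $xml, embedded spaces included.
    my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $line, 5;

    print "$date $time | $log_lvl | $trace\n";
    print "$xml\n";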

For each XML document, I need to convert it to a Perl data structure and do something with it. That would look something like:

my $twig = XML::Twig->new();
while (<$fh>) {
    chomp;
    my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
    my %data_structure;
    $twig->parse($xml);
    # Build up %data_structure using $twig
}

I could easily change this code to be "elegant" as such:

while (<$fh>) {
    chomp;
    my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
    my $data_structure = extract_data($xml);
}

sub extract_data {
    my ($xml) = @_;
    my $data = {};
    my $twig = XML::Twig->new(
        twig_handlers => {
            Message => sub { handle_message(@_, $data) }
        }
    );
    $twig->parse($xml);
    return $data;
}

sub handle_message {
    # ...
}

There is absolutely nothing wrong with this, and I haven't profiled it to see whether it is fast enough, but speed is my concern: I would like to inline as much as possible. Now that I have laid it all out, I realize that if someone else were asking this question, I would tell them to quit being falsely lazy, write it in a clear, maintainable way, profile it, and only worry about performance if it proved unacceptable.
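For the sake of a concrete example, here is a minimal sketch of what the "clear maintainable" version might look like with the handler filled in. The body of handle_message and the shape of the resulting hash (one entry per child tag, holding its text and attributes) are assumptions for illustration only, not what my real handler does.

    use strict;
    use warnings;
    use XML::Twig;

    # Hypothetical handler body: record each child of <Message> by tag name.
    # A twig handler is called with the twig and the element; $data is the
    # extra argument passed in by the closure in extract_data.
    sub handle_message {
        my ($twig, $message, $data) = @_;
        for my $child ($message->children) {
            $data->{ $child->tag } = {
                text => $child->text,
                atts => { %{ $child->atts || {} } },   # copy, guard against no attributes
            };
        }
        return;
    }

    sub extract_data {
        my ($xml) = @_;
        my $data = {};
        my $twig = XML::Twig->new(
            twig_handlers => {
                Message => sub { handle_message(@_, $data) },
            },
        );
        $twig->parse($xml);
        return $data;
    }

    my $data = extract_data('<Message><Tag attribute="value">Answer</Tag></Message>');
    print "$data->{Tag}{text} ($data->{Tag}{atts}{attribute})\n";

This builds a new twig for every log line, which is exactly the cost I have not measured yet, so it is the version to profile first.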

Cheers - L~R
