PerlMonks  

Re^3: Convert XML To Perl Data Structures Using XML::Twig

by mirod (Canon)
on May 25, 2011 at 14:09 UTC (node #906658)


in reply to Re^2: Convert XML To Perl Data Structures Using XML::Twig
in thread Convert XML To Perl Data Structures Using XML::Twig

My bad, I did not notice who had asked the original question. If I had paid attention I would have assumed you knew what a closure was!

I still don't understand very well what your problem is, though. Is it that each message is a separate XML "document"? In that case the ever-helpful FAQ has something to say about it: "Q22: I need to process XML documents. The problem is that there are several of them, so the parser dies after the first one, with a message telling me that there is junk after the end of the document. Is there any way I could trick the parser into believing they are all part of a single document?" If that's not the problem, then either post an example of the data and an example of what you do with the data you generate for each message... or live happily ever after with the solution you have ;--)
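For the situation the FAQ describes, one common workaround (not necessarily the FAQ's own answer) is to wrap the concatenated documents in a dummy root element so the parser sees a single well-formed document. A minimal sketch, assuming XML::Twig is installed and that none of the documents carries its own `<?xml ...?>` declaration, which is not allowed in the middle of the combined document; `parse_concatenated` and `dummy_root` are hypothetical names:

```perl
use strict;
use warnings;
use XML::Twig;

# Two complete XML documents back to back, as they might arrive in one stream.
# Parsed as-is, the parser would die complaining about junk after the document.
my $stream = '<Message><Tag attribute="value">Answer</Tag></Message>'
           . '<Message><Tag attribute="value">Answer</Tag></Message>';

# Hypothetical helper: wrap the stream in a dummy root so it becomes one
# document, then return plain data (the text of each Message's Tag) so that
# nothing depends on the twig staying alive after the sub returns.
sub parse_concatenated {
    my ($xml) = @_;
    my $twig = XML::Twig->new();
    $twig->parse("<dummy_root>$xml</dummy_root>");
    return map { $_->first_child('Tag')->text } $twig->root->children('Message');
}

my @texts = parse_concatenated($stream);
print scalar(@texts), " messages\n";
```

The cost of this trick is that every document is held in memory at once; for large streams, handlers with `purge` would be the usual next step.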


Re^4: Convert XML To Perl Data Structures Using XML::Twig
by Limbic~Region (Chancellor) on May 25, 2011 at 14:43 UTC
    mirod,
    I can't share the actual data (it's from work), but I think the following might make things a little clearer. If not, then I will live happily with the solution I am currently building.

    Mock-up of the log file I am working with:

        2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>
        2011-04-28 13:45:12 DEBUG [Populate::List:31] <Message><Tag attribute="value">Answer</Tag></Message>

    In other words, it is a standard Log4J log in which each entry's message is an XML document. I am parsing the log with code similar to this:

        while (<$fh>) {
            chomp;
            # The limit of 5 keeps any spaces inside the XML payload intact
            my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
        }
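The limit argument of 5 is what keeps the XML payload in one piece: `split ' '` skips leading whitespace and splits on runs of whitespace, but stops once it has produced five fields, so the space inside `<Tag attribute=...>` survives. A self-contained sketch using only core Perl; `parse_log_line` and its hash keys are hypothetical, and it assumes the date, time, level and location fields never contain spaces themselves:

```perl
use strict;
use warnings;

# Hypothetical helper: split one Log4J-style line into its five fields.
# Everything after the fourth field, spaces included, lands in 'xml'.
sub parse_log_line {
    my ($line) = @_;    # a copy, so chomp does not touch the caller's string
    chomp $line;
    my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $line, 5;
    return {
        date  => $date,
        time  => $time,
        level => $log_lvl,
        trace => $trace,
        xml   => $xml,
    };
}

my $line = qq{2011-04-28 13:25:47 INFO [main:114] <Message><Tag attribute="value">Answer</Tag></Message>\n};
my $rec  = parse_log_line($line);
print "$rec->{level} $rec->{trace}: $rec->{xml}\n";
```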

    For each XML document, I need to convert it to a Perl data structure and do something with it. That would look something like this:

        my $twig = XML::Twig->new();
        while (<$fh>) {
            chomp;
            my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
            my %data_structure;
            $twig->parse($xml);
            # Build up %data_structure using $twig
        }
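One way to fill in the "Build up %data_structure using $twig" step is to walk the parsed tree and copy what is needed into plain Perl data. A minimal sketch, assuming every document looks like the mock-up (a `<Message>` root whose children are elements with attributes and text); `xml_to_hash` and its `atts`/`text` keys are hypothetical, and unlike the loop in the post it creates a fresh twig for each message rather than reusing one:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical converter: turn one <Message> document into a plain hash
# keyed by child tag name. Assumes <Message> contains only child elements;
# a repeated tag name would overwrite the earlier entry.
sub xml_to_hash {
    my ($xml) = @_;
    my $twig = XML::Twig->new();    # fresh twig per message
    $twig->parse($xml);             # dies if $xml is not well-formed
    my %data;
    for my $child ($twig->root->children) {
        $data{ $child->tag } = {
            atts => { %{ $child->atts || {} } },    # copy of the attribute hash
            text => $child->text,
        };
    }
    return \%data;
}

my $data = xml_to_hash('<Message><Tag attribute="value">Answer</Tag></Message>');
print "$data->{Tag}{atts}{attribute} / $data->{Tag}{text}\n";
```

Returning a plain hash means the caller never holds references into the twig, so the tree can be freed as soon as the sub returns.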

    I could easily make this code more "elegant", like so:

        while (<$fh>) {
            chomp;
            my ($date, $time, $log_lvl, $trace, $xml) = split ' ', $_, 5;
            my $data_structure = extract_data($xml);
        }

        sub extract_data {
            my ($xml) = @_;
            my $data = {};
            my $twig = XML::Twig->new(
                twig_handlers => {
                    # Closure over $data: the handler gets ($twig, $elt, $data)
                    Message => sub { handle_message(@_, $data) }
                }
            );
            $twig->parse($xml);
            return $data;
        }

        sub handle_message {
            # ...
        }

    There is absolutely nothing wrong with this, and I haven't profiled it to confirm that it is too slow, but speed is my concern: I would like to inline as much as possible. Now that I have laid it all out, I realize that if someone else were asking this question, I would tell them to quit being falsely lazy: write it in a clear, maintainable way, profile it, and worry about performance only if it turns out to be unacceptable.
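Following that advice, one way to check whether the handler-plus-closure version is actually slower is to time it against an inline tree walk with the core Benchmark module. This is a hypothetical sketch: `with_handler` and `walk_tree` are illustrative names, the handler body merely stands in for the elided `handle_message`, and the rates it prints depend on the machine and the real data. For profiling a whole script rather than two subs, Devel::NYTProf is the usual tool.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use XML::Twig;

my $xml = '<Message><Tag attribute="value">Answer</Tag></Message>';

# Variant A: a handler plus a closure over $data, in the style of extract_data().
sub with_handler {
    my ($doc) = @_;
    my $data = {};
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Hypothetical handler body: map each Tag's attribute to its text.
            Tag => sub { my ($t, $elt) = @_; $data->{ $elt->att('attribute') } = $elt->text },
        },
    );
    $twig->parse($doc);
    return $data;
}

# Variant B: parse first, then walk the finished tree inline.
sub walk_tree {
    my ($doc) = @_;
    my $data = {};
    my $twig = XML::Twig->new();
    $twig->parse($doc);
    $data->{ $_->att('attribute') } = $_->text for $twig->root->children('Tag');
    return $data;
}

# A negative count runs each variant for at least that many CPU seconds,
# then prints a table of relative rates.
cmpthese(-1, {
    handler => sub { with_handler($xml) },
    walk    => sub { walk_tree($xml) },
});
```

If the two rates come out close, the clearer version wins by the argument above.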

    Cheers - L~R
