
Re: Parsing XML with XML::Simple

by ferreira (Chaplain)
on Dec 18, 2006 at 00:54 UTC

in reply to Parsing XML with XML::Simple

The problem is that you don't have a well-formed XML file. If it were well-formed,
there would be a single root element, which is the ancestor of every other element. Something like this:

<root>
  <CVS> $Id:,v 1.1 2006-12-17 19:25:03 eric Exp $ </CVS>
  <DATE>2006-12-10</DATE>
  ...
  <ARTICLE> foo bar baz </ARTICLE>
</root>

I think that if you make this simple correction, XML::Simple will work for you. And, by default,
this root element is discarded, so at the first level you'll get a hash with the keys you want.
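
A minimal sketch of that behaviour, assuming a trimmed-down, hypothetical document that reuses only the DATE and ARTICLE elements from the example above. It relies on XML::Simple's KeepRoot option being off by default:

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical sample; the element names follow the example above.
my $xml = <<'XML';
<root>
  <DATE>2006-12-10</DATE>
  <ARTICLE>foo bar baz</ARTICLE>
</root>
XML

# KeepRoot defaults to 0, so <root> itself does not show up in the result.
my $data = XMLin($xml);

print join(', ', sort keys %$data), "\n";   # ARTICLE, DATE
print "DATE = $data->{DATE}\n";             # DATE = 2006-12-10
```

If you do want the root element kept as the top-level key, passing KeepRoot => 1 to XMLin should do that.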

You could always open your XML files in a typical browser (Firefox, Opera, IE, etc.)
to see whether they are well-formed or whether an error is reported.
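
If opening every file in a browser is impractical, the same check can be roughed out in Perl. This is only a sketch: is_well_formed is a hypothetical helper, the AUTHOR element is made up for the illustration, and it assumes that XMLin dies when the underlying parser rejects the input.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: returns 1 if XMLin can parse the string, 0 otherwise.
sub is_well_formed {
    my ($xml) = @_;
    # XMLin dies on a parse error; eval traps that and yields undef.
    my $ok = eval { XMLin($xml); 1 };
    return $ok ? 1 : 0;
}

my $good = '<root><DATE>2006-12-10</DATE></root>';
my $bad  = '<root><AUTHOR>First Last <></AUTHOR></root>';   # unescaped <>

printf "good: %s\n", is_well_formed($good) ? 'well-formed' : 'broken';
printf "bad:  %s\n", is_well_formed($bad)  ? 'well-formed' : 'broken';
```

The parser's own message is left in $@ after a failed eval, which is usually enough to locate the offending line.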

Re^2: Parsing XML with XML::Simple
by madbombX (Hermit) on Dec 18, 2006 at 01:03 UTC
    You hit the nail on the head. Everything I had worked just fine. The problem is that when I ran it through Firefox, I noticed that at a few points throughout the articles, I have the author emails in the following format:
    First Last <>

    That messed everything up and all my original code actually works. Is there a way around that using XML::Simple or XML::Twig so that I don't have to go through EVERY file and remove all instances of that?

      The big issue is that if you have First Last <> within your XML, you have bad XML. (It should be First Last &lt;&gt;.) It is better to fix these files. I am not sure how you ended up with this, because XML::Simple usually escapes these characters:
      $ perl -MXML::Simple -e "print XMLout({ a => 'a <b>' })"
      <opt a="a &lt;b&gt;" />

      It could be the version you're using. The example above used:

      $ which_pm XML::Simple
      XML::Simple 2.13 c:/tools/apache/Perl/site/lib/XML/
        I am using XML::Simple version 2.14. I ran into this because company policy requires that line in our CVS headers, as part of the template for documents, scripts, etc. that go into CVS. Therefore, the XML files that are turned into articles all have it within the top 5 lines. Neither XML::Simple nor XML::Twig handles this properly.
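
One possible workaround, short of editing every file by hand, is to escape the stray brackets before the text reaches the parser. The sketch below is an assumption-laden illustration, not something XML::Simple or XML::Twig provide: escape_email_brackets is a hypothetical helper, and it assumes the offending pieces are either an empty <> or a <user@host> address, and that no real tag in the files has that shape.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "<>" and "<user@host>" into escaped text.
# Ordinary tags such as <DATE> contain no "@" and are left untouched.
sub escape_email_brackets {
    my ($text) = @_;
    $text =~ s{<([^<>\s\@]+\@[^<>\s\@]+)?>}{
        '&lt;' . (defined $1 ? $1 : '') . '&gt;'
    }ge;
    return $text;
}

my $line  = 'First Last <> wrote <DATE>2006-12-10</DATE>';
my $fixed = escape_email_brackets($line);
print "$fixed\n";   # First Last &lt;&gt; wrote <DATE>2006-12-10</DATE>
```

The cleaned string can then be handed to XMLin as usual. Fixing the source files is still the more reliable route, since a regex like this cannot tell markup from text in every case.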
