PerlMonks  

Re: Parsing XML with XML::Simple

by ferreira (Chaplain)
on Dec 18, 2006 at 00:54 UTC (#590366)


in reply to Parsing XML with XML::Simple

The problem is that you don't have a well-formed XML file. If it were well-formed,
there would be a single root element that is the ancestor of every other element. Something like this:

    <root>
      <CVS> $Id: File_Find.pl,v 1.1 2006-12-17 19:25:03 eric Exp $ </CVS>
      <DATE>2006-12-10</DATE>
      ...
      <ARTICLE> foo bar baz </ARTICLE>
    </root>

I think that if you make this simple correction, XML::Simple will work for you. By default
(the KeepRoot option is off), XML::Simple discards the root element, so the first level of
the result is a hash with the keys you want: CVS, ARTICLE, DATE, etc.
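
A minimal sketch of that correction, assuming the file contents are already in a string. The fragment and the name of the wrapper element (root) are made up for illustration; XMLin accepts a string of XML as well as a filename.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical fragment: several top-level elements and no single root.
my $fragment = <<'XML';
<DATE>2006-12-10</DATE>
<ARTICLE> foo bar baz </ARTICLE>
XML

# Wrap the fragment in an arbitrary <root> element so the parser sees
# exactly one document element.
my $data = XMLin("<root>\n$fragment</root>");

# KeepRoot is off by default, so the wrapper does not appear in the result
# and the child element names are the top-level hash keys.
print "$_ => $data->{$_}\n" for sort keys %$data;
```

If the file starts with an <?xml ...?> declaration, the opening wrapper tag has to go after that declaration, not before it.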

You can always open your XML files in a typical browser (Firefox, Opera, IE, etc.)
to see whether they are well-formed or whether the browser reports an error.
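
The same check can be scripted. This is only a sketch and assumes XML::Parser is installed (it is one of the parsers XML::Simple can use underneath); the helper name xml_error is made up. XML::Parser dies on the first well-formedness error, so wrapping the parse in eval is enough.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns an empty string if $xml is well-formed,
# otherwise the error message the parser died with.
sub xml_error {
    my ($xml) = @_;
    my $parser = XML::Parser->new;
    return eval { $parser->parse($xml); 1 } ? '' : ($@ || 'unknown error');
}

# One root element: well-formed.
print xml_error('<root><DATE>2006-12-10</DATE></root>') ? "bad\n" : "ok\n";

# Two top-level elements and no root: not well-formed.
print xml_error('<DATE>2006-12-10</DATE><ARTICLE/>') ? "bad\n" : "ok\n";
```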


Re^2: Parsing XML with XML::Simple
by madbombX (Hermit) on Dec 18, 2006 at 01:03 UTC
    You hit the nail on the head. Everything I had worked just fine. The problem is that when I ran it through Firefox, I noticed that at a few points throughout the articles I have the author emails in the following format:
    First Last <this@that.com>

    That messed everything up and all my original code actually works. Is there a way around that using XML::Simple or XML::Twig so that I don't have to go through EVERY file and remove all instances of that?

      The big issue is that if you have First Last <this@that.com> within your XML, you have bad XML. (It should be First Last &lt;this@that.com&gt;.) It is better to fix these files. I am not sure how you ended up with this, because XML::Simple usually escapes these characters on output:
      $ perl -MXML::Simple -e "print XMLout({ a => 'a <b>' })"
      <opt a="a &lt;b&gt;" />

      It could be the version you're using. The example above used:

      $ which_pm XML::Simple
      XML::Simple 2.13 c:/tools/apache/Perl/site/lib/XML/Simple.pm
        I am using XML::Simple version 2.14. I ran into this because company policy requires that line in our CVS headers, as part of the template for documents, scripts, etc. that go into CVS. Therefore, the XML files that are turned into articles all have it within the top 5 lines. Neither XML::Simple nor XML::Twig handles this properly.
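
If the files really cannot be fixed at the source, one possible workaround is to escape the offending brackets in memory before handing the text to the parser. The sketch below uses only core Perl; the function name escape_email_brackets is made up, and the pattern is an assumption about what the addresses look like (no whitespace, exactly one @ between the angle brackets), so it should be checked against the real files.

```perl
use strict;
use warnings;

# Escape <user@host> only when the bracketed text looks like an email
# address. Element names cannot contain '@', so ordinary tags such as
# <DATE> are left alone.
sub escape_email_brackets {
    my ($text) = @_;
    $text =~ s/<([^<>\s\@]+\@[^<>\s\@]+)>/&lt;$1&gt;/g;
    return $text;
}

print escape_email_brackets('First Last <this@that.com>'), "\n";
# prints: First Last &lt;this@that.com&gt;
```

The escaped string can then be passed to XMLin in place of the filename, so the files on disk stay untouched.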
