in thread Parsing XML with XML::Simple

You hit the nail on the head. Everything I had worked just fine. The problem is that when I ran it through Firefox, I noticed that at a few points throughout the articles, the author emails are in the following format:
First Last <>

That is what messed everything up; all my original code actually works. Is there a way around this using XML::Simple or XML::Twig, so that I don't have to go through EVERY file and remove every instance of it?

Re^3: Parsing XML with XML::Simple
by ferreira (Chaplain) on Dec 18, 2006 at 01:14 UTC
    The big issue is that if you have First Last <> within your XML, you have bad XML. (It should be First Last &lt;&gt;.) It is better to fix these files. I am not sure how you ended up with this, because XML::Simple usually escapes these things:
    $ perl -MXML::Simple -e "print XMLout({ a => 'a <b>' })"
    <opt a="a &lt;b&gt;" />

    It may depend on the version you're using. The example above used:

    $ which_pm XML::Simple
    XML::Simple 2.13 c:/tools/apache/Perl/site/lib/XML/
      I am using XML::Simple version 2.14. I ended up with this because our company policy requires the CVS header of every document, script, etc. that goes into CVS to include that line as part of the template. As a result, every XML file that is turned into an article has it within the top 5 lines. Neither XML::Simple nor XML::Twig handles this properly.

        So you are suggesting that the file generated by XML::Simple is post-processed in some fashion to insert extra information? In that case the fix is to pre-process the file at the other end to remove that extra information. Alternatively, the code that uses XML::Simple to generate the file could be modified to insert the extra information in a compliant fashion. Beyond that, it depends on your data flow and processes.
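        One way to do that pre-processing, following the &lt;/&gt; form ferreira gives, is to escape the brackets around the author emails before the text ever reaches the parser. This is only a sketch: escape_author_emails is a hypothetical helper, not part of XML::Simple or XML::Twig, and it assumes every offending bracket pair is either empty (as in First Last <>) or holds an address with an @ and no whitespace, which element tags never do. It also assumes the line sits inside an element; plain text outside the root element is not allowed in XML at all, so escaping alone would not make such a file parse.

```perl
use strict;
use warnings;

# Hypothetical pre-processing helper (not part of XML::Simple or XML::Twig).
# It escapes a "<...>" pair that is either empty, as in "First Last <>", or
# holds an e-mail address: no whitespace and at least one "@". Element tags
# never contain "@", so they are left alone. This is a heuristic, not a
# general XML repair tool; it can misfire on comments or CDATA sections
# that contain such text.
sub escape_author_emails {
    my ($xml) = @_;
    $xml =~ s{<([^<>\s]*\@[^<>\s]*|)>}{&lt;$1&gt;}g;
    return $xml;
}

# Made-up sample input showing both forms of the problem.
my $raw = <<'XML';
<article>
  <author>First Last <first.last@example.com></author>
  <note>This That <></note>
</article>
XML

print escape_author_emails($raw);
```

        The fixed string could then be passed straight to XMLin, which treats an argument containing a '<' as XML text rather than as a file name, so the files on disk never have to be edited.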

        DWIM is Perl's answer to Gödel

        If you need the CVS info to be accessible to your programs, you need to enclose the content in <![CDATA[...]]> so that the <, > and & characters do not break the XML. If you don't, it would be best to use comments:

        <!--
        CVS $Id:,v 1.1 2006-12-17 19:25:03 eric Exp $
        This That <>
        Desc: Test file
        -->
        <root>
        ...
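        The two options above could be applied by a template or pre-processing step along the lines of this sketch. as_cdata and as_comment are illustrative names, not functions from any module. Two XML rules shape them: a CDATA section ends at the first ]]>, so that sequence has to be split across two sections, and a comment may not contain --. Note also that a comment may appear before the root element while a CDATA section may not, which matters if the CVS header has to stay in the top lines of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap text in a CDATA section so <, > and & survive
# parsing and stay visible to programs. Any "]]>" inside the text would end
# the section early, so it is split across two adjacent CDATA sections.
sub as_cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return "<![CDATA[$text]]>";
}

# Hypothetical helper: hide text from programs by turning it into an XML
# comment. "--" is illegal inside a comment, so a space is inserted after
# every "-" that is followed by another "-". The padding spaces keep the
# text from touching the "<!--" and "-->" delimiters.
sub as_comment {
    my ($text) = @_;
    $text =~ s/-(?=-)/- /g;
    return "<!-- $text -->";
}

# The CVS header line from the example above (single quotes, so the
# dollar signs are not interpolated).
my $header = 'CVS $Id:,v 1.1 2006-12-17 19:25:03 eric Exp $ This That <>';

print as_comment($header), "\n";
print '<root>', as_cdata($header), "</root>\n";
```

        Because as_comment pads the text with a space on each side, a header that happens to end in - still cannot run into the closing -->.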