http://www.perlmonks.org?node_id=982796

sarf13 has asked for the wisdom of the Perl Monks concerning the following question:

How can I use the XML::Simple parser to read and parse the greater-than (>), less-than (<), and ampersand (&) symbols? I have a big XML file that I have to read line by line. I then filter the records of a specific tag and generate a report based on some conditions. When the XML contains one of these symbols, XML::Simple gets stuck and is unable to parse the file. Is there any way to adapt this module to my requirement?

Replies are listed 'Best First'.
Re: How can I use XML::Simple parser to read and parse greater then (>), less then (<) and ampersand (&) symbol..
by choroba (Cardinal) on Jul 20, 2012 at 11:52 UTC
    Please be more specific. Does your XML contain unescaped < or & characters? If so, it is not well-formed XML. Can you show a sample of the problematic data?
      Hi, thank you for your quick reply. No, these symbols are not escaped, and I only need to add a validation check for these symbols. Here is a sample from my input file:
      <query> insert into name (name, age, address) (select from age_tab where age > + 19 ) </query>

        You might be able to run the file through some filter that tries to guess which <, > or & belongs to a tag or entity and which does not, but there's no guarantee it will guess correctly every time. Fix, or have someone fix, whatever produces this invalid format.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.
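
A minimal sketch of the kind of filter described above, assuming the problem characters only ever appear inside `<query>...</query>`, that the element has no attributes, is never nested, and holds only text. The helper names `escape_text` and `escape_query_bodies` are made up for illustration. As the reply warns, this is guesswork: it will mangle any input that breaks those assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the three problem characters in plain text.
sub escape_text {
    my ($text) = @_;
    # Escape & first, skipping things that already look like entities,
    # so the entities produced below are not escaped a second time.
    $text =~ s/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: apply escape_text to the body of every <query> element.
# Assumes <query> has no attributes and is never nested.
sub escape_query_bodies {
    my ($xml) = @_;
    $xml =~ s{(<query>)(.*?)(</query>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $open . escape_text($body) . $close;
    }gse;
    return $xml;
}

my $line = '<query> insert into name (name, age, address) '
         . '(select from age_tab where age > + 19 ) </query>';
print escape_query_bodies($line), "\n";
```

The filtered text can then be handed to a parser such as XML::Simple, which will decode the entities back into the original characters.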

        You should enclose them in a CDATA block to keep them safe from the parser. All parsers should respect this.

         <tag><![CDATA[ <?>&;]!> ]]></tag>

        The only thing you cannot put in CDATA is ]]>, as it's the CDATA terminator. However, if it's essential, you could split it across two CDATA sections like this:

        <tag><![CDATA[]]]]><![CDATA[>]]></tag>

        For full details, see http://www.w3schools.com/xml/xml_cdata.asp
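
The CDATA advice above can be sketched as a small helper for whatever program writes the XML. The name `cdata_wrap` is made up for illustration. It wraps text in a CDATA section and splits any embedded `]]>` across two sections, as in the second example.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap arbitrary text in one or more CDATA sections.
sub cdata_wrap {
    my ($text) = @_;
    # Split every "]]>" so that "]]" ends one section and ">" starts the next.
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

print '<tag>', cdata_wrap('age > 19 && name < "m"'), "</tag>\n";
print '<tag>', cdata_wrap(']]>'), "</tag>\n";
```

This only helps on the producing side. Text that is already in a broken file still has to be located before it can be wrapped.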

Re: How can I use XML::Simple parser to read and parse greater then (>), less then (<) and ampersand (&) symbol..
by aitap (Curate) on Jul 20, 2012 at 12:01 UTC
    Can you show an example of your file? Is it valid XML?
    Sorry if my advice was wrong.
Re: How can I use XML::Simple parser to read and parse greater then (>), less then (<) and ampersand (&) symbol..
by Anonymous Monk on Jul 20, 2012 at 13:24 UTC
    Indeed, I suggest that you find an "XML lint" tool and run it against the file, along with its schema if you have one (you should), to see whether it is, in fact, valid XML. If it is, then I cordially suggest that you may prefer a more rugged set of tools such as XML::LibXML and its brethren, which wrap the industry-standard libxml2 library and can reliably do just about anything you could want to do with an XML file of any size, including the use of XPath expressions, the Swiss Army knife of XML.
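
As a rough illustration of that suggestion, the sketch below uses XML::LibXML (which must be installed from CPAN) to check that a document is well-formed and then pull out the text of each query with an XPath expression. The `<queries>` wrapper element and the sample data are assumptions made for this example, not the original poster's actual file layout. For a quick check from the command line, `xmllint --noout file.xml` from libxml2 reports well-formedness errors.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample document: one query protected by CDATA, one by entities.
my $xml = <<'XML';
<queries>
  <query><![CDATA[ select name from age_tab where age > 19 ]]></query>
  <query>select name from age_tab where age &lt; 65</query>
</queries>
XML

# load_xml dies on input that is not well-formed, so trap the error.
my $doc = eval { XML::LibXML->load_xml(string => $xml) };
die "Not well-formed XML: $@" unless $doc;

# The XPath expression selects every <query> under the root element;
# textContent returns the decoded text with < > & restored.
my @queries = map { $_->textContent } $doc->findnodes('/queries/query');
print "$_\n" for @queries;
```

The same parse would fail on the unescaped sample from earlier in the thread if the text contained a bare `<` or `&`, which is the point of validating first.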