PerlMonks  

Perl converting some html entities when I don't want it to

by MorayJ (Acolyte)
on Sep 13, 2012 at 10:49 UTC (#993447)
MorayJ has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I'm having trouble with something that I can't google easily, because I'm not sure what is happening.

I am using a loop to read an XML file into a variable, but on each line some (not all) of the HTML entities are being converted into their symbols. The entities are inside attributes, so the file is no longer digestible by the program I need to import it back into; presumably it is no longer well-formed.

<NODE id="431" text="&lt;P align=&amp;quot;left&amp;quot;&gt;Non-mortgage&amp;amp;nbsp;debts is yes&lt;/P&gt;" type="DataInput" contextstring="OtherPriorityDebts" groupnext="True" branchValue="Yes" dataDefaultV="Yes" dataVisible="False">

...gets converted to ...

<NODE id="431" text="<P align=&quot;left&quot;>Non-mortgage&amp;nbsp;debts is yes</P>" type="DataInput" contextstring="OtherPriorityDebts" groupnext="True" branchValue="Yes" dataDefaultV="Yes" dataVisible="False">

So, within the text attribute, the &lt; and &gt; have been converted to < and >, and one level of &amp; has been removed. (If I don't put <code> around that, the lt and gt get converted here as well, so this seems to be standard.)

This presumably has to do with encoding, but I don't really know how to investigate it or how to stop it happening.

Could anyone advise?

Thanks for any help.

MorayJ

Re: Perl converting some html entities when I don't want it to
by Corion (Pope) on Sep 13, 2012 at 10:52 UTC

    Most likely, your XML parser decodes entities on reading, but on writing you use print (or a broken XML writer) instead of a proper XML writer. Ideally, your XML writer would entity-encode attribute values and text nodes.
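
A minimal sketch of what "entity-encode on writing" could look like when the output is produced with plain print, using core Perl only. The helper name escape_xml_attr is hypothetical, and the sample value is the decoded text attribute from the question; this is an illustration under those assumptions, not a replacement for a real XML writer.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are unsafe inside a
# double-quoted XML attribute value. A single substitution pass means an
# '&' that is produced by the escaping is never escaped a second time.
sub escape_xml_attr {
    my ($value) = @_;
    my %entity_for = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );
    $value =~ s/([&<>"])/$entity_for{$1}/g;
    return $value;
}

# The value as a parser hands it back: one level of entities already decoded.
my $decoded = '<P align=&quot;left&quot;>Non-mortgage&amp;nbsp;debts is yes</P>';

# Re-encode before printing so the attribute is well-formed again.
my $attr = sprintf 'text="%s"', escape_xml_attr($decoded);
print "$attr\n";
```

Applied to the decoded value, this gives back the attribute exactly as it appeared in the original file.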

    You will have to show a short, relevant program that reproduces the problem.

Re: Perl converting some html entities when I don't want it to
by daxim (Chaplain) on Sep 13, 2012 at 11:01 UTC
    Your problem is the lack of entity-decoding in line 17 of your program.

    This works:

    use strictures;
    use HTML::Entities qw(decode_entities);
    use XML::LibXML qw();

    my $xml = XML::LibXML->load_xml(string => <<'XML');
    <root>
    <NODE id="431" text="&lt;P align=&amp;quot;left&amp;quot;&gt;Non-mortgage&amp;amp;nbsp;debts is yes&lt;/P&gt;" type="DataInput" contextstring="OtherPriorityDebts" groupnext="True" branchValue="Yes" dataDefaultV="Yes" dataVisible="False" />
    </root>
    XML

    for my $node ($xml->findnodes('//NODE')) {
        my $text = $node->getAttribute('text');
        # <P align=&quot;left&quot;>Non-mortgage&amp;nbsp;debts is yes</P>
        my $html = decode_entities $text;
        # <P align="left">Non-mortgage&nbsp;debts is yes</P>
    }

Re: Perl converting some html entities when I don't want it to
by MorayJ (Acolyte) on Sep 13, 2012 at 12:10 UTC

    Great - thanks for the help, both.

    I am not using an XML writer, which is probably the root of the problem...I am using print.

    I have the line print "$key_attrbs=\"$attrbs{$key_attrbs}\""; in a loop over the node, to print out all its attributes.

    I have now added above it: encode_entities $attrbs{$key_attrbs}; (with use HTML::Entities at the beginning), which seems to have done the trick.

    I suspect this is not ideal, but it's working for me at the moment

    Thanks a lot for the help

    MorayJ

      ...although I would add, for other unwary travelers, that because it works with HTML entities, it converts things like the £ sign to &pound;. The XML import still fails on that, since &pound; is not one of XML's predefined entities and would need to be declared first.

      All in all, it will be better in the long run to use an XML writer, though this works as a fix.

      Thanks again
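
One possible way around the &pound; problem, sketched in core Perl: escape only the markup characters with XML's predefined entities, and write everything outside printable ASCII as a numeric character reference, which XML accepts without any declaration. The helper name escape_xml_attr_ascii and the sample string are hypothetical. HTML::Entities itself also accepts a second argument naming the characters to treat as unsafe, for example encode_entities($value, '<>&"'), which would leave the pound sign untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: XML-safe escaping for an attribute value.
sub escape_xml_attr_ascii {
    my ($value) = @_;
    my %entity_for = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );
    # Markup characters become XML's predefined entities.
    $value =~ s/([&<>"])/$entity_for{$1}/g;
    # Anything outside printable ASCII becomes a numeric reference,
    # e.g. the pound sign (U+00A3) becomes &#163; rather than &pound;.
    $value =~ s/([^\x20-\x7E])/sprintf('&#%d;', ord $1)/ge;
    return $value;
}

my $price = "Fee: \x{A3}5 & up";    # \x{A3} is the pound sign
my $escaped = escape_xml_attr_ascii($price);
print "$escaped\n";                 # prints: Fee: &#163;5 &amp; up
```

Numeric references keep the output pure ASCII, so the result does not depend on the encoding the file is later written in.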
