PerlMonks  

XML, Unicode, and Internet Explorer

by Tuppence (Pilgrim)
on Apr 11, 2006 at 08:10 UTC

Tuppence has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks, I once again humbly beg for assistance. This question is not strictly Perl-related, although the data at issue is fetched through a system using mod_perl and HTML::Mason.

I need to be able to fetch Unicode data in an XML document using Internet Explorer. I'm trying to add some AJAX functionality to my site, and Internet Explorer is complaining about the Unicode characters in my XML. Mozilla likes it just fine. I would assume that as soon as I can pull up the page OK in the browser window, my AJAX problem will go away as well and the script will get the data directly.

Internet Explorer says "The XML page cannot be displayed" and "An invalid character was found in text content. Error processing resource".

I'm using HTML::Mason's |h escaping, which should be calling HTML::Entities::encode() and making the data happy.
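One thing that may be worth ruling out here: by default HTML::Entities turns many non-ASCII characters into named HTML entities such as &eacute;, while XML itself predefines only &amp;, &lt;, &gt;, &apos; and &quot;. The following is a minimal sketch of an XML-safe alternative that emits numeric character references instead, so the output stays pure ASCII. The helper name xml_escape is hypothetical (it is not part of HTML::Mason or HTML::Entities), and it assumes the input is an already-decoded character string.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Mason or HTML::Entities.
# Assumes $text is a decoded character string, not raw UTF-8 bytes.
sub xml_escape {
    my ($text) = @_;
    my %named = (
        '&' => '&amp;',  '<' => '&lt;',   '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    # Escape the five characters that are special in XML.
    $text =~ s/([&<>"'])/$named{$1}/g;
    # Replace every non-ASCII character with a numeric character reference.
    $text =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

print xml_escape("caf\x{e9} & \x{263A}"), "\n";   # caf&#xE9; &amp; &#x263A;
```

Because the result contains only ASCII, it reads the same whatever encoding the browser assumes, at the cost of a somewhat larger document.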

I've tried setting the content type with $r->content_type to "text/xml; charset=UTF-8", as well as putting <?xml version="1.0" encoding="UTF-8"?> on the first line of my XML file, but IE still doesn't like it. It breaks on the UTF-8-encoded character.

I suppose as a workaround I could just load the text and do some regex splitting action to get the pieces of data, but I would prefer to use XML and let the browser do the heavy lifting for me.

Thank you in advance for your attention to my problem.

Replies are listed 'Best First'.
Re: XML, Unicode, and Internet Explorer
by john_oshea (Priest) on Apr 11, 2006 at 10:23 UTC

    I'm surprised that IE doesn't tell you exactly where the error occurs in the file. Does it actually think the file is utf-8-encoded? Calling up page properties should tell you what it thinks it's dealing with.

    If that doesn't lead anywhere, try running the generated XML through xmllint - that should highlight any errors in what you're generating.

Re: XML, Unicode, and Internet Explorer
by grantm (Parson) on Apr 12, 2006 at 01:10 UTC

    You're definitely on the right track with that Content-Type header. The RFC (RFC 3023, if I remember right) says that the client must use the encoding specified in the Content-Type header in preference to any encoding declaration in the document itself. I think that in the absence of a charset=... parameter on a text/xml Content-Type header, the client is meant to assume us-ascii, which is no better for your purposes than latin-1. So you should declare your UTF-8 encoding both in your header and in your XML declaration.

    The next question is: are you actually generating UTF-8 data? In our Mason app, we found we needed to add this line to our Apache config:

    PerlSetVar MasonPreamble "use utf8;"

    Without this line, literal Unicode characters in our Mason templates were having extra bytes inserted when the template was compiled.
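    "Extra bytes" of this kind are usually the signature of text being UTF-8-encoded twice. The following is a small sketch of that effect using the core Encode module; it is an illustration of the general failure mode, not a reproduction of what Mason does internally, and hex_bytes is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $char  = "\x{e9}";                  # one character: e with acute accent
my $once  = encode('UTF-8', $char);    # correct UTF-8: two bytes
my $twice = encode('UTF-8', $once);    # bytes misread as characters, encoded again

printf "once:  %s\n", hex_bytes($once);    # once:  C3 A9
printf "twice: %s\n", hex_bytes($twice);   # twice: C3 83 C2 A9
```

    If a hex dump of the served page shows C3 83 C2 A9 where C3 A9 was expected, something in the pipeline is encoding already-encoded data.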

    To check if the output really is UTF-8, I'd use wget to request a URL and save it to a file then use a hex-dumper like xxd (comes with Vim) to view the file as bytes.
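    The same check can be sketched in Perl once the response has been saved: a strict decode with the core Encode module dies on malformed input, so wrapping it in eval gives a yes/no answer. The helper name is_valid_utf8 is hypothetical, and the sample strings stand in for the bytes of the saved file.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed sequences;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# "caf\xC3\xA9" is UTF-8; "caf\xE9 au lait" contains a bare Latin-1 byte.
print is_valid_utf8("caf\xC3\xA9")     ? "valid\n" : "invalid\n";
print is_valid_utf8("caf\xE9 au lait") ? "valid\n" : "invalid\n";
```

    To test real output, read the downloaded file in binary mode (for example with open and the :raw layer) and pass its contents to the helper.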

Re: XML, Unicode, and Internet Explorer
by mikeock (Hermit) on Apr 11, 2006 at 14:41 UTC
    You might actually get a kick out of this article about IE. The IE Factor
Re: XML, Unicode, and Internet Explorer
by graff (Chancellor) on Apr 11, 2006 at 20:06 UTC
    I really don't have any experience with IE, but having gotten a few clues elsewhere about how MS stuff behaves around Unicode data, I wonder if it might help to put a UTF-8-encoded "byte order mark" (BOM) at the beginning of your XML file? (Just guessing...)

    To get a UTF-8 version of the BOM at the beginning of an XML file, you could just do something like this (assuming a Bourne-style shell and a Unix-like "cat" command):

    perl -CS -e 'print "\x{feff}"' | cat - somefile.xml > bomfile.xml
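    For output that is generated on the fly rather than stored in a file, the same idea can be sketched in Perl on the byte string itself. The helper name add_utf8_bom is hypothetical, and the sketch assumes the argument is already UTF-8-encoded bytes with the XML declaration at the very start.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: prepend a UTF-8 BOM unless one is already present.
# Assumes $bytes holds UTF-8-encoded bytes, not a decoded character string.
sub add_utf8_bom {
    my ($bytes) = @_;
    my $bom = encode('UTF-8', "\x{FEFF}");    # the three bytes EF BB BF
    return substr($bytes, 0, length $bom) eq $bom ? $bytes : $bom . $bytes;
}

my $xml = add_utf8_bom(qq{<?xml version="1.0" encoding="UTF-8"?>\n<doc/>\n});
printf "%vX\n", substr($xml, 0, 3);           # EF.BB.BF
```

    The BOM has to be the very first thing in the response, ahead of the XML declaration, and the check makes the helper safe to call more than once.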
