What is supposed to be showing up? Where is the data coming from? What is your script doing to it before printing it?
The "Wide character in print" warning was telling you three things: you were using "print" (or printf) on an output file handle, the data being printed contained at least one character with a code point above 255 (so perl was holding the string internally as utf8), and the handle had not been declared with an encoding layer to accommodate such data.
Sure, changing the file handle to accommodate utf8 data gets rid of the warning, but it doesn't really change the data that caused the warning in the first place.
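Here is a minimal sketch of that point. The sample string is made up, and I'm printing to in-memory file handles rather than STDOUT just so the output bytes can be inspected; treat it as an illustration, not as what your script is doing.

```perl
use strict;
use warnings;

my $text = "10\x{2013}12";    # made-up sample; U+2013 is above 255

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1. Handle with no encoding layer: perl warns, then writes its internal utf8 bytes.
my $without_layer = '';
open(my $raw, '>', \$without_layer) or die "open failed: $!";
print {$raw} $text;
close($raw);

# 2. Handle declared as UTF-8: no warning, and the same bytes come out.
my $with_layer = '';
open(my $utf8, '>:encoding(UTF-8)', \$with_layer) or die "open failed: $!";
print {$utf8} $text;
close($utf8);

# Hex dump of what was written: 31 30 E2 80 93 31 32
my $hex = join ' ', map { sprintf '%02X', ord } split //, $with_layer;

print "warnings seen: ", scalar(@warnings), "\n";
print "same bytes either way: ", ($with_layer eq $without_layer ? 'yes' : 'no'), "\n";
print "$hex\n";
```

For STDOUT itself, the usual way to add the layer is binmode(STDOUT, ':encoding(UTF-8)'); either way, the data are untouched and only the warning goes away.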
You need to supply more information. There are a variety of possible "solutions" -- changing how you view the data, changing the data in any of various ways before printing it, and so on -- but we don't know enough about your problem yet to make a recommendation.
Update: Now that you have supplied more information, I can make a few observations:
- Whenever you change the content of your post, please make it clear to others that you have changed the content -- use "update:" (like I've done here) to indicate what has been added, and put <strike> ... </strike> around things that you want to delete (rather than just deleting them), so that replies that were based on your original post will still make sense.
- Based on the context you've added around the "weird characters" (originally you just showed those characters in isolation), it looks like you are downloading a page that might be using some character set other than utf8, and it's being interpreted incorrectly as (or into) utf8.
You should try looking at the original content in a browser window, and use different character encodings in that window until you see a display that makes sense. That's one way to figure out which encoding is being used in the source data.
I would expect that the true character encoding being used would be mentioned somewhere in the data, as part of the header, or a tag attribute, or something -- that's another way to find that out.
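If you want to check for a declared encoding in code, something like the following might do as a rough first pass. The helper name declared_charset and the sample header and tag strings are my own inventions for illustration; I don't know what your downloaded data actually look like, and a real HTML parser would be more robust than these regexes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset named in a Content-Type header
# value, or failing that in an HTML <meta> tag; return nothing if none is found.
sub declared_charset {
    my ($content_type, $html) = @_;

    if (defined $content_type
        && $content_type =~ /charset\s*=\s*"?([\w.:-]+)/i) {
        return lc $1;
    }
    if (defined $html
        && $html =~ /<meta[^>]+charset\s*=\s*["']?([\w.:-]+)/i) {
        return lc $1;
    }
    return;
}

# Made-up inputs, only to show the two places a declaration tends to appear.
my $from_header = declared_charset('text/html; charset=ISO-8859-1', '');
my $from_meta   = declared_charset(undef,
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">');

print "header says: $from_header\n";
print "meta says:   $from_meta\n";
```

Keep in mind that a declared charset can be wrong, which is why eyeballing the page in a browser is still worth doing.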
If you just want to get rid of the wide characters, you can do this, which will work no matter what is going wrong with the encoding:
s/[^[:ascii:]]+//g; # get rid of non-ASCII characters
If you need to keep those characters, the first thing is to look at your output using a browser, so that utf8 data are displayed correctly using utf8 characters. In that view, if you see two or more characters where you expected to see only one, you'll need to figure out how to use the Encode module on your data.
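To give an idea of what using Encode looks like, here is a small sketch of the "two characters where you expected one" symptom. The sample bytes are made up, and I'm assuming the incoming data really are utf8; if the source turns out to be some other encoding, that name would go in place of 'UTF-8' in the decode call.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up input: the five utf8 bytes for "cafe" with an acute accent on the e,
# as they would arrive from a socket or file that has no encoding layer.
my $bytes = "caf\xC3\xA9";

# Right: decode the bytes into characters once, on the way in.
my $chars = decode('UTF-8', $bytes);

# Wrong: encoding the undecoded bytes treats \xC3 and \xA9 as two separate
# Latin-1 characters, so each one becomes two bytes on output.
my $double = encode('UTF-8', $bytes);

printf "raw bytes:        %d\n", length $bytes;
printf "decoded chars:    %d\n", length $chars;
printf "double-encoded:   %d\n", length $double;
```

Once the string is decoded, printing it through a handle that has a UTF-8 layer encodes it exactly once, and the browser should show a single accented character.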
But if you get to that point and can't figure it out, you'll need to show us a small amount of usable code that demonstrates the problem. Just saying "I'm using SOAP::Lite" (as you did in your first update) isn't enough.
Another update: That sequence of three non-ASCII bytes that shows up twice in your updated sample text happens to be "\xE2","\x80","\x93". This is the utf8 byte sequence for the Unicode character "\x{2013}", which turns out to be "EN DASH" (in other words, a slightly longer relative of the hyphen). To see it as a plain hyphen, you could just do s/\x{2013}/-/g; on your text data, provided the data have been decoded into characters. If the string still holds raw utf8 bytes, the equivalent would be s/\xE2\x80\x93/-/g; instead.
(But if you're getting other wide characters besides that one, you might not find suitable ASCII correlates for all of them, so this may not work out as a general solution.)
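If it does turn out that only a handful of punctuation characters are involved, a lookup table is one way to extend the same idea. The table contents and the name asciify are my own choices for illustration, and the sketch assumes the text has already been decoded into characters.

```perl
use strict;
use warnings;

# Illustrative table of ASCII stand-ins; extend or trim it to suit your data.
my %ascii_for = (
    "\x{2013}" => '-',      # EN DASH
    "\x{2014}" => '--',     # EM DASH
    "\x{2018}" => "'",      # LEFT SINGLE QUOTATION MARK
    "\x{2019}" => "'",      # RIGHT SINGLE QUOTATION MARK
    "\x{201C}" => '"',      # LEFT DOUBLE QUOTATION MARK
    "\x{201D}" => '"',      # RIGHT DOUBLE QUOTATION MARK
    "\x{2026}" => '...',    # HORIZONTAL ELLIPSIS
);

# Hypothetical helper: swap each non-ASCII character that has an entry in
# the table, and leave any other character exactly as it was.
sub asciify {
    my ($text) = @_;
    $text =~ s/([^[:ascii:]])/exists $ascii_for{$1} ? $ascii_for{$1} : $1/ge;
    return $text;
}

print asciify("pages 10\x{2013}12"), "\n";    # prints "pages 10-12"
```

Anything without an entry in the table passes through unchanged, so you can still see what is left over and decide what to do about it.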