Malformed UTF-8 character Error
by walkingthecow (Friar) on May 11, 2010 at 21:15 UTC
walkingthecow has asked for the wisdom of the Perl Monks concerning the following question:
I have a qmail mail log file that I am processing line by line, trying to build a map that makes the log easier to read. The problem is that qmail has so many different patches, each logging in its own way, that the line formats vary a lot. For the most part everything processes; however, I keep getting the following warnings for some lines:
Malformed UTF-8 character (unexpected end of string)
Malformed UTF-8 character (unexpected continuation byte 0xae, with no preceding start byte)
The lines that are giving me trouble have odd characters like the following:
@400000004be972c621c0b57c 12760 > 4ÿª^PÜd"}^HL\kõ²k½^T<89>Ê<99>-W!í¾²#©Ø#<81> ^Z´o Û^Noå»N^U^C^A^@ ß<83><9d>^LÊ^Nq%^R^Z.æÆ9 7GÁæ¨'ÂqîÈ<9d>ÿ7³Ð^M
@400000004be9731d05945a34 13997 < Subject: Don<92>t Pay Retail Prices^M
@400000004be9739614b4c9f4 15230 < Subject: The truth about work-at-home opportunities<85>^M
@400000004be9802004c81154 14584 > 4<81>s^BÇQ»z=Ó^H<98>,ý<96>çVN)rPp^UÚq/£<98>È<9f><93><97><89> ^E<9b>QMbs?^KÖRµ/°o$^H¬^Tüë+
How do I either decode these lines, find out what their encoding is, or skip them altogether so that the warnings stop?
I have tried using Encode::Guess to no avail, and used the following bit of code to possibly give me an idea, but I am still getting the Malformed errors:
UPDATE: I'd really just be interested in skipping a line if it is not UTF-8, or in not dealing with these lines at all.
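For illustration, skipping lines that are not UTF-8 could look roughly like the sketch below. It is only a sketch under assumptions, not a tested fix for this log: it assumes the file is opened as raw bytes (no :utf8 layer), and the helper name decode_utf8_or_undef and the log path taken from the command line are made up for the example. Each line is passed through Encode's strict decode, and any line that fails is skipped. The <92> and <85> bytes in the Subject lines look like they could be Windows-1252 punctuation, but that is a guess, so the sketch does not try to convert them.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return the decoded text if $bytes is well-formed
# UTF-8, otherwise undef. FB_CROAK makes decode() die on malformed input,
# and LEAVE_SRC keeps decode() from modifying the caller's buffer.
sub decode_utf8_or_undef {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text;    # undef when decode() croaked inside the eval
}

# Assumed driver: the log path is the first command-line argument.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    while (my $line = <$fh>) {
        my $text = decode_utf8_or_undef($line);
        next unless defined $text;    # not valid UTF-8: skip the line
        print $text;                  # replace with the real processing
    }
    close $fh;
}
```

Plain ASCII lines pass the check unchanged, since ASCII is a subset of UTF-8, so only the lines containing stray high bytes are dropped.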