MIME Parsing Entity not exposing charset?

by Anonymous Monk
on Mar 04, 2013 at 15:42 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using MIME::Parser to find all text or HTML parts of a MIME file and convert them to UTF-8. The problem I'm having is that the Entity collection containing the parts seems to discard the charset of each part. I can't use, display, or convert the text or HTML to another charset unless I know the source charset of each part.

My MIME files declare the charset for each part, and I assumed that MIME::Parser would expose this information through the Entity collection.

#!/usr/bin/perl
use MIME::Parser;

my $parser = new MIME::Parser;
$parser->output_under("./processed");
my $entity = $parser->parse_open("./1b8b5dfc-a31e-44be-bd99-d9b2782a2178");
$entity->dump_skeleton;

Content-type: multipart/alternative
Effective-type: multipart/alternative
Body-file: NONE
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
Num-parts: 2
--
    Content-type: text/plain
    Effective-type: text/plain
    Body-file: processed/msg-1362411528-18768-0/msg-18768-1.txt
    --
    Content-type: text/html
    Effective-type: text/html
    Body-file: processed/msg-1362411528-18768-0/msg-18768-2.html
    --

Return-Path: <>
Received: from ( []) by (Postfix) with SMTP id 051ED11008C5 for <>; Wed, 30 Jan 2013 00:07:33 +0000 (UTC)
MIME-Version: 1.0
Message-ID: <51086441.000131.11913@cnapp12>
Date: Wed, 30 Jan 2013 08:07:29 +0800 (CST)
From: "=?gb2312?B?YXJub25lMzg1ODMx?=" <>
To:,,,
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
X-Priority: 3
X-Originating-IP: []
X-Mailer: <!-- CoreMail Version 3.6.2_release Copyright (c) 2002-2013 --> 163net
Content-Type: Multipart/Alternative; boundary="Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw"

--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: base64

ytG75sDTz7DVwM2+vfC6tLrbucK9sLHo0/ux98j+te7Su7b9ueq79bXVyNa3qc7OwsW/q7bH
xqO+4bfgwrbKqr2hvObRycC01LTX1LTz0fOxy7C2VVNBtcTO91/A+8q/o6zIw8TjMzbQocqx
y+bQxMv50/uyqsbwoaOzo9Cwvb/OvrDAu7TU3rrD17Ky6cbLvci7zbbuxOjCztPsvPXP2NSn
0auyy7TzwKvQydHTsbG44LTky6rXo73mzaW0rdfBz7+/8cbjx67F8rnRzvfBprKyyr/V/ca3
16jC9LXqzqq547TzxNDKv7T4wLS4o9L0oaPJ4L66uPm+ydXPt++3uMPPuPy24MTayN2/tNXi
wO/T0cfpzOHKvqO6vrTH67fF0MQsytW1vbrzssW4tr/uxrzIx8Kn1ae5qrPLzvXBrML20tPA
4sSxzubW2b+hybi/07Wuye+zvLLBx+UgDQoNCg==
--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: base64

PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNEMEU4RkYiPjxwPjxwPjxwPsrRu+bA08+w1cDNvr3w
urS627nCvbCx6NP7sffI/rXu0ru2/bnqu/W11cjWt6nOzsLFv6u2x8ajvuG34MK2yqq9obzm
0ck8L2ZvbnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+wLTUtNfUtPPR87HLsLZV
U0G1xM73X8D7yr+jrMjDxOMzNtChyrHL5tDEy/nT+7KqxvChozwvZm9udD48L3A+PHA+PGZv
bnQgc2l6ZT0iMSIgY29sb3I9IiNFRUZGREQiPrOj0LC9v86+sMC7tNTeusPXsrLpxsu9yLvN
tu7E6MLO0+y89c/Y1KfRq7LLtPPAq9DJ0dOxsbjgtOTLqtejvebNpbSt18HPv7/xxuPHrsXy
udE8L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSJbZGFya10iPs73waY8L2Zv
bnQ+PGZvbnQgY29sb3I9IiNGRkU2RjIiIHN0eWxlPSJmb250LXNpemU6IDJwdCI+srI8L2Zv
bnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+yr/V/ca316jC9LXqzqq547TzxNDK
v7T4wLS4o9L0oaM8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNFRkUwRTAiPsngvrq4
+b7J1c+377e4w888L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSIjRkYwMDAw
Ij48YSBocmVmPSJodHRwOi8vd3d3Lnd1ZGkuZ292LmNuL2d0Yi9pbmRleC5qc3A/dXJsPWh0
dHA6Ly93d3cuSFpQWC5JTkZPIj64/LbgxNrI3b+01eLA7zwvYT48L2ZvbnQ+09HH6czhyr6j
ur60x+u3xdDELMrVtb2687LFuLa/7jxmb250IHNpemU9IjEiIGNvbG9yPSIjRDJEMkQyIj7G
vMjHwqc8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNDRUNFQ0UiPtWnuaqzy871wazC
9tLTwOLEsc7m1tm/ocm4v9O1rsnvs7yywcflPC9mb250PiA8YnI+PCEtLSB1cmxmaWxlcyAt
LT48YnI+PGJyPjwhLS0gZm9vdGVyIC0tPjxicj4NCg0K
--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw--

Re: MIME Parsing Entity not exposing charset?
by McA (Priest) on Mar 04, 2013 at 17:32 UTC


    (first best guess answer thrown away)

    I looked into the problem a bit more deeply, since I'm interested in this area and know I will stumble over it sooner or later.

    Add the following snippet to your code:

    my @parts = $entity->parts();
    foreach my $part (@parts) {
        my $head = $part->head();
        print "Effective: " . $part->effective_type() . "\n";
        print "Content-Type from Header: " . $head->get('Content-Type') . "\n";
    }

    and you will see that the original information is there. Why dump_skeleton doesn't print it is a separate question.

    Best regards

      Thank you!!

      I had figured it out and was about to post the same thing. I believe the documentation I had found floating around was outdated, so I had to dig into the source code.
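
For the original goal of converting each text part to UTF-8, the Content-Type header shown in the reply can be taken one step further: MIME::Head's mime_attr() returns a single header parameter, and Encode can do the conversion. What follows is only a minimal sketch under stated assumptions, not the thread's own solution. The helper name text_parts_as_utf8, the tiny sample message, and the fallback to us-ascii when no usable charset is declared are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;
use Encode qw(decode encode find_encoding);

# Hypothetical helper (not part of MIME::Parser): returns one hashref per
# text/* leaf part, holding the declared charset and the body as UTF-8 bytes.
sub text_parts_as_utf8 {
    my ($raw_message) = @_;

    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);    # keep bodies in memory, write no files
    my $entity = $parser->parse_data($raw_message);

    my @results;
    for my $part ($entity->parts_DFS) {    # walks the whole entity tree
        next unless $part->effective_type =~ m{^text/}i;
        my $body = $part->bodyhandle or next;    # multipart containers have none

        # mime_attr() reads one parameter of a structured header field.
        my $charset = $part->head->mime_attr('content-type.charset');

        # Assumption: fall back to us-ascii when the charset is missing
        # or is a name that Encode does not recognise.
        $charset = 'us-ascii'
            unless defined $charset && find_encoding($charset);

        # as_string() gives the transfer-decoded bytes (base64 already undone).
        my $chars = decode($charset, $body->as_string);
        push @results, {
            type    => $part->effective_type,
            charset => $charset,
            utf8    => encode('UTF-8', $chars),
        };
    }
    return @results;
}

# Tiny made-up message for illustration; the body is two GB2312 characters.
my $sample = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="b1"',
    '',
    '--b1',
    'Content-Type: text/plain; charset="gb2312"',
    'Content-Transfer-Encoding: base64',
    '',
    '1tDOxA==',
    '--b1--',
    '';

for my $p (text_parts_as_utf8($sample)) {
    printf "%s (%s): %d UTF-8 bytes\n",
        $p->{type}, $p->{charset}, length $p->{utf8};
}
```

To run this against a file on disk instead, parse_open() from the original script could replace parse_data(); the rest of the loop would stay the same.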

