
MIME Parsing Entity not exposing charset?

by Anonymous Monk
on Mar 04, 2013 at 15:42 UTC (#1021667=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using MIME::Parser to find all text or HTML parts of a MIME file and convert them to UTF-8. The problem I'm having is that the Entity collection containing the parts seems to discard the charset of each part. I can't reliably use, display, or convert the text or HTML to another charset unless I know the source charset of each part.

My MIME files declare the charset for each part, and I assumed that MIME::Parser would provide this information through the Entity collection.

#!/usr/bin/perl
use MIME::Parser;

my $parser = new MIME::Parser;
$parser->output_under("./processed");
my $entity = $parser->parse_open("./1b8b5dfc-a31e-44be-bd99-d9b2782a2178");
$entity->dump_skeleton;

Content-type: multipart/alternative
Effective-type: multipart/alternative
Body-file: NONE
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
Num-parts: 2
--
    Content-type: text/plain
    Effective-type: text/plain
    Body-file: processed/msg-1362411528-18768-0/msg-18768-1.txt
    --
    Content-type: text/html
    Effective-type: text/html
    Body-file: processed/msg-1362411528-18768-0/msg-18768-2.html
    --

Return-Path: <>
Received: from ( []) by (Postfix) with SMTP id 051ED11008C5 for <>; Wed, 30 Jan 2013 00:07:33 +0000 (UTC)
MIME-Version: 1.0
Message-ID: <51086441.000131.11913@cnapp12>
Date: Wed, 30 Jan 2013 08:07:29 +0800 (CST)
From: "=?gb2312?B?YXJub25lMzg1ODMx?=" <>
To:,,,
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
X-Priority: 3
X-Originating-IP: []
X-Mailer: <!-- CoreMail Version 3.6.2_release Copyright (c) 2002-2013 --> 163net
Content-Type: Multipart/Alternative; boundary="Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw"

--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: base64

ytG75sDTz7DVwM2+vfC6tLrbucK9sLHo0/ux98j+te7Su7b9ueq79bXVyNa3qc7OwsW/q7bH
xqO+4bfgwrbKqr2hvObRycC01LTX1LTz0fOxy7C2VVNBtcTO91/A+8q/o6zIw8TjMzbQocqx
y+bQxMv50/uyqsbwoaOzo9Cwvb/OvrDAu7TU3rrD17Ky6cbLvci7zbbuxOjCztPsvPXP2NSn
0auyy7TzwKvQydHTsbG44LTky6rXo73mzaW0rdfBz7+/8cbjx67F8rnRzvfBprKyyr/V/ca3
16jC9LXqzqq547TzxNDKv7T4wLS4o9L0oaPJ4L66uPm+ydXPt++3uMPPuPy24MTayN2/tNXi
wO/T0cfpzOHKvqO6vrTH67fF0MQsytW1vbrzssW4tr/uxrzIx8Kn1ae5qrPLzvXBrML20tPA
4sSxzubW2b+hybi/07Wuye+zvLLBx+UgDQoNCg==
--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: base64

PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNEMEU4RkYiPjxwPjxwPjxwPsrRu+bA08+w1cDNvr3w
urS627nCvbCx6NP7sffI/rXu0ru2/bnqu/W11cjWt6nOzsLFv6u2x8ajvuG34MK2yqq9obzm
0ck8L2ZvbnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+wLTUtNfUtPPR87HLsLZV
U0G1xM73X8D7yr+jrMjDxOMzNtChyrHL5tDEy/nT+7KqxvChozwvZm9udD48L3A+PHA+PGZv
bnQgc2l6ZT0iMSIgY29sb3I9IiNFRUZGREQiPrOj0LC9v86+sMC7tNTeusPXsrLpxsu9yLvN
tu7E6MLO0+y89c/Y1KfRq7LLtPPAq9DJ0dOxsbjgtOTLqtejvebNpbSt18HPv7/xxuPHrsXy
udE8L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSJbZGFya10iPs73waY8L2Zv
bnQ+PGZvbnQgY29sb3I9IiNGRkU2RjIiIHN0eWxlPSJmb250LXNpemU6IDJwdCI+srI8L2Zv
bnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+yr/V/ca316jC9LXqzqq547TzxNDK
v7T4wLS4o9L0oaM8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNFRkUwRTAiPsngvrq4
+b7J1c+377e4w888L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSIjRkYwMDAw
Ij48YSBocmVmPSJodHRwOi8vd3d3Lnd1ZGkuZ292LmNuL2d0Yi9pbmRleC5qc3A/dXJsPWh0
dHA6Ly93d3cuSFpQWC5JTkZPIj64/LbgxNrI3b+01eLA7zwvYT48L2ZvbnQ+09HH6czhyr6j
ur60x+u3xdDELMrVtb2687LFuLa/7jxmb250IHNpemU9IjEiIGNvbG9yPSIjRDJEMkQyIj7G
vMjHwqc8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNDRUNFQ0UiPtWnuaqzy871wazC
9tLTwOLEsc7m1tm/ocm4v9O1rsnvs7yywcflPC9mb250PiA8YnI+PCEtLSB1cmxmaWxlcyAt
LT48YnI+PGJyPjwhLS0gZm9vdGVyIC0tPjxicj4NCg0K
--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw--

Replies are listed 'Best First'.
Re: MIME Parsing Entity not exposing charset?
by McA (Priest) on Mar 04, 2013 at 17:32 UTC


    (first best guess answer thrown away)

    I looked into the problem a little more deeply, since I'm interested in this area and know that I will stumble over it sooner or later.

    Add the following snippet to your code:

    my @parts = $entity->parts();
    foreach my $part (@parts) {
        my $head = $part->head();
        print "Effective: " . $part->effective_type() . "\n";
        print "Content-Type from Header: " . $head->get('Content-Type') . "\n";
    }
    and you will see that the original information, including the charset, is still there in each part's header. Why dump_skeleton doesn't print it is a separate question.
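
    To go from the raw header to a conversion, something like the following might work. It is only a sketch: the helper names part_charset and part_as_utf8 and the tiny two-part sample message are made up for illustration, and it assumes that MIME::Head's mime_attr('content-type.charset') returns the declared charset and that Encode knows that charset name. The us-ascii fallback is the MIME default for text parts that declare no charset.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;
use Encode qw(encode find_encoding);

# Hypothetical helper: declared charset of a part, or a fallback if none is given.
sub part_charset {
    my ($part, $fallback) = @_;
    my $charset = $part->head->mime_attr('content-type.charset');
    return (defined $charset && length $charset) ? $charset : $fallback;
}

# Hypothetical helper: body of a text part, re-encoded as UTF-8 bytes.
sub part_as_utf8 {
    my ($part) = @_;
    my $charset = part_charset($part, 'us-ascii');
    my $enc = find_encoding($charset)
        or die "Encode does not know charset '$charset'\n";
    # bodyhandle gives the body with the transfer encoding (base64, QP) undone
    my $bytes = $part->bodyhandle->as_string;
    return encode('UTF-8', $enc->decode($bytes));
}

# Made-up sample message, not the one from the question.
my $message = <<'EOM';
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="demo"

--demo
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: base64

xOO6ww==
--demo
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<p>caf=E9</p>
--demo--
EOM

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep bodies in memory for this demo
my $entity = $parser->parse_data($message);

for my $part ($entity->parts) {
    next unless $part->effective_type =~ m{^text/(?:plain|html)$};
    printf "%s [%s]: %s\n",
        $part->effective_type, part_charset($part, 'us-ascii'), part_as_utf8($part);
}
```

    With a real file you would keep parse_open and output_under as in the original script; output_to_core is used here only so the sketch runs without touching the disk. Nested multiparts would need a recursive walk over parts, which this loop does not do.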

    Best regards

      Thank you!!

      I had figured it out and was about to post the same thing. I believe the documentation I had found floating around was old and outdated, so I had to dig into the source code.
