PerlMonks  

MIME Parsing Entity not exposing charset?

by Anonymous Monk
on Mar 04, 2013 at 15:42 UTC (#1021667)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using MIME::Parser to find all text or HTML parts of a MIME file and convert them to UTF-8. The problem I'm having is that the Entity collection containing the parts seems to discard the charset of each part. I can't use, display, or convert the text or HTML to another charset unless I know the source charset of each part.

My MIME files declare the charset for each part, and I assumed that MIME::Parser would provide this information through the Entity collection.

#!/usr/bin/perl
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_under("./processed");
my $entity = $parser->parse_open("./1b8b5dfc-a31e-44be-bd99-d9b2782a2178");
$entity->dump_skeleton;

OUTPUT
Content-type: multipart/alternative
Effective-type: multipart/alternative
Body-file: NONE
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
Num-parts: 2
--
    Content-type: text/plain
    Effective-type: text/plain
    Body-file: processed/msg-1362411528-18768-0/msg-18768-1.txt
    --
    Content-type: text/html
    Effective-type: text/html
    Body-file: processed/msg-1362411528-18768-0/msg-18768-2.html
    --

FILE
Return-Path: <arnone385831@tom.com>
Received: from tom.com (cnsmtpr5.tom.com [218.30.111.155])
    by ybn.me (Postfix) with SMTP id 051ED11008C5
    for <fb@ybn.me>; Wed, 30 Jan 2013 00:07:33 +0000 (UTC)
MIME-Version: 1.0
Message-ID: <51086441.000131.11913@cnapp12>
Date: Wed, 30 Jan 2013 08:07:29 +0800 (CST)
From: "=?gb2312?B?YXJub25lMzg1ODMx?=" <arnone385831@tom.com>
To: fb@yahoo.com.cnfe,fb@ybn.me,fb@yes99.tw,fb@yinxingroup.com
Subject: =?gb2312?B?bmZi1OebqtH0hoHU9cO01s7Bxg==?=
X-Priority: 3
X-Originating-IP: [113.64.172.210]
X-Mailer: <!-- CoreMail Version 3.6.2_release Copyright (c) 2002-2013 www.mailtech.cn --> 163net
Content-Type: Multipart/Alternative; boundary="Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw"

--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: base64

ytG75sDTz7DVwM2+vfC6tLrbucK9sLHo0/ux98j+te7Su7b9ueq79bXVyNa3qc7OwsW/q7bH
xqO+4bfgwrbKqr2hvObRycC01LTX1LTz0fOxy7C2VVNBtcTO91/A+8q/o6zIw8TjMzbQocqx
y+bQxMv50/uyqsbwoaOzo9Cwvb/OvrDAu7TU3rrD17Ky6cbLvci7zbbuxOjCztPsvPXP2NSn
0auyy7TzwKvQydHTsbG44LTky6rXo73mzaW0rdfBz7+/8cbjx67F8rnRzvfBprKyyr/V/ca3
16jC9LXqzqq547TzxNDKv7T4wLS4o9L0oaPJ4L66uPm+ydXPt++3uMPPuPy24MTayN2/tNXi
wO/T0cfpzOHKvqO6vrTH67fF0MQsytW1vbrzssW4tr/uxrzIx8Kn1ae5qrPLzvXBrML20tPA
4sSxzubW2b+hybi/07Wuye+zvLLBx+UgDQoNCg==

--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw
Content-Type: text/html; charset="gb2312"
Content-Transfer-Encoding: base64

PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNEMEU4RkYiPjxwPjxwPjxwPsrRu+bA08+w1cDNvr3w
urS627nCvbCx6NP7sffI/rXu0ru2/bnqu/W11cjWt6nOzsLFv6u2x8ajvuG34MK2yqq9obzm
0ck8L2ZvbnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+wLTUtNfUtPPR87HLsLZV
U0G1xM73X8D7yr+jrMjDxOMzNtChyrHL5tDEy/nT+7KqxvChozwvZm9udD48L3A+PHA+PGZv
bnQgc2l6ZT0iMSIgY29sb3I9IiNFRUZGREQiPrOj0LC9v86+sMC7tNTeusPXsrLpxsu9yLvN
tu7E6MLO0+y89c/Y1KfRq7LLtPPAq9DJ0dOxsbjgtOTLqtejvebNpbSt18HPv7/xxuPHrsXy
udE8L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSJbZGFya10iPs73waY8L2Zv
bnQ+PGZvbnQgY29sb3I9IiNGRkU2RjIiIHN0eWxlPSJmb250LXNpemU6IDJwdCI+srI8L2Zv
bnQ+PGZvbnQgc2l6ZT0iNSIgY29sb3I9IltkYXJrXSI+yr/V/ca316jC9LXqzqq547TzxNDK
v7T4wLS4o9L0oaM8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNFRkUwRTAiPsngvrq4
+b7J1c+377e4w888L2ZvbnQ+PC9wPjxwPjxmb250IHNpemU9IjUiIGNvbG9yPSIjRkYwMDAw
Ij48YSBocmVmPSJodHRwOi8vd3d3Lnd1ZGkuZ292LmNuL2d0Yi9pbmRleC5qc3A/dXJsPWh0
dHA6Ly93d3cuSFpQWC5JTkZPIj64/LbgxNrI3b+01eLA7zwvYT48L2ZvbnQ+09HH6czhyr6j
ur60x+u3xdDELMrVtb2687LFuLa/7jxmb250IHNpemU9IjEiIGNvbG9yPSIjRDJEMkQyIj7G
vMjHwqc8L2ZvbnQ+PGZvbnQgc2l6ZT0iMSIgY29sb3I9IiNDRUNFQ0UiPtWnuaqzy871wazC
9tLTwOLEsc7m1tm/ocm4v9O1rsnvs7yywcflPC9mb250PiA8YnI+PCEtLSB1cmxmaWxlcyAt
LT48YnI+PGJyPjwhLS0gZm9vdGVyIC0tPjxicj4NCg0K

--Boundary-=_uJqgnzBUwHgpxQBYwHbjmGHjYeJw--

Re: MIME Parsing Entity not exposing charset?
by McA (Priest) on Mar 04, 2013 at 17:32 UTC

    Hi,

    (first best guess answer thrown away)

    I looked into the problem a little more deeply, since I'm interested in this area and know I will stumble over it sooner or later.

    Add the following snippet to your code:

    my @parts = $entity->parts();
    foreach my $part (@parts) {
        my $head = $part->head();
        print "Effective: " . $part->effective_type() . "\n";
        print "Content-Type from Header: " . $head->get('Content-Type') . "\n";
    }

    and you will see that the original information is there. Why dump_skeleton doesn't print it is a different question.

    Best regards
    McA

      Thank you!!

      I had figured it out and was about to post the same thing. I believe I had been working from some outdated documentation that was floating around, and had to dig into the source code.
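
To connect McA's answer back to the original goal (converting each text or HTML part to UTF-8), here is a minimal sketch, assuming the MIME-tools distribution and Encode are installed. It reads the charset parameter with MIME::Head's mime_attr and decodes each part's body with Encode. The helper name text_parts_decoded and the tiny sample message are invented for illustration and are not from the thread; the fallback to us-ascii when no charset is declared is likewise an assumption, based on the MIME default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;                     # from the MIME-tools distribution
use MIME::Base64 qw(encode_base64);
use Encode qw(encode find_encoding);

# Hypothetical helper: return one hashref per text/* part, with the body
# decoded into Perl characters using the charset from that part's header.
sub text_parts_decoded {
    my ($raw_message) = @_;

    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);       # keep bodies in memory, write no files

    my $entity = $parser->parse_data($raw_message);

    my @found;
    for my $part ($entity->parts_DFS) {    # the entity itself plus nested parts
        next unless $part->effective_type =~ m{^text/}i;
        my $body = $part->bodyhandle or next;

        # The charset is a parameter of the part's Content-Type header.
        my $charset = $part->head->mime_attr('content-type.charset');
        $charset = 'us-ascii' unless defined $charset;   # assumed default

        my $encoding = find_encoding($charset);
        unless ($encoding) {
            warn "Skipping part with unknown charset '$charset'\n";
            next;
        }

        my $octets = $body->as_string;     # base64 is already decoded here
        push @found, {
            type    => $part->effective_type,
            charset => $charset,
            text    => $encoding->decode($octets),
        };
    }
    return @found;
}

# Sample message invented for this sketch: two Chinese characters
# (U+4F60 U+597D) encoded as gb2312, then base64.
my $payload = encode_base64(encode('gb2312', "\x{4f60}\x{597d}"), '');
my $message = <<"END_MESSAGE";
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="demo"

--demo
Content-Type: text/plain; charset="gb2312"
Content-Transfer-Encoding: base64

$payload
--demo--
END_MESSAGE

binmode STDOUT, ':encoding(UTF-8)';    # emit the decoded text as UTF-8
for my $part (text_parts_decoded($message)) {
    printf "%s [%s]: %s\n", @{$part}{qw(type charset text)};
}
```

For a file on disk, parse_open (as in the original question) could be used in place of parse_data; the rest of the loop should not need to change.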
