
MIME::Parser::Filer and filenames in Simplified Chinese

by uxbod (Initiate)
on Nov 17, 2011 at 11:41 UTC (#938603)
uxbod has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a project where I need to extract attachments from an email whose file names are in Simplified Chinese. The problem is that when they are extracted to the file system, I end up with names like ????.doc!

I put together a little test script to show what I mean:

#!/usr/bin/perl
use MIME::Parser;
use MIME::Parser::Filer;
use File::Path qw(rmtree);    # provides rmtree, used at the end

my $tempdir = "extract";
( -d $tempdir ) or mkdir $tempdir, 0755 or die "mkdir: $!";

my $parser = MIME::Parser->new;
$parser->output_under("/home/uxbod/extract");
$parser->extract_uuencode(1);

my $entity = $parser->parse_open("/home/uxbod/testmessage");

# Walk every part of the message and report both candidate names
foreach my $part ( $entity->parts_DFS ) {
    next if ( !$part->bodyhandle );
    my $rec_filename = $part->head->recommended_filename;
    my $filename     = $part->bodyhandle->path;
    print "Recommended: $rec_filename Alternative : $filename\n";
}

$parser->filer->purge;
rmtree $tempdir;

and when this runs I see the following output:

[uxbod@gateway ~]# ./
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/ line 659
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/ line 659
Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1

As you can see, the last two MIME entities are encoded using gb2312, but how can I get the correct name onto the file system? If I extract the file through an email client and transfer it across to that system, it looks fine:

-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc

Any help would be very very much appreciated.

Replies are listed 'Best First'.
Re: MIME::Parser::Filer and filenames in Simplified Chinese
by zwon (Abbot) on Nov 17, 2011 at 15:57 UTC

    Can't help with MIME::Parser, but you can decode the name yourself:

    use 5.010;
    use strict;
    use warnings;
    use open qw(:utf8 :std);
    use Encode::MIME::EncWords;
    use Encode qw/encode decode/;

    my $fname = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';
    say decode('MIME-EncWords', $fname);
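Once the name is decoded to characters, it still has to be turned back into bytes before it can be used as a file name. The following is only a sketch of that step, assuming the target file system stores names as UTF-8 (typical on Linux, but not guaranteed). The helper name `fs_bytes_for_name` is made up for illustration, and the sample name is the one from the directory listing in the question.

```perl
use strict;
use warnings;
use utf8;                          # this source contains a UTF-8 literal
use Encode qw(encode decode);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: map a decoded (character) name to the bytes used on
# disk. Assumes a file system that stores names as UTF-8.
sub fs_bytes_for_name {
    my ($chars) = @_;
    $chars =~ s{[/\\\0]}{_}g;      # replace path separators and NULs
    return encode( 'UTF-8', $chars );
}

my $dir  = tempdir( CLEANUP => 1 );    # scratch directory, removed at exit
my $name = '撰稿材料.doc';              # an already-decoded name
my $path = File::Spec->catfile( $dir, fs_bytes_for_name($name) );

open my $fh, '>', $path or die "open: $!";
close $fh;

# readdir returns bytes again, so decode them before comparing or printing.
opendir my $dh, $dir or die "opendir: $!";
my @found = map { decode( 'UTF-8', $_ ) } grep { !/\A\.\.?\z/ } readdir $dh;
closedir $dh;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @found;
```

The replacement of `/`, `\` and NUL is a precaution, since an attachment name comes from an untrusted message and should not be able to point outside the extraction directory.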

      Thanks, zwon. I have managed to get a little further, but now I have a problem with Simplified versus Traditional Chinese.

      The MIME encoding for the following file name


      should decode to:


      but when I try and decode that name in Perl it comes out as:


      I have installed the Encode::HanExtra module, but even with that it still does not display correctly. Am I missing some other module?

        comes out as where? Showing correctly where?

        Does the program produce the correct bytes?
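One way to check the bytes is to Base64-decode the payload of the encoded-word by hand. For the "Chi Trad" name in the output above, the payload contains the byte pairs C2 84, B1 4F and 9C 79. None of these is valid GB2312, whose lead and trail bytes both lie in 0xA1-0xFE, so the data is probably GBK (cp936) despite its gb2312 label. The sketch below rests on that assumption and decodes anything labelled gb2312 with the wider cp936 table. The helper `decode_b_word` and the override table are hypothetical names, and only core modules (Encode, MIME::Base64) are used.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Assumption: data labelled "gb2312" may really be GBK (cp936), which is a
# superset, so decode it with the wider charset.
my %charset_override = ( gb2312 => 'cp936' );

# Hypothetical helper: decode a single Base64 ("B") encoded-word into a
# Perl character string. Anything else is returned unchanged.
sub decode_b_word {
    my ($word) = @_;
    my ( $charset, $payload ) = $word =~ /\A=\?([^?]+)\?[Bb]\?([^?]*)\?=\z/
        or return $word;
    my $wanted = $charset_override{ lc $charset } // $charset;
    return decode( $wanted, decode_base64($payload) );
}

my $word = '=?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=';

# Show the raw bytes first, then the decoded name.
my ($b64) = $word =~ /\?[Bb]\?([^?]*)\?=/;
print "bytes: ", unpack( 'H*', decode_base64($b64) ), "\n";

binmode STDOUT, ':encoding(UTF-8)';
print "name:  ", decode_b_word($word), "\n";
```

This handles only a single B-encoded word. Q-encoding and names split across several encoded-words would need a fuller RFC 2047 decoder.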
