
MIME::Parser::Filer and filenames in Simplified Chinese

by uxbod (Initiate)
on Nov 17, 2011 at 11:41 UTC

uxbod has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a project where I need to extract attachments from an email whose file names are in Simplified Chinese. The problem is that when they are extracted to the file system I end up with names like ????.doc!

I put together a little test script to show what I mean:

#!/usr/bin/perl
use File::Path qw(rmtree);    # provides rmtree, used for the final cleanup
use MIME::Parser;
use MIME::Parser::Filer;

my $tempdir = "extract";
( -d $tempdir ) or mkdir $tempdir, 0755 or die "mkdir: $!";

my $parser = MIME::Parser->new;
$parser->output_under("/home/uxbod/extract");
$parser->extract_uuencode(1);

my $entity = $parser->parse_open("/home/uxbod/testmessage");

# Walk every part depth-first and report the names of those with a body.
foreach my $part ( $entity->parts_DFS ) {
    next if !$part->bodyhandle;
    my $rec_filename = $part->head->recommended_filename;
    my $filename     = $part->bodyhandle->path;
    print "Recommended: $rec_filename Alternative : $filename\n";
}

# Remove the extracted files and the working directory.
$parser->filer->purge;
rmtree $tempdir;

and when this runs I see the following output:

[uxbod@gateway ~]# ./testextract.pl
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1

As you can see, the recommended file names of the last two MIME entities are encoded using gb2312, but how can I get the correct name onto the file system? If I extract the file through an email client and transfer it across to that system, it looks fine:

-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc

Any help would be very very much appreciated.

Replies are listed 'Best First'.
Re: MIME::Parser::Filer and filenames in Simplified Chinese
by zwon (Abbot) on Nov 17, 2011 at 15:57 UTC

    Can't help with MIME::Parser, but you can decode the name yourself:

    use 5.010;
    use strict;
    use warnings;
    use open qw(:utf8 :std);
    use Encode::MIME::EncWords;
    use Encode qw/encode decode/;

    my $fname = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';
    say decode('MIME-EncWords', $fname);
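If Encode::MIME::EncWords is not available, the same decoding step can be sketched with core modules only. This is a minimal sketch, not a full RFC 2047 parser: the helper name decode_encoded_word is hypothetical, it handles a single encoded-word (no folded or adjacent words), and it assumes Encode recognises the charset label in the header.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: turn one RFC 2047 encoded-word into a Perl character
# string. Input that is not an encoded-word is returned unchanged.
sub decode_encoded_word {
    my ($word) = @_;
    my ( $charset, $enc, $payload ) =
        $word =~ /\A=\?([^?*]+)(?:\*[^?]*)?\?([BbQq])\?([^?]*)\?=\z/
        or return $word;

    my $bytes;
    if ( uc $enc eq 'B' ) {
        $bytes = decode_base64($payload);
    }
    else {
        # "Q" encoding: underscore means space, =XX is a hex-encoded byte.
        ( $bytes = $payload ) =~ tr/_/ /;
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;
    }

    # decode() dies if Encode does not know the charset label.
    return decode( $charset, $bytes );
}

my $raw  = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';
my $name = decode_encoded_word($raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$name\n";
```

The result is a character string. To use it as a file name it still has to be encoded into whatever byte encoding the file system expects, for example with Encode::encode('UTF-8', $name).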

      Thanks Zwon. I have managed to get a little further but now I have an issue between Simplified and Traditional Chinese.

      The MIME encoding for the following file name

      =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=

      should decode to:

      DPM2007exchange電郵與郵箱修復.zip

      but when I try to decode that name in Perl it comes out as:

      DPM2007exchange���]�c�]箱修��.zip

      I have installed the Encode::HanExtra module, but even with that the name still does not display correctly. Am I missing some other module?
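One possible explanation, offered as an assumption rather than a confirmed diagnosis: some mail clients label GBK text as gb2312, and GBK is a superset of GB2312 that also covers Traditional characters. The second name in the original test output contains byte pairs such as C2 84, which lie outside the GB2312 range but are valid in GBK. The sketch below decodes gb2312-labelled bytes as cp936, which is Encode's name for GBK. The names decode_lenient and %superset are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical table: charset labels that are decoded with a superset
# encoding instead of the encoding the label literally names.
my %superset = ( gb2312 => 'cp936' );

sub decode_lenient {
    my ( $charset, $bytes ) = @_;
    my $actual = $superset{ lc $charset } // $charset;
    return decode( $actual, $bytes );
}

# Base64 payload of the second "Recommended:" name in the test output.
my $bytes = decode_base64('MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0');

my $strict  = decode( 'gb2312', $bytes );          # what the label asks for
my $lenient = decode_lenient( 'gb2312', $bytes );  # decoded as GBK instead

binmode STDOUT, ':encoding(UTF-8)';
print "as gb2312: $strict\n";
print "as cp936 : $lenient\n";
```

If the cp936 line shows the expected characters while the gb2312 line does not, the message is mislabelled and the superset mapping is a reasonable workaround.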

        Comes out as where? Showing correctly where?

        Does the program produce the correct bytes?
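One way to answer that question is to look at the decoded string as code points and as UTF-8 bytes, which takes the terminal and its fonts out of the picture. This is only a sketch using the first encoded name from the thread. A U+FFFD in the output would point at a decoding problem, while clean code points would suggest the problem is on the display side.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(decode_base64);

# Decode the payload of the first encoded name from the thread.
my $name = decode( 'gb2312',
    decode_base64('MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0') );

# Code points of the decoded character string.
my $codepoints = join ' ', map { sprintf 'U+%04X', ord } split //, $name;

# Bytes that would be written out if the name is encoded as UTF-8.
my $utf8_hex = join ' ',
    map { sprintf '%02X', ord } split //, encode( 'UTF-8', $name );

# U+FFFD marks input bytes the decoder could not map.
my $has_replacement = $name =~ /\x{FFFD}/ ? 'yes' : 'no';

print "code points : $codepoints\n";
print "UTF-8 bytes : $utf8_hex\n";
print "replacement : $has_replacement\n";
```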
