PerlMonks  

MIME::Parser::Filer and filenames in Simplified Chinese

by uxbod (Initiate)
on Nov 17, 2011 at 11:41 UTC (#938603, perlquestion)
uxbod has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a project where I need to extract attachments from an email whose filenames are in Simplified Chinese. The problem is that when they are extracted to the file system I end up with names like ????.doc!

I put together a little test script to show what I mean:

#!/usr/bin/perl
use MIME::Parser;
use MIME::Parser::Filer;
use File::Path qw(rmtree);    # provides rmtree, used at the end

my $tempdir = "extract";
( -d $tempdir ) or mkdir $tempdir, 0755 or die "mkdir: $!";

my $parser = MIME::Parser->new;
$parser->output_under("/home/uxbod/extract");
$parser->extract_uuencode(1);

my $entity = $parser->parse_open("/home/uxbod/testmessage");

# Walk every part depth-first and print both candidate names.
foreach my $part ( $entity->parts_DFS ) {
    next if ( !$part->bodyhandle );
    my $rec_filename = $part->head->recommended_filename;
    my $filename     = $part->bodyhandle->path;
    print "Recommended: $rec_filename Alternative : $filename\n";
}

$parser->filer->purge;
rmtree $tempdir;

and when this runs I see the following output:

[uxbod@gateway ~]# ./testextract.pl
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1

As you can see, the recommended filenames of both MIME entities are still encoded as gb2312 encoded-words. How can I get the decoded name to be used on the file system? If I extract the file through an email client and transfer it across to that system, it does look okay:

-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc

Any help would be very very much appreciated.

Re: MIME::Parser::Filer and filenames in Simplified Chinese
by zwon (Monsignor) on Nov 17, 2011 at 15:57 UTC

    Can't help with MIME::Parser, but you can decode the name yourself:

    use 5.010;
    use strict;
    use warnings;
    use open qw(:utf8 :std);
    use Encode::MIME::EncWords;
    use Encode qw/encode decode/;

    my $fname = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';
    say decode('MIME-EncWords', $fname);
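Once the name has been decoded into a Perl character string, it still has to be turned back into bytes before it can be used on disk. The following is only a sketch of one way to do that: `rename_to_decoded` is a hypothetical helper (not part of MIME::Parser), it assumes the file system stores names as UTF-8, and a temporary file stands in for the part that MIME::Parser wrote as `.../1`.

```perl
use strict;
use warnings;
use utf8;                        # the literal filename below is non-ASCII
use Encode qw(encode);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: rename an extracted part to its decoded name.
# Assumption: the file system expects UTF-8 encoded names.
sub rename_to_decoded {
    my ($path, $dir, $decoded_name) = @_;
    (my $safe = $decoded_name) =~ s{[/\\\0]}{_}g;    # strip path separators
    my $target = File::Spec->catfile($dir, encode('UTF-8', $safe));
    rename $path, $target or die "rename $path: $!";
    return $target;
}

# Stand-in for a file that the parser extracted under a generic name.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, '1');
open my $fh, '>', $path or die "open: $!";
close $fh;

my $new_path = rename_to_decoded($path, $dir, "撰稿材料.doc");
```

Replacing path separators before renaming matters here because the name comes from an untrusted email header.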

      Thanks Zwon. I have managed to get a little further but now I have an issue between Simplified and Traditional Chinese.

      The MIME encoding for the following file name

      =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=

      should decode to:

      DPM2007exchange電郵與郵箱修復.zip

      but when I try and decode that name in Perl it comes out as:

      DPM2007exchange���]�c�]箱修��.zip

      I have installed the Encode::HanExtra module, but even with that the name is still not showing correctly. Am I missing some other module?
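One possibility worth ruling out (an assumption, not something confirmed in this thread) is that the header is labelled gb2312 while the bytes are really GBK, which also covers Traditional characters. The sketch below takes the second filename from the test output in the question, ignores its charset label, and decodes the payload once as `euc-cn` (what the label claims) and once as `cp936` (Encode's name for GBK). `decode_word_as` is a hypothetical helper and handles only a single base64 encoded-word.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode one base64 ("B") encoded-word, ignoring
# its charset label and using the encoding the caller asks for.
sub decode_word_as {
    my ($word, $encoding) = @_;
    my ($payload) = $word =~ /^=\?[^?]+\?[Bb]\?([^?]*)\?=$/
        or die "not a base64 encoded-word: $word";
    return decode($encoding, decode_base64($payload));
}

# Second filename from the test output in the question (labelled gb2312).
my $word = '=?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=';

my $as_gb2312 = decode_word_as($word, 'euc-cn');    # what the label claims
my $as_gbk    = decode_word_as($word, 'cp936');     # GBK, a superset

binmode STDOUT, ':encoding(UTF-8)';
printf "euc-cn: %s\n", $as_gb2312;
printf "cp936 : %s\n", $as_gbk;
```

If the `euc-cn` result contains U+FFFD replacement characters and the `cp936` result does not, the charset label is the likely culprit rather than a missing module.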

        It comes out like that where? It is not showing correctly where?

        Does the program produce the correct bytes?
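One way to answer that question is to look at code points and bytes instead of at what the terminal renders. A minimal sketch, using two hypothetical helpers (`codepoints` and `hex_bytes` are not from any module) and a short stand-in name:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helpers for checking what a string really contains.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord } split //, $_[0] }
sub hex_bytes  { join ' ', map { sprintf '%02X',   ord } split //, $_[0] }

my $name  = "\x{65B0}.zip";          # stand-in for a decoded filename
my $bytes = encode('UTF-8', $name);  # what would be written to disk

print codepoints($name), "\n";       # characters after decoding
print hex_bytes($bytes), "\n";       # bytes after encoding for output
```

If the code points are right but the display is wrong, the output layer (terminal or locale) is the more likely suspect; if U+FFFD already appears among the code points, the decoding step went wrong.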
