I am working on a project where I need to extract email attachments whose filenames are in Simplified Chinese. The problem is that when they are extracted to the file system I end up with names like ????.doc!
I put together a little test script to show what I mean:
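#!/usr/bin/perl
use strict;
use warnings;
use MIME::Parser;
use MIME::Parser::Filer;
use File::Path qw(rmtree);    # rmtree is not a builtin

my $tempdir = "extract";
# Create the working directory if it does not exist yet
( -d $tempdir ) or mkdir $tempdir, 0755 or die "mkdir: $!";

my $parser = MIME::Parser->new;
$parser->output_under("/home/uxbod/extract");
$parser->extract_uuencode(1);

my $entity = $parser->parse_open("/home/uxbod/testmessage")
    or die "could not parse message";

# Walk every part depth-first and report both candidate filenames
foreach my $part ( $entity->parts_DFS ) {
    next if !$part->bodyhandle;
    my $rec_filename = $part->head->recommended_filename;
    my $filename     = $part->bodyhandle->path;
    print "Recommended: $rec_filename Alternative : $filename\n";
}

# Clean up the extracted files and the working directory
$parser->filer->purge;
rmtree $tempdir;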
When this runs I see the following output:
[uxbod@gateway ~]# ./testextract.pl
ignoring text in character set `GB2312'
at /usr/share/perl5/MIME/Parser/Filer.pm line 659
ignoring text in character set `GB2312'
at /usr/share/perl5/MIME/Parser/Filer.pm line 659
Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1
As you can see, the recommended filenames of the last two MIME entities are still encoded words labelled gb2312 rather than readable text. How can I get the correctly decoded name onto the file system? If I extract the file through an email client and transfer it across to that system, it does look okay:
-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc
Any help would be very much appreciated.
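
One direction that might be worth trying, sketched here under assumptions rather than tested against the real message: the recommended names above use the `=?charset?B?...?=` encoded-word syntax, which the core Encode module can decode through its `MIME-Header` encoding. The helper name `decoded_filename` is hypothetical, and the sketch assumes the target file system expects UTF-8 names.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a recommended filename that may contain
# encoded words into UTF-8 bytes usable as a file name.
# Assumption: the file system stores names as UTF-8.
sub decoded_filename {
    my ($raw) = @_;
    return unless defined $raw;

    # Decode any =?charset?B?...?= words into a Perl character string
    my $chars = decode( 'MIME-Header', $raw );

    # Never let a name from an email introduce path separators or NULs
    $chars =~ s{[/\\\0]}{_}g;

    return encode( 'UTF-8', $chars );
}

# First recommended filename from the output shown in this post
my $sample = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';
print decoded_filename($sample), "\n";
```

Inside the loop, the extracted file at `$part->bodyhandle->path` could then be copied or renamed to `decoded_filename($rec_filename)`, for example with `File::Copy`. Note that the second name is labelled gb2312 but its decoded bytes include sequences outside the GB2312 range, so it may not decode cleanly with that label; decoding those bytes as `gbk` instead might be needed, which is an untested guess.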