
I am working on a project where I need to extract email attachments whose filenames are in Simplified Chinese. The problem is that when they are extracted to the file system, I end up with names like ????.doc!

I put together a little test script to show what I mean:

#!/usr/bin/perl

use strict;
use warnings;

use MIME::Parser;
use MIME::Parser::Filer;
use File::Path qw(rmtree);

my $tempdir = "extract";
( -d $tempdir) or mkdir $tempdir, 0755 or die "mkdir: $!";

my $parser = MIME::Parser->new;

# Extract message parts under this directory
$parser->output_under("/home/uxbod/extract");
$parser->extract_uuencode(1);
my $entity = $parser->parse_open("/home/uxbod/testmessage");

# Walk every part of the message, depth first
foreach my $part ($entity->parts_DFS) {

  # Skip container parts that have no body of their own
  next if (!$part->bodyhandle);

  my $rec_filename = $part->head->recommended_filename;
  my $filename = $part->bodyhandle->path;
  print "Recommended: $rec_filename Alternative : $filename\n";
}

# Remove the extracted files and the temporary directory
$parser->filer->purge;
rmtree $tempdir;

and when this runs I see the following output:

[uxbod@gateway ~]# ./testextract.pl
ignoring text in character set `GB2312'
 at /usr/share/perl5/MIME/Parser/Filer.pm line 659
ignoring text in character set `GB2312'
 at /usr/share/perl5/MIME/Parser/Filer.pm line 659
Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1

As you can see, the filenames of the last two MIME entities are encoded using gb2312, but how can I get the correct name onto the file system? If I extract the file through an email client and transfer it across to that system, it does look okay:

-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc

Any help would be very much appreciated.
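
A possible starting point, offered as a sketch rather than a confirmed fix: the Recommended values in the output above look like RFC 2047 encoded-words, and the core Encode module provides a 'MIME-Header' encoding that can decode them. The sketch assumes that Encode on the target system resolves the gb2312 charset and that the file system expects UTF-8 names. The variable names are illustrative, and the encoded string is copied from the first Recommended line.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Encoded-word copied from the first "Recommended:" line of the output
my $raw = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';

# Decode the RFC 2047 encoded-word into a Perl character string
my $name = decode('MIME-Header', $raw);

# Assumption: the file system stores names as UTF-8 bytes
my $fs_name = encode('UTF-8', $name);

print "$fs_name\n";
```

In the test script, the same two calls could be applied to $rec_filename before the name is used on disk. Whether this also covers the second attachment is untested here.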

In reply to MIME::Parser::Filer and filenames in Simplified Chinese by uxbod
