PerlMonks
Hi All,

I've been put in charge of migrating a Japanese customer site with many Windows machines from an ancient version of Perl (at the moment they're using jperl v5.005_03 MSWin32-x86, SJIS version) to something current like v5.8.8. (As Perl now comes with comprehensive Unicode support, I/O layers and so on, the jperl patch is no longer maintained and, of course, doesn't apply to any recent version of Perl.)

The idea is to make the upgrade as smooth as possible. Over the years, lots of little jperl-specific scripts have accumulated at the site (several hundred, the admins say). So, ideally, they would not have to touch any of those, but rather just roll out the new version of Perl (plus some compatibility module), and everything should work as before. At least, that's the plan.

All the old scripts contain the statement "use I18N::Japanese;" (that's how the jperl-specific functionality is enabled in the binary; the .pm file itself just contains a "1;"), so I thought a new I18N/Japanese.pm would be the ideal place to put my compatibility code.

I figured it would essentially involve saying "use encoding 'cp932';" [1] (the old scripts are written in Microsoft's CP932, roughly equivalent to SJIS), to make Perl parse any literal strings, regexes, etc. in the script correctly and convert them to Perl's internal Unicode format.

So far, so good. Thing is, they have code like this [2]:
This doesn't work, because the pathname being passed to system() is now in Perl's internal Unicode format instead of the CP932 that the Windows side expects. I'm not sure how best to handle this. What I've come up with so far is to override/wrap Perl's built-in system() function, in order to do the required conversion of the arguments explicitly:
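A minimal sketch of what such a wrapper could look like is given here; it is not the original listing. It assumes the arguments arrive as decoded character strings and that the Encode module is available. The helper name to_cp932 is hypothetical, and the is_utf8 check is only a heuristic for "already bytes, leave alone".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode character strings to CP932 octets.
# Strings without the UTF-8 flag are assumed to be raw bytes already
# and are passed through unchanged.
sub to_cp932 {
    return map { defined($_) && utf8::is_utf8($_) ? encode('cp932', $_) : $_ } @_;
}

# Install the override at compile time so that code compiled afterwards
# sees it. CORE::system always refers to the real builtin.
BEGIN {
    no warnings 'once';
    *CORE::GLOBAL::system = sub {
        my @octets = to_cp932(@_);
        return CORE::system(@octets);
    };
}
```

Placed in a module that the scripts already load with "use", the BEGIN block would not be needed, since the module body runs while the calling script is still being compiled. Calls that use the indirect-object form of system (system { $prog } @args) are not covered by this sketch.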
Although this essentially works, I can't help thinking it is way more cumbersome than things typically need to be in Perl. In particular, I would have to write similar wrappers for all the other functions that take a filename argument (mkdir(), chdir(), open(), opendir(), rename(), unlink(), glob() and friends...). This can't be it!? :)

So, I'm wondering if I'm missing some magic incantation that would convert all filenames to the desired target encoding when passing them to the respective system functions. IOW, what's the best way to emulate the old jperl behaviour with recent versions of Perl? (As I understand things (correct me if I'm wrong), this worked with jperl because it kept strings internally in SJIS/CP932 and operated directly on this legacy encoding.)

Any suggestions welcome.

Thanks,

__________

[1] Actually, in this case, I have to write "require encoding; encoding->import('cp932');" (to avoid the implicit BEGIN{} block). In Perl 5.8.8, "use encoding 'cp932'" seems to be lexically scoped (contrary to what the documentation says). So, putting it in a module wouldn't have any effect on the code that's "use"ing that module.

[2] To circumvent the automatic HTML-entity-ification of the SJIS 8-bit octets (which would render the code unusable for anyone who'd like to play around with it), I wrote them as hex values; in the real scripts they are of course raw SJIS 8-bit values. Just in case, here's the same string as Unicode codepoints:
And, if your browser is able to display the respective Unicode entities, this is what the SJIS part looks like:
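On the concern about writing a separate wrapper for every filename-taking builtin: one possibility, sketched here under assumptions, is to generate the simple wrappers in a loop. The helper to_cp932 and the table %builtin are hypothetical names. Only mkdir, rmdir, chdir, rename and unlink are covered; open(), opendir() and glob() involve filehandles, modes or patterns and would need more care than this sketch shows.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode character strings to CP932 octets,
# pass byte strings (and undef) through unchanged.
sub to_cp932 {
    return map { defined($_) && utf8::is_utf8($_) ? encode('cp932', $_) : $_ } @_;
}

BEGIN {
    # One thin closure per builtin, each calling the real function
    # through its CORE:: name.
    my %builtin = (
        mkdir  => sub { @_ > 1 ? CORE::mkdir($_[0], $_[1]) : CORE::mkdir($_[0]) },
        rmdir  => sub { CORE::rmdir($_[0]) },
        chdir  => sub { CORE::chdir($_[0]) },
        rename => sub { CORE::rename($_[0], $_[1]) },
        unlink => sub { CORE::unlink(@_) },
    );

    no strict 'refs';
    no warnings 'once';
    for my $name (keys %builtin) {
        my $impl = $builtin{$name};
        # Code compiled after this point calls the wrapper instead of the builtin.
        *{"CORE::GLOBAL::$name"} = sub { $impl->(to_cp932(@_)) };
    }
}
```

The wrappers drop the implicit $_ default that unlink, rmdir and mkdir accept with no arguments, so scripts relying on that form would need an extra case.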