http://www.perlmonks.org?node_id=924119

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have been using Compress::Bzip2 to compress and decompress files for a while now. However, I now need to work with files in Unicode.

I know about

binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

and open FILE, '<:utf8', $file for reading from and writing to normal text files, but I could not find anything on Google that covers this case.

Is there a way to tell Perl the files are in Unicode when reading from/writing to them using

my $bzIn = bzopen($in, "rb") or die "Can't open stdin: $bzerrno\n";
my $bzOut = bzopen($out, "wb") or die "Can't open stdout: $bzerrno\n";

Am I going to have to switch to another compression package, provided I can find one that deals with Unicode? Or will I have to do it the old-fashioned way, decompressing first and then processing the plain text files?

Any suggestions?

Re: Unicode in bz2 compressed files
by Anonymous Monk on Sep 05, 2011 at 06:26 UTC

      Thank you, wise friend.

      Although I had read the tutorials, I hadn't quite understood what this manual encoding and decoding was about, and how it applied in my case. Your example has made it clear to me.
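For anyone landing on this thread later, here is a minimal sketch of what manual decoding and encoding around Compress::Bzip2 might look like. It assumes the files are UTF-8 and that bzreadline and bzwrite behave as described in the module's documentation. The names process_line and recode_bz2 are hypothetical helpers invented for this illustration, and the uc call is only a placeholder for the real processing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Compress::Bzip2;    # exports bzopen and $bzerrno

# Hypothetical helper: takes raw UTF-8 bytes, returns raw UTF-8 bytes.
# All text processing happens on the decoded character string in between.
sub process_line {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> Perl character string
    $text = uc $text;                      # placeholder for the real work
    return encode('UTF-8', $text);         # characters -> UTF-8 bytes
}

# Hypothetical helper: stream one .bz2 file into another, line by line.
# Reading by line is safe for UTF-8, because the newline byte never
# occurs inside a multi-byte sequence.
sub recode_bz2 {
    my ($in, $out) = @_;
    my $bzIn  = bzopen($in,  "rb") or die "Can't open $in: $bzerrno\n";
    my $bzOut = bzopen($out, "wb") or die "Can't open $out: $bzerrno\n";

    my $line;
    while ($bzIn->bzreadline($line) > 0) {
        $bzOut->bzwrite(process_line($line));
    }

    $bzIn->bzclose;
    $bzOut->bzclose;
}
```

Calling recode_bz2('in.txt.bz2', 'out.txt.bz2') would then rewrite the file with no intermediate plain-text copy. The point is that the bzip2 handles only ever see bytes, so the decode and encode steps stand in for what the ':utf8' layer does on an ordinary filehandle.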