PerlMonks  

convert files to ansi (8859-1)

by Yaerox (Scribe)
on Mar 29, 2017 at 07:41 UTC ( [id://1186322]=perlquestion )

Yaerox has asked for the wisdom of the Perl Monks concerning the following question:

I need to convert all kinds of files to ANSI (ISO-8859-1). Files that are already ANSI (ISO-8859-1) should be skipped.

When I change my code to convert files to utf8 instead, it no longer works. It seems the decode line only works this way when decoding utf8; it doesn't seem to recognize ANSI (ISO-8859-1) input.
Or am I doing something wrong?

utf8 to ansi

#!/usr/bin/perl -w
use strict;
use warnings;
use Encode qw(encode decode);

my $sFile       = "";
my $sLine       = "";
my $sCodepoints = "";

if ( $#ARGV == 0 ) {
    $sFile = $ARGV[0] ;
    if ( ! -e $sFile ) {
        die "File '$sFile' doesn't exist!";
    }
}

open( FILE, "<", "$sFile") or die "Couldn't open file '$sFile'!";
{
    # slurp file into string
    local $/;
    $sLine = <FILE>;
    close( FILE );
}

eval { $sCodepoints = decode( "utf8", $sLine, Encode::FB_CROAK ) };
if ( $@ ) {
    # input was not utf8
    print "> No UTF-8, maybe ISO-8859-1 ?\n";
    $sCodepoints = $sLine;
}

open( FILENEW, ">:encoding(iso-8859-1)", "$sFile.new" ) or die "Couldn't open file '$sFile.new'!";
print FILENEW $sCodepoints;
close( FILENEW );


ansi to utf8

#!/usr/bin/perl -w
use strict;
use warnings;
use Encode qw(encode decode);

my $sFile       = "";
my $sLine       = "";
my $sCodepoints = "";

if ( $#ARGV == 0 ) {
    $sFile = $ARGV[0] ;
    if ( ! -e $sFile ) {
        die "File '$sFile' doesn't exist!";
    }
}

open( FILE, "<", "$sFile") or die "Couldn't open file '$sFile'!";
{
    # slurp file into string
    local $/;
    $sLine = <FILE>;
    close( FILE );
}

eval { $sCodepoints = decode( "iso-8859-1", $sLine, Encode::FB_CROAK ) };
if ( $@ ) {
    # input was not iso-8859-1
    print "> No ISO-8859-1, maybe UTF8 ?\n";
    $sCodepoints = $sLine;
}

open( FILENEW, ">:encoding(utf8)", "$sFile.new" ) or die "Couldn't open file '$sFile.new'!";
print FILENEW $sCodepoints;
close( FILENEW );


I created two test-files ansi.txt and utf8.txt for testing. Content of the files:

1TestöäüÖÄÜß
2TestöäüÖÄÜß
3TestöäüÖÄÜß
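
For anyone who wants to reproduce the two test files, here is a minimal sketch. It assumes the three entries sit on separate lines, each ending in a newline, and that the script runs on a Unix-like system. The helper name write_test_file is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: write the three sample lines in the given encoding.
sub write_test_file {
    my ( $path, $encoding ) = @_;

    # \x{..} escapes spell out the seven umlaut and sharp-s characters,
    # so the script itself stays pure ASCII.
    my $umlauts = "\x{F6}\x{E4}\x{FC}\x{D6}\x{C4}\x{DC}\x{DF}";

    open( my $fh, ">:encoding($encoding)", $path )
        or die "Couldn't open '$path': $!";
    print {$fh} "${_}Test$umlauts\n" for 1 .. 3;
    close($fh) or die "Couldn't close '$path': $!";
    return;
}

write_test_file( "ansi.txt", "iso-8859-1" );
write_test_file( "utf8.txt", "UTF-8" );
```

The two files differ in size because each non-ASCII character takes one octet in ISO-8859-1 and two octets in UTF-8.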

Replies are listed 'Best First'.
Re: convert files to ansi (8859-1)
by Corion (Patriarch) on Mar 29, 2017 at 07:54 UTC

    At least in your second example, you are not properly decoding your input.

    eval { $sCodepoints = decode( "iso-8859-1", $sLine, Encode::FB_CROAK ) };
    if ( $@ ) {
        # input was not iso-8859-1
        print "> No ISO-8859-1, maybe UTF8 ?\n";
        $sCodepoints = $sLine;
    }

    Here, $sCodepoints does not contain properly decoded UTF-8.
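
A minimal sketch of one reason the ISO-8859-1 check cannot work as a detector: every octet from 0x00 to 0xFF is a valid ISO-8859-1 character, so decoding as iso-8859-1 does not fail even when the input is really UTF-8. The variable names below are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The character "o with diaeresis" encoded as UTF-8 is the octets 0xC3 0xB6.
my $utf8_octets = "\xC3\xB6";

# This decode succeeds although the input is really UTF-8.
my $as_latin1 = eval {
    decode( "iso-8859-1", $utf8_octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
};
my $latin1_failed = defined $as_latin1 ? 0 : 1;

# The result is two characters (U+00C3, U+00B6) instead of one.
my $latin1_length = length $as_latin1;

# Decoding the same octets as strict UTF-8 yields the single character U+00F6.
my $as_utf8 = decode( "UTF-8", $utf8_octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
my $utf8_length = length $as_utf8;

printf "latin1 failed: %d, latin1 chars: %d, utf8 chars: %d\n",
    $latin1_failed, $latin1_length, $utf8_length;
```

Because the eval never sees an error for iso-8859-1, the "maybe UTF8" branch in the second script is never reached.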

    I would try a loop over the possible encodings:

    for my $encoding_candidate (qw(iso-8859-1 UTF-8)) {
        eval { $sCodepoints = decode( $encoding_candidate, $sLine, Encode::FB_CROAK ) };
        if ( $@ ) {
            # input was not $encoding_candidate
            print "> Not $encoding_candidate\n";
            #$sCodepoints = $sLine;
        }
    }
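
A possible variant of such a loop, offered as a sketch rather than as the code posted above. It makes two assumptions: strict UTF-8 is tried first, since iso-8859-1 accepts any octet sequence, and Encode::LEAVE_SRC is set, because the Encode documentation notes that decode() overwrites its source string in place when CHECK is set without that bit. The helper name decode_first_match is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return (decoded string, encoding name) for raw octets.
sub decode_first_match {
    my ($octets) = @_;

    # Strict UTF-8 goes first; iso-8859-1 only makes sense as the last resort.
    for my $candidate (qw(UTF-8 iso-8859-1)) {
        my $chars = eval {
            # LEAVE_SRC keeps $octets unchanged for the next candidate.
            decode( $candidate, $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
        };
        return ( $chars, $candidate ) if defined $chars;
    }
    die "No candidate encoding matched\n";
}
```

The loop stops at the first candidate that decodes cleanly, so the returned name says which interpretation was used.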

      "At least in your second example, you are not properly decoding your input."
      I don't see what is wrong. I did the same as in my first example, just swapped utf8 and iso-8859-1 in the decode call and the file open.

      Your idea of using the loop would be my second step. Right now I'm worried because ANSI to UTF8 doesn't work properly.

        You cannot read octets from a file and then hope that Perl will know that you meant UTF-8. You always have to decode your input and encode your output.
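
As an illustration of "decode your input, encode your output" for the ANSI to UTF-8 direction, here is a minimal sketch. It assumes the input is already known to be ISO-8859-1. The helper name latin1_to_utf8_octets is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert ISO-8859-1 octets to UTF-8 octets.
sub latin1_to_utf8_octets {
    my ($octets) = @_;
    my $chars = decode( "iso-8859-1", $octets );   # octets -> characters
    return encode( "UTF-8", $chars );              # characters -> octets
}

my $in  = "Test\xF6\xE4\xFC";                      # 7 octets in ISO-8859-1
my $out = latin1_to_utf8_octets($in);
printf "%d octets in, %d octets out\n", length($in), length($out);
```

Both steps are explicit here, so nothing depends on what Perl guesses about the string read from the file.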

Re: convert files to ansi (8859-1)
by vrk (Chaplain) on Mar 29, 2017 at 08:39 UTC

    This may be heretical, but couldn't you just use iconv?

      We used to use iconv, but our files can be either utf8 or ansi. If you use iconv on utf8 files, the outcome is corrupt. That's why we're looking for our own Perl solution.
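
A sketch of how mixed input could be sorted before any conversion, so that files already in ANSI are skipped. It relies on a heuristic, not a guarantee: octets that decode as strict UTF-8 are treated as UTF-8, and everything else with high octets is assumed to be ISO-8859-1. The helper name classify_octets is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: classify raw octets as "ascii", "utf8" or "latin1".
sub classify_octets {
    my ($octets) = @_;

    # Pure 7-bit data is identical in both encodings; nothing to convert.
    return "ascii" unless $octets =~ /[^\x00-\x7F]/;

    # Strict UTF-8 decode; LEAVE_SRC stops decode() from consuming $octets.
    my $ok = eval {
        decode( "UTF-8", $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
        1;
    };
    return $ok ? "utf8" : "latin1";
}
```

Only input classified as "utf8" would then need converting; "ascii" and "latin1" input can be left alone.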
Re: convert files to ansi (8859-1)
by Anonymous Monk on Mar 29, 2017 at 08:02 UTC
      binmode(FILE); didn't change anything. I never used file-open with :raw, I can give it a try...

      Update: neither changed the outcome.
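
For reference, a minimal sketch of reading and writing octets with the :raw layer. The layer only controls how bytes pass through the handle and does not decode anything. On many Unix-like systems a handle opened without layers already delivers octets, which may be why neither suggestion changed the outcome. The helper names read_octets and write_octets are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers that move octets to and from disk unchanged.
sub read_octets {
    my ($path) = @_;
    open( my $fh, "<:raw", $path ) or die "Couldn't open '$path': $!";
    my $octets = do { local $/; <$fh> };   # slurp the whole file
    close($fh);
    return defined $octets ? $octets : "";
}

sub write_octets {
    my ( $path, $octets ) = @_;
    open( my $fh, ">:raw", $path ) or die "Couldn't open '$path': $!";
    print {$fh} $octets;
    close($fh) or die "Couldn't close '$path': $!";
    return;
}
```

With helpers like these, any conversion still has to happen between the read and the write through explicit decode and encode calls.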

Node Type: perlquestion [id://1186322]
Approved by Corion