PerlMonks  

Re: utf file to ansi, but doesn't work?

by moritz (Cardinal)
on Feb 28, 2011 at 11:06 UTC ( [id://890546]=note )


in reply to utf file to ansi, but doesn't work?

This "seems" to work... but for some reason when I open this new file in NotePad++, it doesn't seem to recognise the encoding type

So you don't know if the conversion failed, or if your text editor's auto detection failed.

A sure way to find out is to open the file in a hex editor, and manually compare some bytes via encoding tables (for example on Wikipedia) to the characters in the original files.
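If looking things up in encoding tables sounds tedious, Perl can print the expected bytes for you. This is only a sketch: `hex_bytes` is a made-up helper name, and the two sample characters (e-acute and the euro sign) are picked for illustration, not taken from the original file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of $text in the given encoding as space-separated hex pairs.
# (hex_bytes is a hypothetical helper, not part of Encode.)
sub hex_bytes {
    my ($text, $encoding) = @_;
    my $bytes = encode($encoding, $text);
    return uc join ' ', unpack '(H2)*', $bytes;
}

# Sample characters chosen for illustration: e-acute and the euro sign.
for my $char ("\x{E9}", "\x{20AC}") {
    printf "U+%04X  UTF-8: %-8s  ISO-8859-15: %s\n",
        ord($char),
        hex_bytes($char, 'UTF-8'),
        hex_bytes($char, 'iso-8859-15');
}
```

An e-acute should show up as C3 A9 in the UTF-8 file and as a single E9 in the converted one; the euro sign goes from E2 82 AC to A4. If the hex dump of the new file still shows the multi-byte sequences, the conversion did not happen.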

Shouldn't be a problem for a bunch of ultranerds :-)

Re^2: utf file to ansi, but doesn't work?
by ultranerds (Hermit) on Feb 28, 2011 at 12:00 UTC
    Hi, I don't have a hex editor ;) (tried using one before to edit the settings on my Blackberry, but couldn't get the knack of it ;))

    Is there a simple way I can check the header (i.e. the "type") of a file? (Kinda like you can do with finding file types in images by opening them in Notepad, and then looking for stuff like "gif" etc.)

    The weird bit though, is that when I run the commands manually via SSH, it updates the "encoding" properly in Notepad++!

    iconv --from-code UTF-8 --to-code iso-8859-15 -c /var/home/user/siteforum.com/www/admin/Plugins/Forum/Advertiser/Import/tmp/allVacations.xml.2 > /var/home/user/siteforum.com/www/admin/Plugins/GForum/Advertiser/Import/tmp/allVacations.xml.new

    It wouldn't be something related to the way Perl invokes this, would it? I've not had problems going from non-utf8 --> utf8 before, so just wondering why it's having issues doing it this way around :(

    TIA

    Andy
      Hi, I don't have a hex editor ;)

      Then get one. No excuses.

      Is there a simple way I can check the header (i.e. the "type") of a file? (Kinda like you can do with finding file types in images by opening them in Notepad, and then looking for stuff like "gif" etc.)

      No. You suspect the automatic recognition of the encoding to be a problem, so you shouldn't trust it to diagnose your problem for you.

      It wouldn't be something related to the way perl invokes this would it?

      Well, you don't check whether the command succeeds; that would be a first step. The documentation for system tells you how (though autodie is more convenient, if you ask me).
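A rough sketch of what such a check could look like, following the pattern in the system documentation. `run_checked` is a hypothetical name, and the command run here is just the current perl (`$^X`) exiting with status 3, standing in for the real iconv call.

```perl
use strict;
use warnings;

# Run a command in list form (no shell involved) and describe the outcome.
# Returns an empty string on success, otherwise a description of the failure.
# (run_checked is a hypothetical helper name.)
sub run_checked {
    my @cmd = @_;
    my $rc = system { $cmd[0] } @cmd;
    return "failed to execute: $!" if $rc == -1;
    return sprintf('died with signal %d', $? & 127) if $? & 127;
    return sprintf('exited with status %d', $? >> 8) if $? >> 8;
    return '';
}

# $^X is the running perl, used here only as a command that surely exists.
my $error = run_checked($^X, '-e', 'exit 3');
print $error ? "command $error\n" : "command succeeded\n";
```

Note that the list form does not go through a shell, so a `>` redirection like the one in the iconv command above would not work there. You would either keep the single-string form of system and check `$?` the same way, or let iconv write the file itself (GNU iconv has an `--output` option, assuming that is the iconv you have). With autodie, system support additionally needs IPC::System::Simple installed.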

      Update: Since your files seem to be XML files: those usually begin with something like <?xml version="1.0" encoding="windows-1252"?>. If the encoding still says UTF-8 or is missing (it defaults to UTF-8), you need to adjust that so that XML processors later on will not complain.
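To see what the declaration currently claims, something along these lines might do. It is a sketch only: `declared_encoding` is a made-up helper, the regex is deliberately simple (it ignores a possible BOM in front of the declaration), and the sample string is invented rather than read from your file.

```perl
use strict;
use warnings;

# Return the encoding named in an XML declaration, or undef if there is none.
# (declared_encoding is a hypothetical helper; it does not handle a leading BOM.)
sub declared_encoding {
    my ($first_line) = @_;
    return $first_line =~ /^<\?xml\b[^>]*?\bencoding\s*=\s*(["'])(.*?)\1/ ? $2 : undef;
}

# Invented sample declaration; in practice this would be the file's first line.
my $decl = '<?xml version="1.0" encoding="UTF-8"?>';
my $enc  = declared_encoding($decl);
print defined $enc
    ? "declared encoding: $enc\n"
    : "no encoding declared (XML defaults to UTF-8)\n";
```

If this still reports UTF-8 after the conversion, the declaration and the actual bytes disagree, and the declaration needs to be rewritten as well.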

        Eugh, I've given up on this! I'm simply just doing it with a map {} now for each of the variables, and then doing:
        map { $add->{$_} = utf8($add->{$_})->latin1; } keys %$add;

        ...which works fine. Just wish I could work out why it doesn't seem to be reading/adding it properly without! Unfortunately I don't have hours and hours I can spend on this trying to work it out :(

        Thanks as always for your help though - much appreciated

        Cheers

        Andy
      I don't have a hex editor
      Sure you do! od -cx file

      Also, have you tried Encode?
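For what it's worth, a minimal sketch of the same conversion done with Encode instead of shelling out to iconv. `utf8_to_latin9` is a hypothetical name and the sample string is invented. Unlike `iconv -c`, which silently drops characters it cannot convert, this version dies on them.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Convert a byte string from UTF-8 to ISO-8859-15.
# Dies on malformed UTF-8 input or on characters with no ISO-8859-15 equivalent.
# (utf8_to_latin9 is a hypothetical helper name.)
sub utf8_to_latin9 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC);
    return encode('iso-8859-15', $text, FB_CROAK | LEAVE_SRC);
}

# Invented sample: "cafe" with e-acute, a space, "5" and a euro sign, as UTF-8 bytes.
my $utf8   = "caf\xC3\xA9 5\xE2\x82\xAC";
my $latin9 = utf8_to_latin9($utf8);
print uc(unpack 'H*', $latin9), "\n";   # hex dump of the converted bytes
```

For whole files, the `:encoding(UTF-8)` and `:encoding(iso-8859-15)` layers on open do the same job while reading and writing.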

        He's on Windows. I'd be surprised if Notepad++ didn't have a hex mode, but one could always use Perl.

        perl -nE"BEGIN { $/=\16 } say uc unpack 'H*', $_" file
      Is there a simple way I can check the header (i.e "type") of a file?

      There isn't in general any such thing. All there is in a text file is what you see; that's what makes it text. UTF-x files can (and sometimes must) have a BOM, but ISO-8859 files won't.

      Your text editor either has to be told (by you, by default config, etc.) what encoding to use, or it can try (and occasionally even succeed) to guess by the patterns of bytes in it. But it has no way to know unless you tell it. That's why text encoding is such a mess...
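To give an idea of what that guessing amounts to, here is a rough sketch. `guess_encoding` is a made-up helper and the heuristic is an assumption for illustration, not what Notepad++ actually does: look for a BOM, and failing that, see whether the non-ASCII bytes happen to form valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Make a rough guess about a byte string's encoding.
# (guess_encoding is a hypothetical helper; the heuristic is illustrative only.)
sub guess_encoding {
    my ($bytes) = @_;
    return 'UTF-8 (with BOM)' if $bytes =~ /^\xEF\xBB\xBF/;
    # FE FF / FF FE are the UTF-16 BOMs (FF FE 00 00 would be UTF-32LE;
    # that case is not distinguished here).
    return 'UTF-16 (with BOM)' if $bytes =~ /^(?:\xFE\xFF|\xFF\xFE)/;
    return 'ASCII' unless $bytes =~ /[\x80-\xFF]/;
    # Non-ASCII bytes that decode cleanly as UTF-8 are very probably UTF-8.
    return 'probably UTF-8'
        if eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return 'some 8-bit encoding (cannot tell which)';
}

print guess_encoding("caf\xC3\xA9"), "\n";   # UTF-8 bytes for "cafe" with e-acute
print guess_encoding("caf\xE9"), "\n";       # the same word in ISO-8859-1/15
```

The last branch is the point: once the bytes are not valid UTF-8, nothing in the file says whether they are ISO-8859-1, ISO-8859-15, windows-1252 or something else entirely.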
