in reply to Re: utf file to ansi, but doesn't work? in thread utf file to ansi, but doesn't work?
Hi,
I don't have a hex editor ;) (I tried using one before to edit the settings on my Blackberry, but couldn't get the knack of it ;))
Is there a simple way I can check the header (i.e. "type") of a file? (Kinda like you can do with image file types, by opening them in Notepad and looking for stuff like "gif", etc.)
The weird bit, though, is that when I run the commands manually via SSH, it updates the "encoding" properly in Notepad++!
iconv --from-code UTF-8 --to-code iso-8859-15 -c /var/home/user/siteforum.com/www/admin/Plugins/Forum/Advertiser/Import/tmp/allVacations.xml.2 > /var/home/user/siteforum.com/www/admin/Plugins/GForum/Advertiser/Import/tmp/allVacations.xml.new
It wouldn't be something related to the way perl invokes this, would it? I've not had problems going from non-UTF-8 to UTF-8 before, so I'm just wondering why it's having issues doing it this way around :(
TIA
Andy
Re^3: utf file to ansi, but doesn't work?
by moritz (Cardinal) on Feb 28, 2011 at 12:32 UTC
Hi, I don't have a hex editor ;)
Then get one. No excuses.
Is there a simple way I can check the header (i.e. "type") of a file? (Kinda like you can do with image file types, by opening them in Notepad and looking for stuff like "gif", etc.)
No. You suspect the automatic recognition of the encoding to be a problem, so you shouldn't trust it to diagnose your problem for you.
It wouldn't be something related to the way perl invokes this, would it?
Well, you don't check whether the command succeeds; that would be a first step. The documentation tells you how (though autodie is more convenient, if you ask me).
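A minimal sketch of such a check, decoding `$?` in the way perldoc -f system describes. `run_checked` is a hypothetical helper name, not part of any module. With autodie, `use autodie qw(system);` gives you the equivalent, but that form needs IPC::System::Simple installed.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command and die with a useful
# message unless it succeeded. $? is decoded as in perldoc -f system.
sub run_checked {
    my ($cmd) = @_;
    system($cmd);
    if ($? == -1) {
        die "failed to execute '$cmd': $!\n";
    }
    elsif ($? & 127) {
        die sprintf "'%s' died with signal %d\n", $cmd, $? & 127;
    }
    elsif ($? >> 8) {
        die sprintf "'%s' exited with status %d\n", $cmd, $? >> 8;
    }
    return 1;
}

# Usage (hypothetical file names); the ">" redirection means the
# string is handed to /bin/sh, so $? is the shell's exit status:
# run_checked("iconv --from-code UTF-8 --to-code ISO-8859-15 -c in.xml > out.xml");
```

If iconv fails (bad path, unknown charset), the script now stops with the exit status instead of silently continuing with an empty or missing output file.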
Update: Your files seem to be XML files, and those usually begin with something like <?xml version="1.0" encoding="windows-1252"?>. If the encoding still says UTF-8 or is missing (it defaults to UTF-8), you need to adjust it so that XML processors later on will not complain.
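A minimal sketch of that adjustment, assuming the converted file has already been read into a string and has no BOM in front of the declaration. `set_xml_encoding` is a hypothetical helper; a regex is only reasonable here because the XML declaration has a fixed, simple shape at the very start of the document.

```perl
use strict;
use warnings;

# Hypothetical helper: make the XML declaration at the start of $xml
# declare encoding $enc. Assumes no BOM precedes the declaration.
sub set_xml_encoding {
    my ($xml, $enc) = @_;
    # Case 1: declaration has encoding="..." (or '...'): replace the value.
    return $xml
        if $xml =~ s/\A(<\?xml\b[^?]*?\bencoding\s*=\s*)(["'])[^"']*\2/$1$2$enc$2/;
    # Case 2: declaration without encoding: add it right after version.
    return $xml
        if $xml =~ s/\A(<\?xml\b[^?]*?\bversion\s*=\s*(["'])[^"']*\2)/$1 encoding="$enc"/;
    # Case 3: no declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="$enc"?>\n} . $xml;
}
```

For a file converted with iconv to ISO-8859-15, you would call it as `set_xml_encoding($text, 'ISO-8859-15')` and write the result back out.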
Eugh, I've given up on this! I'm now just converting each of the variables with a map {} instead:
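use Unicode::String qw(utf8);   # assumed source of utf8() and the ->latin1 method
map {
    $add->{$_} = utf8($add->{$_})->latin1;   # UTF-8 bytes -> Latin-1 bytes
} keys %$add;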
...which works fine. I just wish I could work out why it doesn't seem to read/add it properly without that! Unfortunately I don't have hours and hours to spend trying to work it out :(
Thanks as always for your help though - much appreciated
Cheers
Andy
Re^3: utf file to ansi, but doesn't work?
by eff_i_g (Curate) on Feb 28, 2011 at 15:24 UTC
I don't have a hex editor
Sure you do! od -cx file
Also, have you tried Encode?
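For comparison, a sketch of the same conversion done in-process with the core Encode module instead of shelling out to iconv. `utf8_to_latin9` is a hypothetical name. Note that the iconv target, ISO-8859-15 (Latin-9), differs from ISO-8859-1 (Latin-1) in a handful of positions; for example, the euro sign is 0xA4 in Latin-9 and does not exist in Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a string of UTF-8 bytes to ISO-8859-15 bytes.
# With the default CHECK, characters that ISO-8859-15 cannot represent are
# replaced by the encoding's substitution character (typically '?').
sub utf8_to_latin9 {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> Perl characters
    return encode('iso-8859-15', $chars);   # characters -> Latin-9 bytes
}
```

Unlike `iconv -c`, which silently drops unconvertible characters, this substitutes them, so lost characters stay visible in the output.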
perl -nE"BEGIN { $/=\16 } say uc unpack 'H*', $_" file
Re^3: utf file to ansi, but doesn't work?
by fullermd (Priest) on Mar 01, 2011 at 08:53 UTC
Is there a simple way I can check the header (i.e. "type") of a file?
There isn't, in general, any such thing. All there is in a text file is what you see; that's what makes it text. UTF-x files can (and sometimes must) have a BOM, but ISO-8859 files won't.
Your text editor either has to be told (by you, by its default config, etc.) what encoding to use, or it can try to guess (and occasionally even succeed) from the patterns of bytes in the file. But it has no way to know unless you tell it. That's why text encoding is such a mess...
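A rough Perl sketch of that kind of byte-pattern guessing, which is roughly all an editor can do. `guess_encoding_of_bytes` is a hypothetical helper; it only recognises UTF-8 and UTF-16 BOMs (a UTF-32LE BOM would be misreported as UTF-16LE), and for non-UTF-8 files it cannot tell which single-byte encoding was used.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: heuristic guess at the encoding of raw file bytes.
sub guess_encoding_of_bytes {
    my ($bytes) = @_;
    return 'UTF-8 (with BOM)'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE (with BOM)' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE (with BOM)' if $bytes =~ /\A\xFF\xFE/;
    return 'ASCII'               if $bytes !~ /[\x80-\xFF]/;
    # Strict decode dies on any byte sequence that is not valid UTF-8.
    # ($bytes is our own copy, so it does not matter that decode may modify it.)
    return 'probably UTF-8' if eval { decode('UTF-8', $bytes, FB_CROAK); 1 };
    return 'not UTF-8; some single-byte encoding such as ISO-8859-x (cannot tell which)';
}
```

Read the file with `open my $fh, '<:raw', $path` and pass the contents in; the `:raw` layer keeps Perl from decoding the bytes first.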