PerlMonks  

Re^2: utf file to ansi, but doesn't work?

by ultranerds (Hermit)
on Feb 28, 2011 at 12:00 UTC


in reply to Re: utf file to ansi, but doesn't work?
in thread utf file to ansi, but doesn't work?

Hi, I don't have a hex editor ;) (I tried using one before to edit the settings on my Blackberry, but couldn't get the knack of it ;))

Is there a simple way I can check the header (i.e. "type") of a file? Kinda like you can do with finding file types in images by opening them in Notepad and then looking for stuff like "gif", etc.

The weird bit though, is that when I run the commands manually via SSH, it updates the "encoding" properly in Notepad++!

iconv --from-code UTF-8 --to-code iso-8859-15 -c /var/home/user/siteforum.com/www/admin/Plugins/Forum/Advertiser/Import/tmp/allVacations.xml.2 > /var/home/user/siteforum.com/www/admin/Plugins/GForum/Advertiser/Import/tmp/allVacations.xml.new

It wouldn't be something related to the way perl invokes this would it? I've not had problems going from non-utf8 --> utf8 before, so I'm just wondering why it's having issues doing it this way around :(

TIA

Andy

Re^3: utf file to ansi, but doesn't work?
by moritz (Cardinal) on Feb 28, 2011 at 12:32 UTC
    Hi, I don't have a hex editor ;)

    Then get one. No excuses.

    Is there a simple way I can check the header (i.e. "type") of a file? Kinda like you can do with finding file types in images by opening them in Notepad and then looking for stuff like "gif", etc.

    No. You suspect the automatic recognition of the encoding to be a problem, so you shouldn't trust it to diagnose your problem for you.

    It wouldn't be something related to the way perl invokes this would it?

    Well, you don't check whether the command succeeds; that would be a first step. The documentation tells you how (though autodie is more convenient, if you ask me).
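A minimal sketch of what checking the command could look like from Perl. The helper name `run_checked` and the short file names in the usage comment are placeholders, not anything from the original script, and `-o` is assumed to be available (it is in GNU iconv). The list form of `system` avoids the shell, so nothing depends on how a `>` redirect gets interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form (no shell involved)
# and die with a readable message if it did not exit cleanly.
sub run_checked {
    my @cmd = @_;
    my $rc  = system(@cmd);

    if ($rc == -1) {
        # The program could not be started at all.
        die "Could not start '$cmd[0]': $!\n";
    }
    elsif ($? & 127) {
        # The child was killed by a signal.
        die sprintf "'%s' died with signal %d\n", $cmd[0], $? & 127;
    }
    elsif ($? >> 8) {
        # The child ran but reported failure.
        die sprintf "'%s' exited with status %d\n", $cmd[0], $? >> 8;
    }
    return 1;
}

# Usage with placeholder file names (adjust to the real paths):
# run_checked('iconv', '-f', 'UTF-8', '-t', 'ISO-8859-15', '-c',
#             '-o', 'allVacations.xml.new', 'allVacations.xml.2');
```

If the call dies, the message at least says whether iconv was never started or ran and reported a problem, which narrows down where to look.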

    Update: Your files seem to be XML files, and those usually begin with something like <?xml version="1.0" encoding="windows-1252"?>. If the encoding still says UTF-8 or is missing (it defaults to UTF-8), you need to adjust that so that XML processors later on will not complain.
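As a rough illustration of adjusting the declaration, here is a sketch that rewrites the encoding in a string holding the document. `set_xml_encoding` is a made-up helper, and it assumes the declaration, if present, sits at the very start of the string with no BOM in front of it. A real XML module would be the more robust route.

```perl
use strict;
use warnings;

# Hypothetical helper: set (or add) the encoding pseudo-attribute in the
# XML declaration at the very start of a document string.
sub set_xml_encoding {
    my ($xml, $encoding) = @_;

    # Declaration already names an encoding: swap the value in place.
    return $xml
        if $xml =~ s/\A(<\?xml\b[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2/$1$2$encoding$2/;

    # Declaration without an encoding: insert one after the version.
    return $xml
        if $xml =~ s/\A(<\?xml\s+version\s*=\s*(["'])[^"']*\2)/$1 encoding="$encoding"/;

    # No declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
}

# Example: the declaration now matches the converted bytes.
print set_xml_encoding('<?xml version="1.0" encoding="UTF-8"?><root/>',
                       'ISO-8859-15'), "\n";
```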

      Eugh, I've given up on this! I'm simply just doing it with a map {} now for each of the variables, and then doing:
      map { $add->{$_} = utf8($add->{$_})->latin1; } keys %$add;

      ...which works fine. Just wish I could work out why it doesn't seem to be reading/adding it properly without! Unfortunately I don't have hours and hours I can spend on this trying to work it out :(

      Thanks as always for your help though - much appreciated

      Cheers

      Andy
Re^3: utf file to ansi, but doesn't work?
by eff_i_g (Curate) on Feb 28, 2011 at 15:24 UTC
    I don't have a hex editor
    Sure you do! od -cx file

    Also, have you tried Encode?
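For reference, a small sketch of the Encode route, doing the same UTF-8 to ISO-8859-15 conversion inside Perl instead of shelling out to iconv. The helper name `utf8_to_latin9` is invented for this example. It works on a string of octets that has already been read in.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 octets into ISO-8859-15 octets.
sub utf8_to_latin9 {
    my ($octets) = @_;

    # Decode strictly, so malformed UTF-8 dies instead of being hidden.
    my $text = decode('UTF-8', $octets, Encode::FB_CROAK);

    # Characters ISO-8859-15 cannot represent are substituted by
    # Encode's default replacement character for that encoding.
    return encode('iso-8859-15', $text);
}

# "café" as UTF-8 octets goes in, the single byte 0xE9 for "é" comes out.
my $latin9 = utf8_to_latin9("caf\xC3\xA9");
printf "%vX\n", $latin9;
```

For whole files, the same conversion can be done by opening the input with the `:encoding(UTF-8)` layer and the output with `:encoding(iso-8859-15)` and copying line by line.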

      He's on Windows. I'd be surprised if Notepad++ didn't have a hex mode, but one could always use Perl.

      perl -nE"BEGIN { $/=\16 } say uc unpack 'H*', $_" file

        On Windows with paths like /www and /var? I assumed he was viewing files in Notepad++ via Samba and still had terminal access (like me).

        Notepad++ has a hex plugin.

Re^3: utf file to ansi, but doesn't work?
by fullermd (Priest) on Mar 01, 2011 at 08:53 UTC
    Is there a simple way I can check the header (i.e. "type") of a file?

    There isn't in general any such thing. All there is in a text file is what you see; that's what makes it text. UTF-x files can (and sometimes must) have a BOM, but ISO-8859 files won't.

    Your text editor either has to be told (by you, by default config, etc.) what encoding to use, or it can try (and occasionally even succeed) to guess by the patterns of bytes in it. But it has no way to know unless you tell it. That's why text encoding is such a mess...
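To make the guessing concrete, here is a sketch of the kind of check an editor might do: look for a BOM, then see whether the bytes happen to be valid UTF-8. `describe_bytes` is a hypothetical helper, and its answers are only hints. Bytes that fail the UTF-8 test could be ISO-8859-1, ISO-8859-15, windows-1252 or anything else, and nothing in the file says which.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report what little can be inferred from raw bytes.
sub describe_bytes {
    my ($bytes) = @_;

    return 'UTF-8 with BOM'     if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16/32 with BOM' if $bytes =~ /\A(?:\xFF\xFE|\xFE\xFF)/;
    return 'plain ASCII'        if $bytes !~ /[\x80-\xFF]/;

    # Strict decoding dies on malformed input; work on a copy because
    # decode() may modify its argument when a check value is given.
    my $copy  = $bytes;
    my $valid = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    return $valid ? 'valid UTF-8 (no BOM)'
                  : 'not UTF-8; some 8-bit encoding';
}

print describe_bytes("caf\xC3\xA9"), "\n";   # bytes of UTF-8 "café"
print describe_bytes("caf\xE9"), "\n";       # bytes of Latin-1/9 "café"
```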
