PerlMonks  

Convert \u characters into utf8

by ultranerds (Hermit)
on Feb 02, 2016 at 13:00 UTC ( [id://1154260]=perlquestion )

ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I had a similar question a while back - when trying to convert NORMAL (latin) characters into their correct values:

http://perlmonks.org/?node_id=1148915

I've now got a similar issue, but this time I need to deal with UTF8. Basically, I have a JSON file that I'm downloading, and it has content like:

R\u00f6hrenstick 
Farbt\u00f6ne ihre Sch\u00f6nheit
\u00fcber

The file itself is in ANSI (not UTF8 format)

I naively believe it would be as simple as passing it into

$file = Encode::encode('UTF-8',$file);

Could anyone please point me in the right direction? :)

Thanks!

Andy

Replies are listed 'Best First'.
Re: Convert \u characters into utf8
by Corion (Patriarch) on Feb 02, 2016 at 13:05 UTC

    You don't show us any code we could use to replicate your problem.

    Maybe $file does not contain the file contents?

    I recommend saving files as UTF8 and reading them as raw bytes. JSON modules expect either raw bytes (JSON::XS) or can be configured to accept Latin-1 (JSON).

      Hi,

      Sorry - the code is just being grabbed using wget:

      `wget -O/srv/www/site.net/www/cgi-bin/admin/tmp/in.txt 'https://openapi.etsy.com/v2/shops/Syrestria/listings/active?method=GET&api_key=xxxxx&limit=200&includes=MainImage'`;

      ..and a basic script I wrote, does:

      #!/usr/bin/perl
      use File::Slurp;
      use Encode;

      my $file = read_file("./in.txt");
      $file =~ s/\\u(....)/chr hex $1/ge;
      print "$file\n";


      However, as I explained, that does not work well :) (some get encoded, but the vast majority do not)

      Are you suggesting I do something like this?

      use File::Slurp;
      use Encode;
      use JSON;
      use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

      my $file = read_file("./in.txt");
      my $json_var = decode_json($file);
      foreach (@{$json_var->{results}}) {
          $_->{description} =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
          print "BLA - $_->{description} \n";
      }


      Cheers

      Andy

        No. I'm suggesting that you use a JSON module for loading JSON data. With the two JSON modules I mentioned, at least, there should be no need to manually convert \uXXXX escapes to their Unicode equivalents.

        use JSON;
        use Data::Dumper;
        $Data::Dumper::Useqq = 1;

        my $data = decode_json( $file_content );
        warn Dumper $data;
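
To make the suggestion above concrete, here is a minimal sketch of what decode_json does with \uXXXX escapes. The sample string is a hypothetical stand-in for the downloaded file, built from the description text quoted in the question, and JSON::PP is used only because it ships with Perl; JSON and JSON::XS export a decode_json that should behave the same way for this input.

```perl
use strict;
use warnings;
use JSON::PP;   # core module, assumed here in place of JSON / JSON::XS

# Hypothetical sample standing in for the downloaded file: plain ASCII bytes
# containing JSON \uXXXX escapes (single quotes keep the backslashes literal).
my $bytes = '{"results":[{"description":"Farbt\u00f6ne ihre Sch\u00f6nheit"}]}';

# decode_json expects UTF-8 encoded bytes and returns Perl character strings;
# each \uXXXX escape is already replaced by the character it names.
my $data        = decode_json($bytes);
my $description = $data->{results}[0]{description};

# Encode on output so the terminal or file receives UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$description\n";
```

The key design point is that the decoding happens once, inside the JSON module, and the encoding happens once, on the output handle; no regex substitution or Encode::encode call on the raw file is needed in between.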

        Note that File::Slurp is horribly broken regarding encodings. Some comments recommend File::Slurper, but I instead roll my own, which isn't rocket surgery either.
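
A hand-rolled reader of the kind mentioned above might look like the following sketch. The name slurp_raw is hypothetical, not something from the thread; the point is only that the file is opened with the :raw layer so the bytes reach the JSON module undecoded.

```perl
use strict;
use warnings;

# Hypothetical helper: return a file's contents as undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    local $/;                 # undefined record separator: read the whole file at once
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

Assuming the downloaded file really is valid JSON, the result can then be handed straight to the parser, for example decode_json( slurp_raw("./in.txt") ).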
