PerlMonks  

Convert \u characters into utf8

by ultranerds (Hermit)
on Feb 02, 2016 at 13:00 UTC

ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I had a similar question a while back - when trying to convert NORMAL (latin) characters into their correct values:

http://perlmonks.org/?node_id=1148915

I've now got a similar issue, but this time I need to deal with UTF-8. Basically, I'm downloading a JSON file whose content looks like this:

R\u00f6hrenstick 
Farbt\u00f6ne ihre Sch\u00f6nheit
\u00fcber 

The file itself is in ANSI format (not UTF-8).

I naively believed it would be as simple as passing it through:

$file = Encode::encode('UTF-8',$file);

Could anyone please point me in the right direction? :)

Thanks!

Andy

Replies are listed 'Best First'.
Re: Convert \u characters into utf8
by Corion (Patriarch) on Feb 02, 2016 at 13:05 UTC

    You don't show us any code we could use to replicate your problem.

    Maybe $file does not contain the file contents?

    I recommend saving files as UTF8 and reading them as raw bytes. JSON modules expect either raw bytes (JSON::XS) or can be configured to accept Latin-1 (JSON).

      Hi,

      Sorry - the code is just being grabbed using wget:

      `wget -O/srv/www/site.net/www/cgi-bin/admin/tmp/in.txt 'https://openapi.etsy.com/v2/shops/Syrestria/listings/active?method=GET&api_key=xxxxx&limit=200&includes=MainImage'`;

      ...and a basic script I wrote does:

      #!/usr/bin/perl
      use File::Slurp;
      use Encode;

      my $file = read_file("./in.txt");
      $file =~ s/\\u(....)/chr hex $1/ge;
      print "$file\n";


      However, as I explained, that does not work well :) (some get encoded, but the vast majority do not)

      Are you suggesting I do something like this?

      use File::Slurp;
      use Encode;
      use JSON;
      use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

      my $file = read_file("./in.txt");
      my $json_var = decode_json($file);

      foreach (@{$json_var->{results}}) {
          $_->{description} =~ s/([\200-\377]+)/from_utf8({ -string => $1, -charset => 'ISO-8859-1'})/eg;
          print "BLA - $_->{description} \n";
      }


      Cheers

      Andy

        No. I'm suggesting that you use a JSON module for loading JSON data. With either of the two JSON modules I mentioned, there should be no need to manually convert \uXXXX escapes to their Unicode equivalents.

        use JSON;
        use Data::Dumper;
        $Data::Dumper::Useqq = 1;

        my $data = decode_json( $file_content );
        warn Dumper $data;
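
To illustrate the point about \uXXXX escapes, here is a minimal sketch using one of the sample strings from the question. It assumes the core module JSON::PP as a stand-in (its decode_json takes the same kind of input as the one in JSON and JSON::XS); the JSON text and the variable names are made up for the example and are not the actual Etsy response.

```perl
use strict;
use warnings;
use JSON::PP;              # core module, used here as a stand-in for JSON / JSON::XS
use Encode qw(encode);

# Hypothetical sample input: plain ASCII bytes containing literal \uXXXX
# escapes. Single quotes keep the backslashes exactly as they appear in the file.
my $json_bytes = '{"results":[{"description":"Farbt\u00f6ne ihre Sch\u00f6nheit"}]}';

# decode_json turns the escapes into real characters; no regex is needed.
my $data        = decode_json($json_bytes);
my $description = $data->{results}[0]{description};

# $description is now a character string, so encode it to UTF-8 bytes
# only at the point where it leaves the program.
print encode('UTF-8', "$description\n");
```

The decoding step produces characters, not bytes, which is why the encode call belongs at output time rather than before parsing.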

        Note that File::Slurp is horribly broken regarding encodings. Some comments recommend File::Slurper, but I instead roll my own, which isn't rocket surgery either.
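
One way to "roll your own" reader, as a hedged sketch only: the helper name slurp_raw, the temporary file, and its contents are assumptions made up for illustration, and JSON::PP again stands in for the JSON modules mentioned above. The idea is to read the file as raw bytes, let the JSON module decode them, and set an encoding layer on the output handle.

```perl
use strict;
use warnings;
use JSON::PP;
use File::Temp qw(tempfile);

# Hypothetical helper: return the whole file as raw, undecoded bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open '$path': $!";
    local $/;                      # no record separator: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Write a small stand-in for the downloaded file so the sketch is self-contained.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} '{"results":[{"description":"R\u00f6hrenstick"},{"description":"\u00fcber"}]}';
close $tmp_fh;

# Raw bytes in, Perl data structure with character strings out.
my $data         = decode_json(slurp_raw($tmp_path));
my @descriptions = map { $_->{description} } @{ $data->{results} };

# Encode once, on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @descriptions;
```

Keeping the read layer at :raw leaves all decoding to the JSON module, so the text is not decoded twice.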

Node Type: perlquestion [id://1154260]
Approved by toolic