PerlMonks  

Korean characters messing with all scripts

by Anonymous Monk
on Dec 01, 2016 at 08:17 UTC (id://1177018)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hey there, monks. I'm working with Korean-derived data (just text) for the first time, and literally all my scripts, even super basic ones, are breaking in unpredictable ways. I think this has something to do with the double-byte nature of Korean characters (which are still all over the data), but I have no idea how to fix it. Manual translation is not an option. How can I at least get Perl and bash to ignore the weird characters so my scripts work? Any ideas on what to try?

Re: Korean characters messing with all scripts
by Corion (Patriarch) on Dec 01, 2016 at 08:24 UTC

    It's hard to say, because you don't tell us where and how things fail.

    The best approach IMO is to properly Encode::decode the data as you read it in and then properly Encode::encode it again as you write it to its target store.
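    A minimal sketch of that decode-on-input, encode-on-output pattern, assuming the data is UTF-8 (if it turns out to be EUC-KR or cp949, both of which Encode::KR provides, substitute that encoding name). The hard-coded sample bytes are only a stand-in for whatever your script actually reads:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might arrive from a file or socket: "한국어" in UTF-8.
# (Hypothetical sample data; assumes the input encoding is UTF-8.)
my $bytes = "\xED\x95\x9C\xEA\xB5\xAD\xEC\x96\xB4";

# Decode at the input boundary: bytes -> Perl characters.
my $text = decode('UTF-8', $bytes);

# Inside the program, work with characters, not bytes.
printf "bytes in: %d, characters: %d\n", length($bytes), length($text);

# Encode at the output boundary: Perl characters -> bytes.
my $out = encode('UTF-8', $text);
print(($out eq $bytes) ? "round trip ok\n" : "round trip changed the data\n");
```

    The three Hangul syllables occupy nine bytes in UTF-8, so code that slices or measures raw bytes sees something quite different from code that works on decoded characters.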

Re: Korean characters messing with all scripts
by LanX (Saint) on Dec 01, 2016 at 08:25 UTC
    Vague description.
    • what do you do with the data?
    • how does your script break?
    • Is the data UTF-8 encoded?
    • If so, do you already use the utf8 pragma and the related UTF-8 modules? See perlunitut.
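    As a rough illustration of what the utf8 pragma and an output layer do (assuming the script file itself is saved as UTF-8): use utf8 turns the literal into a character string, and the :encoding(UTF-8) layer on STDOUT encodes it on the way out, which avoids the "Wide character in print" warning:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the source file contains literal UTF-8 text
binmode STDOUT, ':encoding(UTF-8)';    # encode characters when printing

my $word  = "한국어";                  # a character string thanks to "use utf8"
my @chars = split //, $word;

print "length: ", length($word), "\n";    # counts characters, not bytes
print join(' ', map { sprintf('U+%04X', ord($_)) } @chars), "\n";
print "$word\n";                          # no "Wide character" warning
```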

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re: Korean characters messing with all scripts
by kcott (Archbishop) on Dec 01, 2016 at 20:29 UTC
Re: Korean characters messing with all scripts
by 1nickt (Canon) on Dec 05, 2016 at 13:30 UTC
Re: Korean characters messing with all scripts
by Anonymous Monk on Dec 05, 2016 at 09:14 UTC
    Hey all, thanks for the responses. The reason I haven't posted data is that it's confidential and copy-paste isn't possible. Basically, things fail when I print and a field contains a character that isn't recognized. This leads to all kinds of carriage-return errors and other weird output I can't make sense of. How can I diagnose what kind of text I'm dealing with? I don't have anyone tech-savvy who can tell me, so I'll have to figure it out myself: 1) how to tell which encoding is in use, and 2) how to convert it to something normal.

      Hi,

      You don't need to post the real data, just an example that demonstrates the problem and lets us reproduce it. See How do I post a question effectively? and Short, Self Contained, Correct (Compatible) Example.

      Some general tips:

      • Look at the raw data in the file, e.g. hexdump -C FILENAME or od -Ax -tx1z FILENAME, and verify which character encoding is in use.
      • When opening the file, make sure to specify the correct encoding layer, e.g. open my $fh, '<:encoding(UTF-8)', $filename or die $!;
      • When inspecting the data in Perl, don't use a plain print. Instead use Data::Dumper (use Data::Dumper; $Data::Dumper::Useqq=1; print Dumper($data);), Data::Dump (use Data::Dump 'pp'; pp $data;), or, for a really detailed look, Devel::Peek (use Devel::Peek; Dump( $data );).
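      The tips above can be combined into a small diagnostic script. This is only a sketch: the hex view stands in for hexdump -C, strict UTF-8 decoding with Encode's FB_CROAK flag tests whether the bytes are valid UTF-8, and Data::Dumper with Useqq makes characters such as a stray \r visible. The helper names slurp_raw and try_utf8 and the sample bytes are made up for this example; pass a filename to inspect your own data:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Data::Dumper;
$Data::Dumper::Useqq = 1;    # show non-ASCII and control chars as escapes

# Hypothetical helper: read a file as raw bytes (no decoding yet).
sub slurp_raw {
    my ($filename) = @_;
    open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";
    local $/;                # slurp mode
    return scalar <$fh>;
}

# Hypothetical helper: return decoded text if $bytes is valid UTF-8, else undef.
sub try_utf8 {
    my ($bytes) = @_;
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text;
}

# Made-up sample: "가" in UTF-8 followed by a CRLF line ending.
my $bytes = @ARGV ? slurp_raw($ARGV[0]) : "\xEA\xB0\x80\r\n";

# Poor man's hexdump of the first 32 bytes.
print join(' ', map { sprintf('%02x', ord($_)) } split //, substr($bytes, 0, 32)), "\n";

my $text = try_utf8($bytes);
if (defined $text) {
    print "Valid UTF-8\n";
    print Dumper($text);
} else {
    print "Not valid UTF-8; try another encoding such as cp949\n";
}
```

      If the bytes are not valid UTF-8, Korean text produced on Windows is often cp949 (a superset of EUC-KR), but treat that as a guess to check against the hex dump rather than a rule.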

      When explaining the problem here, post all three of the above, that is, a few lines of the hex dump, the code you're using, and the output of one of the dumper modules.

      Hope this helps,
      -- Hauke D

Node Type: perlquestion [id://1177018]
Approved by marto