It's hard to say, because you don't tell us where and how things fail.
The best approach IMO is to properly Encode::decode the data as you read it in and then properly Encode::encode it again as you write it to its target store.
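To make the decode-on-input, encode-on-output idea concrete, here is a minimal sketch. It assumes the data is UTF-8, which the thread has not confirmed, and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw bytes of "café" as UTF-8 (assumed encoding).
my $bytes = "caf\xC3\xA9";

# Decode bytes into Perl characters. FB_CROAK dies on malformed input
# instead of silently substituting; LEAVE_SRC keeps $bytes unmodified.
my $text = decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Work on characters here (length, regexes, substr, ...).
my $char_count = length($text);

# Encode characters back into bytes just before writing them out.
my $out = encode('UTF-8', $text, Encode::FB_CROAK | Encode::LEAVE_SRC);

printf "%d characters, %d bytes\n", $char_count, length($out);
```

If the decode step dies, that is already a useful hint that the data is not in the encoding you assumed.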
| [reply] |
Vague description.
- What do you do with the data?
- How does your script break?
- Is the data UTF-8 encoded?
- If yes, do you already use the utf8 pragma and the related UTF-8 modules? See perlunitut.
| [reply] |
Hey all, thanks for the responses. The reason I haven't posted data is that it's confidential, and copy-paste is not possible. Basically, things fail when I am printing and a field contains a character that isn't recognized. This leads to all kinds of carriage return errors and other weird output that I can't understand.
How can I diagnose what type of text I'm dealing with? I don't have anyone tech-savvy who can tell me, so I will have to figure it out myself.
1) How do I tell which encoding is in use?
2) How do I convert it to something normal?
| [reply] |
Hi,
You don't need to post the real data, just an example that demonstrates the problem and lets us reproduce it. See How do I post a question effectively? and Short, Self Contained, Correct (Compatible) Example.
Some general tips:
- Look at the raw data in the file, e.g. hexdump -C FILENAME or od -Ax -tx1z FILENAME, and verify which character encoding is in use.
- When opening the file, make sure to specify the correct encoding layer, e.g. open my $fh, '<:encoding(UTF-8)', $filename or die $!;
- When inspecting the data in Perl, don't use print, use either use Data::Dumper; $Data::Dumper::Useqq=1; print Dumper($data);, use Data::Dump 'pp'; pp $data;, or for a really detailed look use Devel::Peek; Dump( $data );
When explaining the problem here, post all three of the above, that is, a few lines of the hex dump, the code you're using, and the output of one of the dumper modules.
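The tips above can be combined into a small self-contained test case. This is only a sketch: the sample bytes (UTF-8 "café" followed by CRLF) and the temporary file are assumptions standing in for your real data, and the file is assumed to be UTF-8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Data::Dumper;
$Data::Dumper::Useqq = 1;    # show control and non-ASCII characters as escapes

# Write made-up sample bytes to a temporary file (stand-in for the real file).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "caf\xC3\xA9\r\n";
close $tmp or die "Cannot close $filename: $!";

# Read the line back through an explicit encoding layer.
open my $fh, '<:encoding(UTF-8)', $filename
    or die "Cannot open $filename: $!";
my $data = <$fh>;
close $fh;

# Dump instead of print, so invisible characters become visible.
my $dump = Dumper($data);
print $dump;
```

A stray carriage return shows up as \r in the dump, which plain print would hide. Replacing the sample bytes with a few lines copied from the hex dump gives you an example you can post without revealing the confidential data.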
Hope this helps, -- Hauke D
| [reply] |