PerlMonks
utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number
by igoryonya (Pilgrim) on Nov 18, 2014 at 13:55 UTC ( [id://1107573]=perlquestion )
igoryonya has asked for the wisdom of the Perl Monks concerning the following question:
Also, I get:

I get it with some file names piped from the find program. It has happened only recently, and only with some file names, for the first time in the few years that I've been using and developing this program. It seems that some of the file names are corrupt. When I print such file names with my program, I get something like:

18.09.2012_-Протокол_вскрытия_конвертов_и_рассмотрения_заявок_на_участие_в_конк\xD1 Ф\xD1%80\xD1%8Dнк \xD0%9F\xD1%8C\xD1%8E\xD1%81елик. \xD0%9D\xD0%9B\xD0%9F. \xD0%9C\xD0%95Т\xD0%90 \xD0%9Cодел\xD1%8C.webm

The same file names, as displayed on the terminal by find before being piped to my program, look like this:

18.09.2012_-Протокол_вскрытия_конвертов_и_рассмотрения_заявок_на_участие_в_конк? Ф?%80?%8Dнк ?%9F?%8C?%8E?%81елик. ?%9D?%9B?%9F. ?%9C?%95Т?%90 ?%9Cодел?%8C.webm

As I said, this is the first time I have encountered such a problem after a few years of daily use of this program.
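The symptom described above can be reproduced in isolation. The following is a minimal sketch, not the code of comparebin.pl: it assumes the names are decoded as UTF-8 somewhere on input, and the sample byte string is made up for illustration. It shows a stray lead byte making strict decoding fail with the "does not map to Unicode" message, while lenient decoding with Encode::FB_PERLQQ turns the same byte into a literal \xD0, much like the printed names above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: valid UTF-8 for "П" (bytes D0 9F), then a stray
# lead byte D0 that is not followed by a continuation byte.
my $raw = "\xD0\x9F\xD0.webm";

# decode() may modify its source argument when a CHECK value is given,
# so each call works on its own copy of the raw bytes.
my $strict_src  = $raw;
my $lenient_src = $raw;

# Strict decoding croaks on the stray byte; eval captures the message.
my $strict = eval { decode('UTF-8', $strict_src, Encode::FB_CROAK) };
my $error  = $@;

# Lenient decoding carries on and writes the bad byte as the text \xD0.
my $lenient = decode('UTF-8', $lenient_src, Encode::FB_PERLQQ);

binmode STDOUT, ':encoding(UTF-8)';
print "strict decoding failed: $error" unless defined $strict;
print "lenient decoding gave:  $lenient\n";
```

Note that the leniently decoded string no longer contains the original byte, only its four-character escape, so it cannot be turned back into the name that exists on disk.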
Here is a sample piped launch of the program from the Linux terminal:

Update

I've just noticed that the file names get truncated after I tried:

find /some/path -type f -exec /path/comparebin.pl {} /path/to_folder/with_similar_dir_tree/ -parameters \;

Update 2

Thank you to everyone who took part in solving my problem. To be honest, since I started converting my programs to Unicode, my understanding of this topic has been pretty vague. Although many things became clearer after solving my problem, there is still a lot to understand about utf8 and Unicode in general. When I look at the amount of Perl's Unicode documentation, it's pretty daunting to realize that I need to literally read and digest all of it. Until now, I thought that Unicode was the answer to all textual problems and that everything should be in utf8, until I stumbled on this particular problem. Now I realize that there are exceptions.

At first, I didn't even have a clue where to start solving my problem. After talking to you, I understood what needed to be done, but not how. That frustrated me, because I felt that Unicode should stay behind the curtains, and I didn't want to spoil the fun of programming, which I love, with the daunting Unicode "bookkeeping". Also, I keep confusing the encode and decode commands. Then I calmed down, skimmed the Unicode, utf8 and Encode documentation for the parts I needed, and started experimenting. Once I added a check for utf8-ness (utf8::is_utf8) on every variable involved in path/file name processing and, where the flag was set, turned it off (Encode::_utf8_off) along the code path, the final paths started resolving for existence (-e).
I realize that if some part of the path was already corrupt before it was converted to utf8, turning the flag off will not help, and the final path will still fail to resolve for existence (-e). But I don't know how to process certain strings without them being converted to character mode; regex substitution, for example, always returns a value with the utf8 flag set. So, for now, I will leave it as it is, work on the fix, and read more of the utf8 and Unicode docs when I encounter such a problem again.
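The approach described in Update 2 could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual code of comparebin.pl: path_bytes and path_exists are hypothetical helper names, the file system is assumed to store names as UTF-8 bytes, and Encode::encode is swapped in for Encode::_utf8_off as the documented way to get from a character string back to bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return a byte string suitable for file system calls.
# Assumption: a string with the utf8 flag set is a decoded character string
# and the file system expects UTF-8 bytes. The flag is only a heuristic.
sub path_bytes {
    my ($name) = @_;
    return utf8::is_utf8($name) ? encode('UTF-8', $name) : $name;
}

# Hypothetical helper: existence test that always sees the byte form.
sub path_exists {
    my ($name) = @_;
    my $bytes = path_bytes($name);
    return -e $bytes ? 1 : 0;
}

# Usage: a decoded name and its raw byte form map to the same bytes.
my $decoded = "\x{41F}.webm";      # character string, "П.webm"
my $raw     = "\xD0\x9F.webm";     # the same name as UTF-8 bytes
print "same bytes\n" if path_bytes($decoded) eq path_bytes($raw);
```

As noted above, this only helps when the character string is a faithful decoding of the original name. If bytes were replaced or escaped during decoding, encoding the string again does not recover the name on disk, so keeping an untouched byte copy of each incoming path for file system calls is probably the safer design.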