PerlMonks  

My PerlTidy utility is choking on non-ASCII characters, help me figure out why?

by Cody Fendant (Friar)
on Aug 25, 2016 at 07:05 UTC (#1170384=perlquestion)

Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

I have a Perl::Tidy utility/text filter which works with my editor (BBEdit) and I've just noticed it's killing non-ASCII characters.

The code itself is pretty simple:

#!/usr/local/bin/perl -wn
use Perl::Tidy;

BEGIN {
    my $input_string  = "";
    my $output_string = "";
}

$input_string .= $_;

END {
    Perl::Tidy::perltidy(
        source      => \$input_string,
        destination => \$output_string,
        argv        => '-ce -l=80'
    );
    print "$output_string\n";
}

As you can see, it just runs over the input file with -n, collecting it into a scalar. Then it runs Perl::Tidy on that scalar and prints the result back out.

It works in every other way just as I'd like, but when it encounters Unicode characters (I've been working on some Russian text and need to recognise these chars: 'ОЕАИН') it replaces them with question marks.

I can't add command-line flags like -CIO, that's not allowed. I've tried adding binmode STDOUT, ":utf8" and binmode STDIN, ":utf8" to the BEGIN block but that hasn't changed anything.

Of course I can use Perl::Tidy in other ways, but I'm used to this utility and would like to get it working again in a way I can trust; it's become a habit.

Replies are listed 'Best First'.
Re: My PerlTidy utility is choking on non-ASCII characters, help me figure out why?
by kcott (Bishop) on Aug 25, 2016 at 08:18 UTC
      The patches are to convert UTF-8 to an 8-bit encoding, which doesn't work -- the real problem goes much deeper, since Perl::Tidy breaks down line input into bytes and processes things byte-by-byte (using maps for the "known" 256 chars).... Such maps don't scale so well when you move up to the number of chars in Unicode. It looked to be close to a rewrite of the core code.

      It's been an open bug since 2008 and doesn't look easy to fix so it hasn't been.

      Expect to see it handle unicode sometime right after perlmonks does. ;-)

        ++ Thanks for the feedback.

        — Ken
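A small sketch of the character-versus-byte distinction raised above, using the Cyrillic letters from the question. It makes no claim about Perl::Tidy's internals; it only shows that these five characters occupy ten bytes in UTF-8 and that none of their code points would fit in a 256-entry table. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The letters from the question, written as escapes so the example does
# not depend on the encoding of this source file.
my $chars = "\x{41E}\x{415}\x{410}\x{418}\x{41D}";

# The same text as UTF-8 bytes.
my $bytes = encode('UTF-8', $chars);

my $char_count = length $chars;
my $byte_count = length $bytes;

# Count the code points that fall outside a table indexed 0..255.
my $outside = grep { ord($_) > 255 } split //, $chars;

printf "%d characters, %d bytes, %d code points above 255\n",
    $char_count, $byte_count, $outside;
# prints "5 characters, 10 bytes, 5 code points above 255"
```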

Re: My PerlTidy utility is choking on non-ASCII characters, help me figure out why?
by beech (Parson) on Aug 25, 2016 at 07:37 UTC

    I can't add command-line flags like -CIO, that's not allowed.

    Forget that for a moment, does it work if you add the flags?

    I've tried adding binmode STDOUT, ":utf8" and binmode STDIN, ":utf8" to the BEGIN block but that hasn't changed anything.

    That hints the problem isn't in Perl land ... what happens if you use Data::Dump::dd instead of print?

      Forget that for a moment, does it work if you add the flags?

      I should have been more specific. It's not allowed in the sense that the script will die with an error message "too late to add -C flag".
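For reference, one in-script way to request UTF-8 layers on the standard streams without any -C flag is the `open` pragma. This is only a sketch of that general technique; the question reports that binmode calls in the BEGIN block made no difference, so it is not established here that this cures the question marks in the original filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ask for UTF-8 layers on STDIN, STDOUT and STDERR at compile time,
# instead of passing -C on the command line or the #! line.
use open qw(:std :encoding(UTF-8));

# PerlIO::get_layers reports which layers ended up on each handle.
my @in_layers  = PerlIO::get_layers(STDIN);
my @out_layers = PerlIO::get_layers(STDOUT);

print "STDIN:  @in_layers\n";
print "STDOUT: @out_layers\n";
```

Printing the layers is a quick way to check whether an encoding layer is actually in place before looking for the problem elsewhere.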

        oh, right, forgot about that

        What about the Dumper?

        If I try

        perl -CSD -le " print qq{use utf8;\nprint qq{I \x{2665} Perl\n};\n}; " >foo.pl

        I get

        use utf8;
        print qq{I ♥ Perl
        };
        

        If I run that through perltidy or your program, it's unchanged.

        cmd.exe doesn't know how to display the Unicode without a chcp, but the bytes are the same.

        If I add to your program

        use Data::Dump qw/pp /; print STDERR pp("$output_string");

        I get the unchanged, correct result, as expected:

        "use utf8;\nprint qq{I \xE2\x99\xA5 Perl\n};\n\n"

        By default, cmd.exe does not display Unicode:

        $ chcp
        Active code page: 437
        
        $ type foo.tdy
        use utf8;
        print qq{I ΓÖÑ Perl
        };
        
        

        If I change it I get a heart

        $ chcp 65001
        Active code page: 65001
        
        $ type foo.tdy
        use utf8;
        print qq{I ♥ Perl
        };
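The point of the transcript above is that the file's bytes never change; only the console's interpretation of them does. A minimal sketch with the core Encode module, assuming nothing beyond the code pages named in the thread (437, and 65001 which is UTF-8). The helper name `code_points` is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: list the code points of a string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

# U+2665 BLACK HEART SUIT encoded as UTF-8 gives three bytes.
my $bytes = encode('UTF-8', "\x{2665}");
my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# Interpret the same three bytes under each code page.
my $as_cp437 = code_points(decode('cp437', $bytes));
my $as_utf8  = code_points(decode('UTF-8', $bytes));

print "bytes: $hex\n";         # E2 99 A5
print "cp437: $as_cp437\n";    # U+0393 U+00D6 U+00D1
print "UTF-8: $as_utf8\n";     # U+2665
```

Under code page 437 the three bytes are read as three separate characters, while under UTF-8 they are read as the single heart character, which matches the two `type foo.tdy` results.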
        
