in reply to Re: erroneous warning involving locale and input encoding: perl bug?

My code is designed to not care how the system locale is set: it explicitly says it will use only the ctype category, and then explicitly sets that category to ISO-8859-1, so that the locale settings of the shell are overruled.
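A minimal sketch of the setup described above, assuming the locale name en_US.iso88591 from the locale -a listing. The helper set_ctype() and its fallback to 'C' are hypothetical additions for illustration and are not part of my actual script:

```perl
use strict;
use warnings;
use POSIX qw(locale_h);   # exports setlocale() and the LC_* constants
use locale ':ctype';      # only the LC_CTYPE category is made locale-aware

# Hypothetical helper: try the requested locale and fall back to 'C'
# if the system does not have it installed.
sub set_ctype {
    my ($wanted) = @_;
    my $got = setlocale(LC_CTYPE, $wanted);
    $got = setlocale(LC_CTYPE, 'C') unless defined $got;
    return $got;
}

# Override whatever LC_CTYPE the shell environment supplied.
my $in_effect = set_ctype('en_US.iso88591');
print "LC_CTYPE is now $in_effect\n";
```

Checking the return value of setlocale() is the point here: it returns undef when the requested locale is unavailable, which would otherwise leave the shell's setting in force.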

What do you think might be wrong with my setlocale() call? locale -a outputs:

C
POSIX
en_US
en_US.iso88591
en_US.utf8

The warning still appears even if I give setlocale() its lowest-common-denominator setting, 'C', instead of 'en_US.iso88591'.

Re^3: erroneous warning involving locale and input encoding: perl bug?
by Anonymous Monk on Apr 18, 2017 at 01:04 UTC
    What do you think might be wrong with my setlocale() call?
    I see now that it should be ok.

    I don't think your code should trigger any situation where perl could legitimately generate a replacement character (if the input is as you say). It must be a bug.

      Thank you for the feedback. Before reporting it, I'll wait a bit to see if anyone with access to a later perl release can determine whether it still happens there.

      Does anyone know a way to suppress this warning while allowing any others to still be displayed? My code trips on several regular expressions, some of which occur inside loops, so I'm getting a lot of noise on the screen. I could have the shell filter stderr, but maybe there's a painless way to control perl warnings with this kind of granularity?

      Update: I see that the no warnings 'locale' pragma will silence this warning... but also probably others that I want to see. I reckon there's no way to silence just that specific warning. At least this will get the job done until the underlying bug is fixed.
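One way to get closer to per-message granularity is a $SIG{__WARN__} handler that drops only messages matching a pattern. This is a sketch under assumptions: the pattern below is a hypothetical placeholder, not the text of the warning discussed in this thread, and is_unwanted() is a helper name made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical pattern: replace it with text copied from the warning you actually see.
my $unwanted = qr/Wide character/;

# Hypothetical helper: true if this message is the one to hide.
sub is_unwanted {
    my ($msg) = @_;
    return $msg =~ $unwanted ? 1 : 0;
}

# Perl calls this handler for every runtime warning raised after this point.
$SIG{__WARN__} = sub {
    my ($msg) = @_;
    return if is_unwanted($msg);   # drop only the matching warning
    print STDERR $msg;             # pass every other warning through unchanged
};

warn "Wide character in demo\n";   # filtered out
warn "some other warning\n";       # still printed to STDERR
```

Unlike no warnings 'locale', this leaves the whole warning category enabled and filters on message text, so it is only as precise as the pattern.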

        The fault can be reduced to the following:
        use experimental 'smartmatch';
        use POSIX 'locale_h';
        use locale ':ctype';
        setlocale(LC_CTYPE, 'en_US');
        $_ = "x";
        utf8::upgrade($_);
        /x(y|z)?/;
        which gives an assert failure on bleadperl. The locale variant of the TRIE code in the regex engine appears to be treating the 'no more chars' special value of nextchr (-10) as a real, large UTF-8 character:
        && UTF8_IS_ABOVE_LATIN1(nextchr)
        By all means perlbug it

        Dave.