
erroneous warning involving locale and input encoding: perl bug?

by raygun (Sexton)
on Apr 17, 2017 at 19:51 UTC
raygun has asked for the wisdom of the Perl Monks concerning the following question:

Hello, wise monks. I believe the code below demonstrates a perl bug, but before reporting it as such, I'd like to run it by the perl cognoscenti, and make sure I'm not doing something foolish. I am running perl v5.22.2 on i686-linux.

The bug is a warning perl emits that seems completely inapplicable. A precise interaction of several components seems to trigger it; I've not found a way to pare down the code snippet below any further and still trigger the bug. In particular:

  • it only happens when input comes from a file; I cannot reproduce it by redirecting stdin, using a DATA block, or any other of the usual means of crafting an example that doesn't rely on external files
  • the input-file encoding must be specified as iso-8859-1, even if the input file contains only the ASCII subset of this character set
  • smartmatch must be activated, even though this code snippet doesn't use it
  • the regular expression is not the simplest way to express this particular match, but all its components seem necessary for the bug to show up
Here is my code:
#!/usr/bin/perl

use experimental 'smartmatch';
use open ':encoding(iso-8859-1)';
use POSIX 'locale_h';
use locale ':ctype';

setlocale(LC_CTYPE, 'en_US.iso88591');

open (FILE, '< s2') || die "Cannot open\n";
while (<FILE>) {
  chomp;
  print "--$_--\n";
  print "ends with x and optional y or z\n" if /x(y|z)?$/;
}
close (FILE);

and here is a sample input file (filename "s2" hard-coded in the perl code) with one line that passes unremarked, and one line that triggers the bug:

flee
flex

When I run the code, I see:

--flee--
--flex--
Wide character (U+FFFD) in pattern match (m//) at ./fmin line 14, <FILE> line 2.
ends with x and optional y or z

The reported U+FFFD, of course, appears nowhere in the perl code or the input file, so I don't know where it's coming from, which is why I'm fairly sure this is a perl bug rather than something I'm doing wrong. Any insight appreciated!
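
The claim that the input holds nothing outside ASCII can be checked independently of any decoding layer. The following is only a sketch: the helper name non_ascii_offsets is made up for illustration, and a temporary file stands in for "s2" so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the byte offset of every non-ASCII byte
# in a file, reading it raw so that no encoding layer is involved.
sub non_ascii_offsets {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    local $/;                       # slurp the whole file
    my $bytes = <$fh>;
    close($fh);
    $bytes = '' unless defined $bytes;

    my @offsets;
    while ($bytes =~ /[^\x00-\x7F]/g) {
        push @offsets, pos($bytes) - 1;
    }
    return @offsets;
}

# Assumption: this temporary file stands in for the real input file "s2".
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "flee\nflex\n";
close($tmp);

my @bad = non_ascii_offsets($path);
print @bad ? "non-ASCII bytes at offsets: @bad\n" : "input is pure ASCII\n";
```

Pointing non_ascii_offsets at the real "s2" instead of the temporary file would show whether any stray byte is present before perl's ':encoding(iso-8859-1)' layer ever sees it.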

Replies are listed 'Best First'.
Re: erroneous warning involving locale and input encoding: perl bug?
by Corion (Pope) on Apr 17, 2017 at 20:50 UTC

    Thank you for this very detailed and yet to the point report!

    Unfortunately, I cannot reproduce the issue you see with:

    c:\Users\Corion\Projekte>perl -v
    This is perl 5, version 20, subversion 1 (v5.20.1) built for MSWin32-x64-multi-thread
    Copyright 1987-2014, Larry Wall

    Also, I had to remove the following line, as it seems that my version of locale doesn't know about :ctype:

    use locale ':ctype';

    The perl 5.18 I have on Linux doesn't know about experimental, so I can't meaningfully test it there either. If I remove the mention of "smartmatch", no warning happens, but that just matches your description.

    Update: After installing experimental, nothing changes on my installation of 5.18. But it also doesn't know about use locale ':ctype', so I might still be missing something there.

Re: erroneous warning involving locale and input encoding: perl bug?
by Anonymous Monk on Apr 17, 2017 at 21:02 UTC
    U+FFFD is the substitution character (this one: �). Probably generated by perl when it tried to decode your string and failed.
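
To illustrate how a failed decode can produce U+FFFD, here is a minimal sketch using the core Encode module. The sample byte string is invented for the example. It only shows the general mechanism and does not claim to be what happens in the original script, where the input is said to be plain ASCII.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: a made-up byte string. 0xE9 is a valid ISO-8859-1 character,
# but on its own it is not a well-formed UTF-8 sequence.
my $bytes = "caf\xE9 au lait";

# With the default (lenient) check mode, malformed input is replaced
# by the substitution character instead of raising an error.
my $as_utf8   = decode('UTF-8',      $bytes);
my $as_latin1 = decode('iso-8859-1', $bytes);

printf "decoded as UTF-8:      contains U+FFFD? %s\n",
    ($as_utf8   =~ /\x{FFFD}/ ? 'yes' : 'no');
printf "decoded as ISO-8859-1: contains U+FFFD? %s\n",
    ($as_latin1 =~ /\x{FFFD}/ ? 'yes' : 'no');
```

Every byte value is a valid ISO-8859-1 character, so a plain ISO-8859-1 decode like the second call has nothing to replace. That is part of why the warning in the question looks out of place.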

    What's the output of locale -a? (your locale seems a bit wrong to me...)

      My code is designed to not care how the system locale is set: it explicitly says it will use only the ctype category, and then explicitly sets that category to ISO-8859-1, so that the locale settings of the shell are overruled.

      What do you think might be wrong with my setlocale() call? locale -a outputs

      C
      POSIX
      en_US
      en_US.iso88591
      en_US.utf8

      The warning still appears even if I give setlocale() its lowest-common-denominator setting, 'C', instead of 'en_US.iso88591'.
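
One way to rule out the setlocale() call itself is to look at its return value, which is the name of the locale now in effect, or undef if the request was rejected. This is a small sketch rather than a diagnosis; whether 'en_US.iso88591' is accepted depends on the locales installed on the machine running it.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try each locale name and report whether the C library accepted it.
for my $name ('C', 'en_US.iso88591') {
    my $got = setlocale(LC_CTYPE, $name);
    printf "%-16s => %s\n", $name,
        defined $got ? "accepted ($got)" : 'rejected';
}
```

If both names are reported as accepted, the setlocale() call is doing what it was asked to do and the cause of the warning lies elsewhere.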
        What do you think might be wrong with my setlocale() call?
        I see now that it should be ok.

        I don't think your code should trigger any situation in which perl could legitimately generate a replacement character (if the input is as you say). Must be a bug.

      Probably generated by perl when it tried to decode your string and failed
      (although I don't see why...)
Re: erroneous warning involving locale and input encoding: perl bug?
by Anonymous Monk on Apr 17, 2017 at 20:49 UTC
    consider:
    print "ends with x and optional y or z\n" if /x(y|z)?\n$/;
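
For reference, a short sketch of how the two patterns behave on a line with and without its trailing newline. In Perl, $ matches at the very end of the string or just before a final newline, so the original pattern matches either way, while the pattern with an explicit \n can only match if the line has not been chomped.

```perl
use strict;
use warnings;

my $original  = qr/x(y|z)?$/;     # pattern from the question
my $suggested = qr/x(y|z)?\n$/;   # pattern suggested above

for my $line ("flex", "flex\n") {
    (my $shown = $line) =~ s/\n/\\n/;   # make the newline visible in the output
    printf "%-8s original: %-8s suggested: %s\n",
        qq("$shown"),
        ($line =~ $original  ? 'match' : 'no match'),
        ($line =~ $suggested ? 'match' : 'no match');
}
```

Since the script in the question calls chomp before the test, using the suggested pattern there would also mean dropping the chomp or matching against the unchomped line.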
