
Hello, wise monks. I believe the code below demonstrates a perl bug, but before reporting it as such, I'd like to run it by the perl cognoscenti, and make sure I'm not doing something foolish. I am running perl v5.22.2 on i686-linux.

The bug is a warning perl emits that seems completely inapplicable. A precise interaction of several components seems to trigger it; I haven't found a way to pare the snippet below down any further and still trigger the bug. In particular:

- it only happens when input comes from a file; I cannot reproduce it by redirecting stdin, using a DATA block, or any of the other usual means of crafting an example that doesn't rely on external files
- the input-file encoding must be specified as iso-8859-1, even if the input file contains only the ASCII subset of that character set
- smartmatch must be activated, even though this code snippet doesn't use it
- the regular expression is not the simplest way to express this particular match, but all of its components seem necessary for the bug to show up

Here is my code:

#!/usr/bin/perl

use experimental 'smartmatch';
use open ':encoding(iso-8859-1)';
use POSIX 'locale_h';
use locale ':ctype';
setlocale(LC_CTYPE, 'en_US.iso88591');

open (FILE, '< s2') || die "Cannot open s2: $!\n";

while (<FILE>) {
  chomp;
  print "--$_--\n";
  print "ends with x and optional y or z\n" if /x(y|z)?$/;
}
close (FILE);

and here is a sample input file (filename "s2", hard-coded in the perl code) with one line that passes unremarked and one line that triggers the bug:

flee
flex

When I run the code, I see:

--flee--
--flex--
Wide character (U+FFFD) in pattern match (m//) at ./fmin line 14, <FILE> line 2.
ends with x and optional y or z
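Since the problem seems to need a real input file, one way to keep the whole test case in a single script (handy for a perlbug report) is to have the script write s2 itself and trap the warning so it can report it. This is a sketch, and it is not confirmed to reproduce the warning on every build: the s2-writing block, the `$loc` check and the `@warnings` trap are additions, while the read loop is unchanged from the script above. The line number in the warning will of course differ from line 14.

```perl
#!/usr/bin/perl
# Single-file variant of the test case (sketch). The s2-writing block,
# the locale check and the warning trap are additions; the read loop
# is unchanged from the original script.
use experimental 'smartmatch';
use open ':encoding(iso-8859-1)';
use POSIX 'locale_h';
use locale ':ctype';

# setlocale() returns undef if the requested locale is not installed.
my $loc = setlocale(LC_CTYPE, 'en_US.iso88591');
print STDERR "note: en_US.iso88591 locale not available\n" unless defined $loc;

# Write the sample input. The explicit :raw layer overrides the default
# layer set by "use open", so the file holds exactly these bytes.
open(my $out, '>:raw', 's2') || die "Cannot create s2: $!\n";
print $out "flee\nflex\n";
close($out) || die "Cannot close s2: $!\n";

# Collect warnings so the script can report them itself.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

open (FILE, '< s2') || die "Cannot open s2: $!\n";
while (<FILE>) {
  chomp;
  print "--$_--\n";
  print "ends with x and optional y or z\n" if /x(y|z)?$/;
}
close (FILE);

if (@warnings) { print "trapped: $_" for @warnings }
else           { print "no warnings trapped\n" }
```

Installing a `__WARN__` handler doesn't stop perl from generating the warning; it only changes where the message goes, so the trap shouldn't mask the behaviour being reported.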

The reported U+FFFD, of course, appears nowhere in the perl code or the input file, so I don't know where it's coming from; that's why I'm fairly sure it's a perl bug rather than something I'm doing wrong. Any insight appreciated!
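To double-check that U+FFFD really isn't in the data perl sees, each decoded line can be dumped as code points. A minimal diagnostic sketch, reading s2 through the same iso-8859-1 default layer; the step that recreates s2 when it is missing and the `$fffd_seen` counter are hypothetical additions for illustration.

```perl
#!/usr/bin/perl
# Diagnostic sketch: print the code points of each decoded line of s2,
# to check whether U+FFFD is really absent from the data perl sees.
use strict;
use warnings;
use open ':encoding(iso-8859-1)';

# Hypothetical convenience: recreate the two-line sample if s2 is missing.
unless (-e 's2') {
    open(my $out, '>:raw', 's2') or die "Cannot create s2: $!\n";
    print $out "flee\nflex\n";
    close($out) or die "Cannot close s2: $!\n";
}

my $fffd_seen = 0;    # number of lines containing U+FFFD
open(my $in, '<', 's2') or die "Cannot open s2: $!\n";
while (my $line = <$in>) {
    chomp $line;
    # One U+XXXX entry per character; ord() defaults to $_ inside map.
    my $points = join ' ', map { sprintf 'U+%04X', ord } split //, $line;
    printf "%-6s utf8 flag: %s  %s\n",
        $line, (utf8::is_utf8($line) ? 'on' : 'off'), $points;
    $fffd_seen++ if $line =~ /\x{FFFD}/;
}
close($in);
print $fffd_seen ? "U+FFFD found in the data\n" : "no U+FFFD in the data\n";
```

If this reports no U+FFFD while the original script still warns, the replacement character is being produced somewhere inside the match rather than coming from the input.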

In reply to erroneous warning involving locale and input encoding: perl bug? by raygun

	PerlMonks