It sounds like your bug would only rear its head when $id actually contains non-ASCII characters. The canonical method for handling this, as I understand it, is to explicitly encode incoming text streams that are potentially problematic; i.e.:
use Encode qw(encode);          # encode() comes from the core Encode module
my ($id, $filename) = split /\t/, $record;
$id = encode("UTF-8", $id);     # character string -> UTF-8 bytes
I'd watch out for the 'filtering programmer input' trap in all this; the Perl philosophy of giving people as much rope as they like means that a properly-motivated foolish programmer can always outwit your filtering. Since you expect that $id is printable ASCII, I'd be more inclined to filter using my regex above, and re-examine the logic that introduced UTF encoding sensitivity into the code in the first place. YMMV, of course.
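For what it's worth, here is a rough sketch of the two approaches side by side. The helper names (is_printable_ascii, id_to_bytes) and the sample record are made up for illustration, and the character class is just one plausible printable-ASCII check, not necessarily the same regex I posted earlier.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if the string is non-empty and contains
# only printable ASCII (space through tilde).
sub is_printable_ascii {
    my ($s) = @_;
    return $s =~ /\A[\x20-\x7E]+\z/ ? 1 : 0;
}

# Hypothetical helper: turn a character string into UTF-8 bytes.
sub id_to_bytes {
    my ($s) = @_;
    return encode("UTF-8", $s);
}

# Sample record, invented for illustration.
my $record = "abc123\tsome_file.txt";
my ($id, $filename) = split /\t/, $record;

if (is_printable_ascii($id)) {
    print "id '$id' is plain ASCII, no encoding needed\n";
}
else {
    # Fallback: encode explicitly before handing the value to
    # anything that expects bytes.
    my $bytes = id_to_bytes($id);
    printf "id needed encoding (%d bytes as UTF-8)\n", length $bytes;
}
```

Checking first and encoding only as a fallback keeps the common case untouched, and it makes the unexpected case visible instead of silently papering over it.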
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.