Re^2: How to tell if a stream is already in UTF8 mode? by perl-diddler (Hermit)
on Jan 03, 2014 at 21:18 UTC
Where does your routine get filehandles from? Why are they opened in different modes?
It's a lower-level library formatting routine. It's like asking, of "printf FH ...": "Where does printf get its filehandles from? Why would printf get filehandles opened in different modes?"
It gets the filehandle from user programs, whether STDOUT, STDERR, or some other opened destination. By the time printf gets it, it doesn't know whether the handle was set up for Unicode or for binary. The lower-level layers do 'know': they emit a "Wide character" warning if they see characters > 255 on a stream NOT marked as UTF-8, AND they will not encode characters in the range 128-255 as UTF-8 unless the stream was previously marked as UTF-8.
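A small sketch of the behaviour just described, assuming an in-memory filehandle (open on a scalar reference) behaves like a real file for this purpose. The helper name bytes_written is hypothetical, used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: print $str to an in-memory filehandle opened
# with the given layer string; return the resulting bytes plus any
# warnings raised while printing.
sub bytes_written {
    my ($layers, $str) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $fh, ">$layers", \my $buf or die "open: $!";
    print {$fh} $str;
    close $fh or die "close: $!";
    return ($buf, \@warnings);
}

my $e_acute = chr(0xE9);    # U+00E9, in the 128-255 range
my $smiley  = chr(0x263A);  # U+263A, above 255

# No layer: a 128-255 character goes out as one raw (Latin-1) byte.
my ($raw_e, $raw_e_w)   = bytes_written('', $e_acute);
# :utf8 layer: the same character is encoded as two UTF-8 bytes.
my ($utf8_e, $utf8_e_w) = bytes_written(':utf8', $e_acute);
# No layer, character > 255: "Wide character" warning, UTF-8 bytes written anyway.
my ($raw_s, $raw_s_w)   = bytes_written('', $smiley);

printf "e-acute, no layer: %d byte(s)\n", length $raw_e;
printf "e-acute, :utf8:    %d byte(s)\n", length $utf8_e;
printf "smiley, no layer:  %d byte(s), %d warning(s)\n",
    length $raw_s, scalar @$raw_s_w;
```

The point is that the same character value produces different bytes depending only on the handle's layers, which the caller of a formatting routine never sees.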
It doesn't sound like a good idea to make a "lower level routine" distinguish between different kinds of filehandles with different I/O layers applied to them.
The problem isn't that it is a lower-level routine, but that it isn't "low enough": the lower I/O layers know whether binmode was called on the stream.
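The layers do expose this state. PerlIO::get_layers (built into perl, no module needed) lists a handle's layers, and when the UTF-8 flag is on it is reported as a pseudo-layer named "utf8". A minimal sketch on an in-memory handle; the exact base layer names vary by platform and handle type, so only the presence of "utf8" is checked:

```perl
use strict;
use warnings;

open my $fh, '>', \my $buf or die "open: $!";

my @before = PerlIO::get_layers($fh);   # no utf8 flag yet
binmode $fh, ':utf8' or die "binmode: $!";
my @after  = PerlIO::get_layers($fh);   # "utf8" pseudo-layer now present
binmode $fh, ':bytes' or die "binmode: $!";   # clears the utf8 flag again
my @reset  = PerlIO::get_layers($fh);

for ([before => \@before], [after => \@after], [reset => \@reset]) {
    my ($label, $layers) = @$_;
    printf "%-6s: %s\n", $label, join(' ', @$layers);
}
```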
Just guessing now, but get_layers is likely the way to go, combined with a for loop over the layer names: matching only on the first character to eliminate most possibilities, then checking whether the name (UTF-8 or utf8) is in a hash, might give the best performance, with the result then cached as the state for that stream.
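A minimal sketch of that idea, assuming the "utf8" pseudo-layer reported by PerlIO::get_layers is the marker to look for (an :encoding(UTF-8) layer reports it too). The names handle_is_utf8 and forget_handle, and the cache keyed on Scalar::Util::refaddr, are hypothetical. It simplifies the first-character filter to a single hash lookup per layer name. A cached answer goes stale if binmode is called on the handle later, or if a closed handle's address is reused:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Layer names treated as "this stream is in UTF-8 mode" (assumption:
# the utf8 pseudo-layer alone is enough).
my %UTF8_LAYER = (utf8 => 1);

# Hypothetical per-handle cache: refaddr of the handle => 0 or 1.
my %utf8_cache;

# True if the handle (a glob or IO reference) is marked UTF-8.
sub handle_is_utf8 {
    my ($fh) = @_;
    my $key = refaddr($fh) // die "handle_is_utf8: expected a reference";
    return $utf8_cache{$key} //= do {
        my $found = 0;
        for my $layer (PerlIO::get_layers($fh)) {
            if ($UTF8_LAYER{$layer}) { $found = 1; last }
        }
        $found;
    };
}

# Drop a cached answer, e.g. after calling binmode on the handle.
sub forget_handle { delete $utf8_cache{ refaddr($_[0]) } }
```

Callers would pass \*STDOUT, \*STDERR or a lexical handle; anything that calls binmode on a handle after it has been checked needs to call forget_handle on it.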
It's a one-way trip: if the routine detects characters > 255 in the stream, it knows the stream "needs" to be in UTF-8 mode, but no single-byte value could force the reverse, since every value from 0 to 255 is also a valid character on a UTF-8 stream.
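The one-way rule might look like this. It is a sketch only: promote_if_wide is a hypothetical helper that reuses the "utf8" pseudo-layer check. Switching layers mid-stream means bytes already written keep their old encoding, which the buffer contents below make visible:

```perl
use strict;
use warnings;

# Hypothetical helper: before printing $str to $fh, switch the handle
# to :utf8 if $str holds any character above 255. It never switches
# back, since every value 0-255 is also valid on a UTF-8 stream.
sub promote_if_wide {
    my ($fh, $str) = @_;
    return 0 if grep { $_ eq 'utf8' } PerlIO::get_layers($fh);  # already UTF-8
    return 0 unless $str =~ /[^\x00-\xFF]/;                      # nothing wide
    binmode $fh, ':utf8' or die "binmode: $!";
    return 1;                                                    # promoted
}

open my $out, '>', \my $buf or die "open: $!";
for my $piece ("caf\x{E9} ", "\x{263A}") {
    promote_if_wide($out, $piece);
    print {$out} $piece;
}
close $out or die "close: $!";
# $buf now holds "caf" . "\xE9 " (Latin-1) followed by E2 98 BA (UTF-8).
```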
Thanks for the pointer to get_layers; it doesn't have a manpage of its own (it is described in the PerlIO manpage).