PerlMonks  

Re^2: How to tell if a stream is already in UTF8 mode?

by perl-diddler (Hermit)
on Jan 03, 2014 at 21:18 UTC (#1069200)


in reply to Re: How to tell if a stream is already in UTF8 mode?
in thread How to tell if a stream is already in UTF8 mode?

Where does your routine get filehandles from? Why they are opened in different modes?

It's a lower-level library formatting routine. It's like asking, about "printf FH ...": "Where does printf get its filehandles from? Why would printf get FHs opened in different modes?"

It gets the FH from user programs; the FH may be STDOUT, STDERR, or any other opened destination. By the time printf gets it, it doesn't know whether the FH was set up for Unicode or binary. The lower-level layers "know": they emit a warning if they see chars > 255 on a stream NOT marked as UTF8, AND they will not encode chars in the range 128-255 as UTF-8 unless the stream was previously marked as UTF8.
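The behavior described above can be observed with in-memory filehandles. This is only a minimal sketch: the helper name bytes_written is made up for illustration, and it assumes no default layers are being forced through -C, PERL_UNICODE, or the open pragma.

```perl
use strict;
use warnings;

# Hypothetical helper: print $str to a fresh in-memory handle opened with
# $mode, then return the number of bytes written followed by any warnings.
sub bytes_written {
    my ($mode, $str) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $buf = '';
    open(my $fh, $mode, \$buf) or die "open failed: $!";
    print {$fh} $str;
    close($fh);
    return (length($buf), @warnings);
}

# U+00E9 on a handle without the utf8 flag: written as one byte.
my ($raw_len) = bytes_written('>', "\x{E9}");

# The same character on a :utf8 handle: encoded as two bytes.
my ($utf8_len) = bytes_written('>:utf8', "\x{E9}");

# U+263A on a handle without the utf8 flag: "Wide character" warning.
my ($wide_len, @wide_warn) = bytes_written('>', "\x{263A}");

printf "raw: %d byte(s), utf8: %d byte(s), wide: %d byte(s), %d warning(s)\n",
    $raw_len, $utf8_len, $wide_len, scalar @wide_warn;
```

The point of the sketch is that the same string produces different bytes depending on a flag the formatting routine cannot see from the string alone.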

It doesn't sound like a good thing to make a "lower level routine" distinguish between different kinds of file handles with different IOLayers tied upon them.

The problem isn't that it is a lower-level routine, but that it isn't "low enough": the lower I/O layers know whether binmode was called on the stream, and this routine doesn't.

Just guessing now, but 'get_layers' is likely the way: loop over the layers it returns, match on the 1st char to eliminate possibilities quickly, then check whether the name (UTF-8 or utf8) is in a hash. That might give the cheapest check, and the result could then be cached as the state for that stream.
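A minimal sketch of that check, assuming the built-in PerlIO::get_layers (described in perldoc PerlIO) and a made-up helper name, handle_is_utf8. It looks for the "utf8" entry that get_layers reports when the handle accepts characters. Note that this flag also shows up for other :encoding(...) layers, so it says "characters are accepted", not necessarily "output is UTF-8".

```perl
use strict;
use warnings;

# Hypothetical helper: true if the output side of $fh carries the utf8 flag,
# which PerlIO::get_layers reports as an entry named "utf8".
sub handle_is_utf8 {
    my ($fh) = @_;
    my @layers = PerlIO::get_layers($fh, output => 1);
    return scalar grep { $_ eq 'utf8' } @layers;
}

open(my $raw,  '>',                 \my $raw_buf)  or die "open failed: $!";
open(my $utf8, '>:utf8',            \my $utf8_buf) or die "open failed: $!";
open(my $enc,  '>:encoding(UTF-8)', \my $enc_buf)  or die "open failed: $!";

for my $pair ([raw => $raw], [utf8 => $utf8], [encoding => $enc]) {
    my ($name, $fh) = @$pair;
    printf "%-8s %s\n", $name, handle_is_utf8($fh) ? 'utf8' : 'bytes';
}
```

The layer list is usually only a few entries long, so a plain grep may already be cheap enough. Any cached answer would go stale if the caller runs binmode on the handle afterwards.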

It's a one-way trip: if the routine detects chars valued > 255 in the stream, it knows the stream "needs" to be in utf8 mode, but no char in the 0-255 range would force the reverse (since every one of them can also be part of a UTF-8 encoded data stream).
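The one-way idea could look roughly like this. The helper names has_wide_chars and print_promoting are invented for the sketch, and whether a library routine should change a caller's handle at all is a separate design question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str holds any character above 255.
sub has_wide_chars {
    my ($str) = @_;
    return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
}

# Hypothetical helper: print $str, first switching $fh to :utf8 if the string
# needs it and the handle is not already flagged. It never switches back.
sub print_promoting {
    my ($fh, $str) = @_;
    if (has_wide_chars($str)
        && !grep { $_ eq 'utf8' } PerlIO::get_layers($fh, output => 1)) {
        binmode($fh, ':utf8') or die "binmode failed: $!";
    }
    return print {$fh} $str;
}

my $buf = '';
open(my $fh, '>', \$buf) or die "open failed: $!";
print_promoting($fh, "plain ASCII\n");      # handle left as it was
print_promoting($fh, "smiley \x{263A}\n");  # handle promoted to :utf8
close($fh);
```

One caveat with promoting mid-stream: any chars in the 128-255 range printed before the switch were already written as single bytes, so the output can end up with mixed encodings.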

Thanks for the pointer to get_layers...it's not documented on its own manpage...


Re^3: How to tell if a stream is already in UTF8 mode?
by ikegami (Pope) on Jan 05, 2014 at 18:46 UTC

    The lower level layers 'know', and will emit a warning if they detect chars > 255 on a stream NOT marked as UTF8, AND will not encode chars between 128 - 255, as UTF8 unless the stream was previously marked as UTF8.

    The lower level always expects bytes. (Files are blocks/streams of bytes.) It will ALWAYS emit a warning if it detects chars >255.
