PerlMonks  

Re^2: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma? ("XS")

by Anonymous Monk
on Oct 02, 2011 at 07:49 UTC (#929117)


in reply to Re: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma? ("XS")
in thread Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

You expect an XS module opening a file ..

But it isn't opening a file; it's reading from a filehandle. Sure, ARGV is magic, but CSV_XS isn't doing the opening.
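To make the point concrete, here is a minimal sketch in plain Perl of a reader that is handed the ARGV handle and never calls open() itself. The helper names `slurp_lines` and `read_via_argv` and the sample data are hypothetical, and the sketch does not use Text::CSV_XS; it only illustrates that the read is what triggers the implicit open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a module that is handed a filehandle:
# it only reads; it never calls open() itself.
sub slurp_lines {
    my ($fh) = @_;
    my @lines = readline($fh);   # list context: read until the handle is exhausted
    return @lines;
}

# Hypothetical helper: point @ARGV at one file and pass the ARGV handle along.
sub read_via_argv {
    my ($path) = @_;
    local @ARGV = ($path);
    return slurp_lines(\*ARGV);  # the first read is what opens $path
}

# Sample data in a temporary file (contents are made up for illustration).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "a,b\n1,2\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my @lines = read_via_argv($tmp_path);
print scalar(@lines), " line(s) read through ARGV\n";
```

Nothing in `slurp_lines` names a file; the file named in @ARGV is opened as a side effect of the first read on the handle.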


Re^3: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma? ("lexical")
by tye (Cardinal) on Oct 02, 2011 at 08:23 UTC

    If Tux is correct and this has been "fixed", then I think the documentation for open.pm should be corrected. I certainly don't see how the offered code qualifies for:

    "Any two-argument open(), readpipe() (aka qx//) and similar operators found within the lexical scope of this pragma"

    I haven't dived into the guts (well, I have dived into guts related to open.pm but not recently and not in relation to this specific case), but it appears that the only thing within the lexical scope of the pragma is the passing of a file handle to an XS module. That XS module reads from the handle and the reading from the handle triggers "magic" (as you put it) that causes a file to be opened.

    The opening is not done by code within the lexical scope of the pragma. Perhaps the documentation should say that it impacts 'open' within the temporal scope of the pragma? I doubt it actually does that, though (that wouldn't match my memory of the guts the last time I dived into them).

    But if it isn't temporal scope, then I'm hard pressed to explain how it could actually work in this case. Perhaps somebody will explain it. I don't plan to spend time investigating this particular mystery.
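For comparison, the documented case (an explicit open() written inside the pragma's lexical scope) can be checked with a small sketch. The helper names are hypothetical, and this covers only explicit opens; it does not settle how the implicit ARGV open discussed above is handled.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open() is written inside the pragma's lexical scope.
sub layers_with_pragma {
    my ($path) = @_;
    use open IN => ':encoding(UTF-8)';   # scoped to this sub body only
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    return PerlIO::get_layers($fh);
}

# Hypothetical helper: the same open(), written outside that scope.
sub layers_without_pragma {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    return PerlIO::get_layers($fh);
}

# An empty temporary file is enough to inspect the layers on a handle.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh or die "Cannot close $tmp_path: $!";

print "inside scope:  @{[ layers_with_pragma($tmp_path) ]}\n";
print "outside scope: @{[ layers_without_pragma($tmp_path) ]}\n";
```

Only the handle opened inside the scope should report an encoding layer, assuming no default layers are set through the environment.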

    I doubt the original poster's expressed desire for ignorance will lead to success when dealing with UTF-8 streams. Unfortunately, UTF-8 was defined in a way and supported by Unix (and Perl) in ways that make handling it correctly very often require significant diving into a lot of details.

    - tye        

      I don't plan to spend time investigating this particular mystery.

      If you're someone who has a deep understanding of how the Perl programming language works and the know-how to help fix it when it's broken—and I suspect you are—then perhaps you should spend time investigating this particular mystery. If you help make Perl more intuitive to use ("DWIM"), then you improve the language, which benefits the Perl community.

      Posting snarky, condescending responses to the earnest inquiries of casual Perl programmers on PerlMonks doesn't improve the language or help the Perl community, and so isn't the best use of a Perl expert's time. It's especially unhelpful if the obtuse point one is trying to make turns out to be wrong.

      I doubt the original poster's expressed desire for ignorance will lead to success when dealing with UTF-8 streams. Unfortunately, UTF-8 was defined in a way and supported by Unix (and Perl) in ways that make handling it correctly very often require significant diving into a lot of details.

      You're right that grappling with Unicode in Perl is too often unduly tricky and obscure. But in most cases, as in this case, simple, ordinary tasks should be more straightforward. After all, "Easy things should be easy and hard things should be possible." Reading and writing trivial CSV records encoded in UTF-8 is most assuredly an "easy thing," not a "hard thing," isn't it?

      (I'm the original poster, and I'm using Windows, not Unix. I made this clear in my original post. Also, UTF-8 is an ingenious encoding scheme that accomplishes its multiple objectives brilliantly. It wasn't defined in a way that makes handling it correctly inherently harder than handling text in any other character encoding, at least for programmers using modern programming languages and software libraries. You can't blame Unicode here.)
