Re^10: Speeds vs functionality (utf8 csv)

by ikegami (Pope)
on Aug 01, 2014 at 03:17 UTC (#1095838)


in reply to Re^9: Speeds vs functionality (utf8 csv)
in thread Speeds vs functionality

For one, the author of Text::xSV didn't have to think about multi-byte characters.

Technically true, but he did have to think about providing a means of supplying decoded input. I don't see any.

As a result, the separator can only be in U+0000..U+007F for UTF-8 files (assuming the claim that it only supports a one-character separator is correct), and it can't handle UTF-16le files containing characters in U+0Axx, etc.
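
To make the byte-level problem concrete, here is a minimal sketch using only the core Encode module. The sample characters (U+00A7 as a would-be separator, U+0A05 as data) are chosen purely for illustration and say nothing about what Text::xSV does internally; the sketch only shows what any code working on undecoded bytes would see.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A separator outside U+0000..U+007F is more than one byte in UTF-8,
# so a "one-character" separator matched against raw bytes cannot match it.
my $sep_bytes = encode('UTF-8', "\x{00A7}");      # SECTION SIGN -> C2 A7
my $sep_len   = length $sep_bytes;                # number of bytes, not characters

# In UTF-16LE, U+0A05 is encoded as the bytes 05 0A, so a raw 0x0A byte
# (which looks like a newline) appears in the middle of a character.
my $utf16  = encode('UTF-16LE', "x\x{0A05}y");    # 78 00 05 0A 79 00
my @chunks = split /\x0A/, $utf16;                # byte-wise "line" splitting

printf "separator is %d bytes; the record was cut into %d pieces\n",
    $sep_len, scalar @chunks;
```

Decoding the input before parsing avoids both problems, since the parser then sees characters rather than bytes.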


Re^11: Speeds vs functionality (fh)
by tye (Cardinal) on Aug 01, 2014 at 03:46 UTC

    Yeah, fixing the module to allow a file handle to be given instead of just a file name is quite in line with the trivial work that I noted might be required.

    Though, I suspect that Perl provides a way for declaring a default encoding for all file handles, perhaps related to "locale" settings. So I'm not even convinced that your objection is even technically correct. (Though, if Perl does not provide such a feature, perhaps you should look into providing one, IMHO. :)

    I'm actually a bit surprised that open does not already support (according to my recent scanning of the documentation):

    open my $fh, '<:encoding(UTF-8) foo.csv'

    which would have also been a route that would have worked with the unchanged Text::xSV.

    but he did have to think about providing a means of supplying decoded input

    No, the author didn't have to think about that. The author just needed to allow a file handle to be given, even if the reason for allowing such had nothing to do with the author thinking about decoded input. I very often support taking a filehandle not just a filename, and very rarely is that due to me having thought about encodings.
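
    A minimal sketch of the idea, assuming a hypothetical reader named read_rows (this is not Text::xSV's API, and the naive split ignores quoting): the routine accepts either a file name or an already-open handle, and the caller decides which layers the handle carries. The three-argument form of open with an encoding layer is standard Perl; the in-memory file is only there to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical reader: takes a file name or an already-open handle.
# It knows nothing about encodings; it just reads lines and splits them.
sub read_rows {
    my ($src, $sep) = @_;
    my $fh;
    if (ref $src) {
        $fh = $src;                                # caller supplied a handle
    }
    else {
        open $fh, '<', $src or die "Can't open $src: $!";
    }
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [ split /\Q$sep\E/, $line, -1 ];
    }
    return \@rows;
}

# The caller opens the handle with a decoding layer (here on an in-memory
# file holding UTF-8 bytes), so a non-ASCII separator is one character.
my $bytes = encode('UTF-8', "a\x{00A7}b\nc\x{00A7}d\n");
open my $fh, '<:encoding(UTF-8)', \$bytes
    or die "Can't open in-memory file: $!";
my $rows = read_rows($fh, "\x{00A7}");

print join('|', @$_), "\n" for @$rows;
```

    Because the handle arrives already decoded, the reader never has to reason about multi-byte characters.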

    - tye        

      Though, I suspect that Perl provides a way for declaring a default encoding for all file handles,

      There's use open, but it's lexically scoped.
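
      A small sketch of what lexical scoping means here, assuming a stand-in sub named module_open that plays the role of a module's internal open (hypothetical, not taken from any real module). PerlIO::get_layers is a Perl built-in; the exact layer names it reports can vary by platform and by switches such as -C, so the example only checks for an encoding layer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a module's internal open. It is compiled outside the
# block below, so the pragma in that block does not apply to it.
sub module_open {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    return $fh;
}

my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my (@inside, @outside);
{
    use open IN => ':encoding(UTF-8)';             # affects this block only

    open my $fh, '<', $path or die "Can't open $path: $!";
    @inside  = PerlIO::get_layers($fh);            # opened in the pragma's scope
    @outside = PerlIO::get_layers(module_open($path));   # opened elsewhere
}

print "in scope:  @inside\n";
print "in 'module': @outside\n";
```

      Calling the sub from inside the block makes no difference: the default layers are fixed where the open is compiled, not where it is called from.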

        perlrun seems to indicate that -Ci makes your original nit technically incorrect. However, the phrase "in the current file scope" (which is not present in my copy) calls that implication into question, even though the documentation clearly makes it more than once away from that phrase (and it suggests a rather bizarre effect for that command-line switch).

        I don't find evidence of support for globally imposing a default non-UTF-8 encoding on streams. I find that a surprising lack and have seen others express similar surprise more forcefully.

        - tye        
