PerlMonks  

Re^4: passing data structures from java to perl

by almut (Canon)
on Jul 28, 2010 at 21:05 UTC (#851806)


in reply to Re^3: passing data structures from java to perl
in thread passing data structures from java to perl

Java strings are UTF-16

The way strings are stored internally doesn't really matter.

While Perl stores Unicode strings internally as UTF-8 (or something very close to it), it can encode those strings to many other encodings for output.  The same holds for Java: while it stores strings internally as UTF-16, there's no problem creating UTF-8 output, for example.

Writer utf8out = new BufferedWriter(
    new OutputStreamWriter(new FileOutputStream("outfile"), "UTF-8"));
utf8out.write("some unicode data");
utf8out.close();  // flush the buffer, otherwise nothing may reach the file
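
To make the point about internal storage versus output encoding concrete, here is a minimal sketch that encodes one in-memory String to two different byte encodings. The class and method names (EncodingChoiceDemo, toUtf8, toUtf16be) are made up for illustration, and StandardCharsets assumes Java 7 or later, which is newer than this thread.

```java
import java.nio.charset.StandardCharsets;

final class EncodingChoiceDemo {
    // Encode the same in-memory String as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode the same in-memory String as UTF-16 (big-endian, no BOM) bytes.
    static byte[] toUtf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String s = "gr\u00fc\u00df"; // four characters: g, r, u-umlaut, sharp s
        byte[] utf8 = toUtf8(s);
        byte[] utf16 = toUtf16be(s);

        // The byte counts differ because the output encodings differ,
        // not because the String itself changed.
        System.out.println("UTF-8 bytes:    " + utf8.length);
        System.out.println("UTF-16BE bytes: " + utf16.length);

        // Decoding with the matching charset gives back the original String.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s));
        System.out.println(new String(utf16, StandardCharsets.UTF_16BE).equals(s));
    }
}
```

The choice of encoding is made at the boundary (getBytes here, OutputStreamWriter in the snippet above); the String in memory is the same either way.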


Re^5: passing data structures from java to perl
by exussum0 (Vicar) on Jul 28, 2010 at 23:25 UTC
    That's really really bad advice. You can get encoding errors like that on the java side. Yeah, the bytes in memory are all that matter and you're free to interpret them however you like, but on the way in and out, you're transforming the data.

    It's the same thing as binmode. You're affecting the data as the IO occurs, on its way into or out of memory. See...

    http://www.docjar.com/docs/api/java/nio/charset/UnmappableCharacterException.html

    I've run into this exact problem using the XML feeds for PerlMonks while working in Java.

      Not sure what you're talking about.  UTF-8 is a variable-width (multi-byte-if-needed) encoding that can encode the full Unicode character set, so why should there be encoding errors, or an UnmappableCharacterException?  An UnmappableCharacterException is thrown if a certain character can't be represented in the specified target encoding, but as all Unicode characters can be encoded in UTF-8, this exception cannot occur.

      UnmappableCharacterExceptions may happen if you try to encode Unicode data to Latin-1, for example, but not with UTF-8.
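
The difference between the two target encodings can be checked with a small sketch. It uses a CharsetEncoder directly, since a freshly created encoder reports unmappable characters by throwing. The class name UnmappableDemo and the helper encodesCleanly are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

final class UnmappableDemo {
    // Returns true if every character of s can be represented in cs.
    // A new CharsetEncoder reports errors (throws) rather than substituting.
    static boolean encodesCleanly(String s, Charset cs) {
        try {
            cs.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (UnmappableCharacterException e) {
            // The character exists in Unicode but not in the target charset.
            return false;
        } catch (CharacterCodingException e) {
            // Malformed input such as an unpaired surrogate; a different
            // problem from the one discussed here, so surface it loudly.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String s = "price: 10 \u20ac"; // ends with the euro sign, absent from Latin-1
        System.out.println("UTF-8:   " + encodesCleanly(s, StandardCharsets.UTF_8));
        System.out.println("Latin-1: " + encodesCleanly(s, StandardCharsets.ISO_8859_1));
    }
}
```

Per the OutputStreamWriter documentation, that class substitutes a replacement sequence for unmappable characters instead of throwing, so the exception surfaces only when a CharsetEncoder is driven directly, as in this sketch.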

        One instance of using the Java API incorrectly is probably fine. Sure, there are lots of clever things you can do, such as inserting the same 2 items into a HashSet and getting the same result every single time you iterate over them, even though you're not supposed to expect an order. (You're achieving the same state.)

        You can serialize a singleton, and deserialize it to get 2 copies.

        I can also assume 86400 seconds in a day, which is inaccurate twice a year, but most of the time it's fine.

        Sure, in perl I can do... new Foo instead of Foo->new.

        Yeah, in the end it all may work out. In the IO case, Java is validating the output as it writes it, and if someone does Unicode->Latin1 using that exact pattern, then even if it works in this instance, you're teaching people a bad habit that can yield errors if they rubber-stamp it all over the place.

        You seem set on doing this in your code anyhow. That's fine. Don't be surprised if other Java devs look at it and go... yeah, that looks wrong.

      That's really really bad advice. You can get encoding errors like that on the java side.

      How do you figure, please explain?

Re^5: passing data structures from java to perl
by Jenda (Abbot) on Jul 31, 2010 at 22:47 UTC

    Just an OT question ... do all these writers in writers in writers look as crazy to you as they do to me? I mean I have no experience with Java (nor do I want any), but it seems IO in Java is just as overcomplicated as in C#/.Net. There probably is a reason that I, being OO unclean, do not see, but still. Looks like someone went heavily overboard when designing the libraries.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      Nope, they're building blocks you can use to create any kind of api you can imagine, even perl style layers ":raw:crlf:utf-8"
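
As an illustration of the building-block idea, here is a sketch that stacks a character-encoding layer on top of a compression layer on top of an in-memory sink, loosely comparable to stacking Perl IO layers. The class LayerDemo and its methods are hypothetical names, gzip is chosen only because it ships with the JDK, and try-with-resources and StandardCharsets assume Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class LayerDemo {
    // chars -> (buffer) -> UTF-8 bytes -> gzip -> byte array
    static byte[] writeGzippedUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(sink), StandardCharsets.UTF_8))) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the outermost writer flushed and finished every layer.
        return sink.toByteArray();
    }

    // byte array -> gunzip -> UTF-8 decode -> (buffer) -> chars
    static String readGzippedUtf8(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.UTF_8))) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac";
        byte[] packed = writeGzippedUtf8(text);
        System.out.println(readGzippedUtf8(packed).equals(text));
    }
}
```

Each wrapper does one job, so swapping the gzip layer for another stream, or dropping it, leaves the rest of the stack unchanged.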

        OK, building blocks. I buy that. But why the hell are we supposed to use building blocks instead of a sane API?

        Sorry for going further OT. Though ... this is not really a Java or C# specific question. It's a question of library interfaces, and now, with the Perl 6 object system, I am afraid we run the risk of going too puristically object oriented. Probably not with file system IO, but with other libraries. Building blocks (= implementation detail) leaking into the interface.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.
