Agreed. As long as no UTF-16-encoded characters leak into the output (easy to do accidentally in Java, since its strings are all UTF-16 internally), it's all gravy.
Java strings are UTF-16
A good XML writer that prevents you from going outside of the declared format will protect you from future mistakes. The downside is the extra effort of using an XML API rather than printing strings directly. JSON is usually gravy in all languages.
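One candidate for such a format-enforcing writer in Java is StAX's XMLStreamWriter (javax.xml.stream); I'm assuming that's the kind of API meant here. It escapes reserved characters in character data itself, so you can't emit malformed markup by accident. A minimal sketch:

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import java.io.ByteArrayOutputStream;

public class XmlWriterDemo {
    // Build a tiny document through the writer; reserved characters in
    // character data are escaped by the API, not by us.
    static String writeNote() throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        XMLStreamWriter w = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(buf, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("note");
        w.writeCharacters("5 < 7 & counting");  // escaped to &lt; and &amp;
        w.writeeEndElementPlaceholder();        // (see below)
        return buf.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeNote());
    }
}
```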
Gravy.. mmmm...
Java strings are UTF-16
The way strings are stored internally doesn't really matter.
While Perl stores Unicode strings internally as UTF-8 (or something very close to it), it can encode those strings to many other encodings for output. The same holds for Java: while it stores strings internally as UTF-16, there's no problem creating UTF-8 output, for example.
import java.io.*;

Writer utf8out = new BufferedWriter(
    new OutputStreamWriter(
        new FileOutputStream("outfile"), "UTF-8"));
utf8out.write("some unicode data");
utf8out.close();
That's really, really bad advice. You can still get encoding errors like that on the Java side. Yes, once the bytes are in memory they're all that matters and you're free to interpret them, but on the way in and out you're transforming them.
It's the same as Perl's binmode: the data is transcoded as the I/O occurs to get it into or out of memory. See...
http://www.docjar.com/docs/api/java/nio/charset/UnmappableCharacterException.html
I've run into this exact problem in Java while consuming the PerlMonks XML feeds.
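For reference, that exception comes from java.nio's charset engine, and whether you actually see it depends on how the encoder is configured: OutputStreamWriter defaults to substituting unmappable characters, while a CharsetEncoder set to REPORT throws. A small probe (the method name `encodes` is my own, purely illustrative):

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class UnmappableDemo {
    // Returns true if every character of s maps into the target charset.
    // With REPORT, the encoder throws (UnmappableCharacterException is a
    // subclass of CharacterCodingException) instead of substituting '?'.
    static boolean encodes(String s, String charsetName) {
        CharsetEncoder enc = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodes("plain ASCII", "ISO-8859-1"));    // true
        System.out.println(encodes("snowman \u2603", "ISO-8859-1")); // false
    }
}
```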
Just an OT question ... do all these writers-in-writers-in-writers look as crazy to you as they do to me? I mean, I have no experience with Java (nor do I want any), but it seems I/O in Java is just as overcomplicated as in C#/.NET. There is probably a reason that I, being OO-unclean, do not see, but still. It looks like someone went heavily overboard when designing the libraries.
Jenda
Enoch was right!
Enjoy the last years of Rome.
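For what it's worth, the nesting is Java's decorator-style I/O: each wrapper adds exactly one concern, and later APIs collapse the stack into a single call. A sketch (file names are made up for illustration):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NestedWriters {
    public static void main(String[] args) throws IOException {
        // Each layer adds one concern (decorator pattern):
        //   FileOutputStream   -> raw bytes to the file
        //   OutputStreamWriter -> chars to bytes via a charset
        //   BufferedWriter     -> batches small writes
        Writer w = new BufferedWriter(
            new OutputStreamWriter(
                new FileOutputStream("out.txt"), StandardCharsets.UTF_8));
        w.write("hello");
        w.close();

        // Since Java 7, java.nio.file folds the three layers into one call:
        Writer w2 = Files.newBufferedWriter(
            Paths.get("out2.txt"), StandardCharsets.UTF_8);
        w2.write("hello");
        w2.close();
    }
}
```

So the onion is deliberate composability rather than pure overengineering, even if the older API makes you assemble it by hand every time.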