http://www.perlmonks.org?node_id=283551

GermanHerman has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that frequently exchanges information with Java. Recently we have found that some of the content breaks our markup medium, XML. Can anyone with Java experience please advise me as to how I can encode something on my end so it is easily decodable on the Java end?
-Douglas
##############
So much life
So little time
##############

Re: Data Exchange between perl and java.
by Molt (Chaplain) on Aug 13, 2003 at 14:08 UTC

    I deal a fair bit with Java coders here, and so end up throwing vast amounts of data round to them in XML form. The best way of doing this seems to be simply using a good, compliant module for XML production. My personal recommendation, based on experience, is the XML::LibXML module, although it requires the libxml2 C library to function, so getting it installed can be a challenge with strict sysadmins.

Re: Data Exchange between perl and java.
by wirrwarr (Monk) on Aug 13, 2003 at 15:58 UTC
    If the content is normal HTML (and not XHTML), then it will most definitely produce broken XML. E.g. the <br> tag has no closing tag in HTML; this will produce broken XML. The solution is to use the CDATA sections as explained by some other poster above.

    If you use CDATA sections, then you can also validate your XML files against a DTD or schema, otherwise you would have to describe the complete (X)HTML syntax in your DTD/schema.

Re: Data Exchange between perl and java.
by BrowserUk (Patriarch) on Aug 13, 2003 at 14:14 UTC

    There really isn't enough information in your question to even begin to address your question.

    In theory at least, it should be possible to encode anything using XML, through the expedient of CDATA sections. You should probably go to XML.com and read their wisdom.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      Much of the data that is going into that XML is HTML. I am worried that some of that data may include XML at some point and therefore break its enclosing XML.

      I guess that I don't have to worry about this if I am using a module like XML::Simple?

      -Douglas

        In brief, anywhere you can put 'content', you can put a CDATA section to contain that content:

        <![CDATA[ anything can go here ]]>

        If you always wrap your HTML content in a CDATA section, then you should never encounter the situation where the HTML interferes with the XML, provided you are using a properly XML-compliant parser.
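One caveat to the CDATA approach: a CDATA section ends at the first `]]>`, so content that itself contains that sequence (not unusual in JavaScript) needs to be split across two sections. A minimal sketch, assuming a hypothetical helper named `cdata_wrap` that is not part of any module mentioned in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap arbitrary text in CDATA.
# A CDATA section ends at the first "]]>", so any such sequence in the
# content is split across two adjacent sections.
sub cdata_wrap {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

# Assumed sample content: HTML with a bare <br> and a script fragment
my $html = '<p>Tom & Jerry<br>if (a[b[0]]>1) { }</p>';
print '<content>', cdata_wrap($html), '</content>', "\n";
```

A compliant parser on the receiving end should hand back the adjacent sections as one run of character data, so the Java side needs no special decoding step.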

        I guess that I don't have to worry about this if I am using a module like XML::Simple?

        I'm not quite sure what you mean by this?

        If you mean when parsing XML?

        If any XML/HTML markup in the content of the XML being parsed was wrapped in CDATA sections, then I'm fairly confident that XML::Simple would handle it, but that is a big if.

        If you mean when constructing XML?

        I doubt that XML::Simple will handle applying the CDATA wrapping for you. That would be very much down to you to do regardless of what you use to construct your XML.



Re: Data Exchange between perl and java.
by dws (Chancellor) on Aug 13, 2003 at 15:47 UTC
    ... recently we have found that some of the content breaks our markup medium XML.

    Without knowing what this "some of the content" is, we can only speculate. Can you post a fragment or two that demonstrates the problems you're seeing?

      I am sorry that I didn't post some sample content initially. The content will include user input (HTML, JavaScript, XHTML), so I would like it to be able to include ANYTHING. That's why I wanted to put it in some sort of safe compartment while it was in there with the XML.

      The question is: how could I safely escape everything dangerous to XML and have my Java developer easily be able to unescape it?

      -Douglas
Re: Data Exchange between perl and java.
by rje (Deacon) on Aug 13, 2003 at 16:58 UTC
    This is seriously off-topic and only peripherally Perl related, but I don't know where to ask these kinds of questions, so flame away if you like.

    Has anyone used something lightweight like YAML or LISP to pass content between apps?

    Just curious. Maybe some kind soul can point me to the right place to ask that kind of question?

    Thanks for bearing with me.
    rje

      I did try using YAML to pass data in HTTP headers from a (public-facing) server running mod_perl to a back-end server running mod_ruby. Since the data was pretty simple (a Hash), and therefore also a primitive type in both Perl and Ruby, YAML seemed like a good fit. It worked almost completely smoothly; I seem to remember finding some surprises with empty elements, perhaps. I can't quite remember (not at the same PC today). Anyway, the code was amazingly concise compared to the stuff I write to use XML as a Perl-Java bridge.

      In this area, YAML has got some things going in its favour: a fast C implementation (Syck) that works with PHP, Python and Ruby (it's in the 1.8 core); potentially less code; and less "stuff" to shunt about. I can see it being useful in internal applications, where the ease of cross-platform object sharing is a win; I can't see it taking off in public interfaces, where the preponderance of the (often Java-backed) XML apps is so great. Plus, bear in mind that they're not really the same thing; XML is a far more general-purpose tool than YAML, which is (just) a 'data serialization language'.

      cheers
      ViceRaid

      I've used Data::JavaScript to serialize data and pass it between a pure Perl screen-scraping application and an IIS/JScript (ASP) web page.

      You just need a couple of lines of code to serialize/deserialize the data. Better yet, it needs fewer dependency libraries than any comparable method, because you're using the language itself as the serialization format (like Data::Dumper).

      eval'ing foreign code is a big security hole, but I figured that simply including a passphrase in the payload and using https was good enough for an intranet app.
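To illustrate the "language itself as the serialization format" idea, here is a rough sketch using the core Data::Dumper module rather than Data::JavaScript, so it is Perl to Perl only. The `%record` data is made up for the example, and the `eval` step is exactly the security hole described above:

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed sample data, purely for illustration
my %record = ( title => 'Tom & Jerry', tags => [ 'html', '<br>' ] );

# Serialize: emit a bare Perl expression on one line, with stable key order
$Data::Dumper::Terse    = 1;
$Data::Dumper::Indent   = 0;
$Data::Dumper::Sortkeys = 1;
my $payload = Dumper( \%record );

# Deserialize: eval runs whatever code is in the payload, so this is only
# defensible when the sender is fully trusted. The leading + tells Perl
# the braces are a hash constructor rather than a block.
my $copy = eval "+$payload";
die "could not deserialize: $@" if $@;

print "$payload\n";
print $copy->{tags}[1], "\n";
```

Note that the payload needs no escaping for markup characters, but it can only be read back by something that is willing to execute it.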

Re: Data Exchange between perl and java.
by GermanHerman (Sexton) on Aug 13, 2003 at 15:03 UTC
    Perfect, thanks so much for helping despite my posting incompetence,
    -Douglas
Re: Data Exchange between perl and java.
by zakzebrowski (Curate) on Aug 13, 2003 at 17:12 UTC
    Easy,
    rm -rf /usr/bin/java/*
    PS. Joke.

    ----
    Zak
Re: Data Exchange between perl and java.
by inman (Curate) on Aug 15, 2003 at 13:13 UTC
    I did some work using XML::Writer to create XML docs from a fielded text file. I noticed that it only escaped four or five 'special' chars, e.g. '&' to &amp;.

    This was OK until I started encountering data outside the 7-bit ASCII range, which was not escaped and was tripping up some XML parsing tools. I extended the code to do a general escape of characters outside the 7-bit range.

    You could use a similar approach if you find that unescaped chars are causing a problem.

    sub XML::Writer::_escapeLiteral {
        # Work on a copy so the caller's string is never modified in place
        my $data = $_[0];

        # Escape 'normal' characters, e.g. ampersand
        if ($data =~ /[<>&"']/) {
            $data =~ s/&/&amp;/g;     # ampersand (must come first)
            $data =~ s/"/&quot;/g;    # quotes
            $data =~ s/</&lt;/g;      # left angle
            $data =~ s/>/&gt;/g;      # right angle
            $data =~ s/'/&apos;/g;    # apostrophe
            # Add more here...
        }

        # Generic escape for all chars outside the normal 7-bit ASCII table
        # (note the range is in octal), written in the form &#123;
        # This treats each such char as a single-byte (Latin-1) value.
        $data =~ s/([\177-\377])/sprintf("&#%d;", ord($1))/ge;

        return $data;
    }
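For readers who want to try the idea without touching XML::Writer, here is a standalone sketch in the spirit of the routine above. The names `escape_xml` and `unescape_xml` are hypothetical; `unescape_xml` only stands in for what a conforming XML parser on the Java side should already do when it resolves the predefined entities and numeric character references.

```perl
use strict;
use warnings;

# Hypothetical standalone escaper (not part of XML::Writer)
sub escape_xml {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&apos;',
    );
    # One pass over the five special characters, so nothing is escaped twice
    $text =~ s/([&<>"'])/$entity{$1}/g;
    # Anything outside 7-bit ASCII becomes a numeric character reference
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# Stand-in for the receiving parser: resolve the references in one pass
sub unescape_xml {
    my ($text) = @_;
    my %char = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );
    $text =~ s/&(?:#(\d+)|(amp|lt|gt|quot|apos));/defined $1 ? chr($1) : $char{$2}/ge;
    return $text;
}

# Assumed sample input containing markup, quotes and one Latin-1 character
my $raw     = "<b>caf\x{e9}</b> & \"more\"";
my $escaped = escape_xml($raw);

print "$escaped\n";
print "round trip ok\n" if unescape_xml($escaped) eq $raw;
```

Because the escaped form is plain ASCII, it also sidesteps disagreements about the document's declared encoding, at the cost of a somewhat larger payload.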

    Enjoy!
    Inman

      Perfect!! Thank you so much Inman that is exactly what I
      needed. You get TWO gold stars (and a vote to boot)
      -Douglas