http://www.perlmonks.org?node_id=631479

brian_d_foy has asked for the wisdom of the Perl Monks concerning the following question:

I know someone out there has already done this and is just hiding it somewhere in CPAN or on their local disk. I'm willing to do it myself if I must, but that will take a long time. Java, C#, and VB apparently already have this. I want it in Perl. I'm willing to put up a Stonehenge Rock Star grant for someone who can deliver. Heck, maybe I should make this an X-Perl Prize :) (And for Randal, doesn't this sound like a really, really cool column idea? Can you whip this up in McMenamin's tomorrow night? :)

I'm doing a lot of geocoding stuff right now, and taking data from various places so it eventually winds up in a GPX file. I'm not starting with XML, but I'm ending up there. Before I get to the XML, I want to validate the values according to the XSD (in the case of GPX, that's http://www.topografix.com/GPX/1/1/gpx.xsd) before I put them into the Perl data structure, but I don't want to work too hard doing it.

So, in the stuff I'm working on for Geo::Gpx, I want to have a bit that validates values, but without pulling in a boatload of XML modules to parse the XSD every time. I'd really like to have a code generation tool that takes the XSD and outputs a module with the right methods ready-to-go. That way, the mere user of Geo::Gpx isn't stuck in dependency hell for something that doesn't need to be dynamic, that I can generate ahead of time, and that isn't directly related to the task of creating the GPX format. I would generate the module as the developer and simply distribute the result.

The toolchain starts with (and I'd be satisfied with):

$ xsd2pm foo.xsd > Foo.pm

Foo.pm should be completely self-contained and without dependencies, and contain all the methods I need to validate the data that will end up as the values in the XML. Once I have that tool, it's easy to automatically generate a separate distro if I wanted:

$ xsd2module foo.xsd
Creating Perl distribution for Foo...
...
...
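The generation step behind xsd2pm could be sketched roughly as follows. This is only an illustration under stated assumptions: the %types hash is a hand-written stand-in for whatever a real tool would extract from the XSD (that parsing step is left out entirely), and generate_module and Local::Foo are made-up names. The point is that the emitted source has no dependencies of its own.

```perl
use strict;
use warnings;

# Hypothetical, hand-extracted description of one simpleType. A real
# xsd2pm would build this from the XSD; that step is omitted here.
my %types = (
    longitudeType => { base => 'xsd_decimal', minInclusive => -180, maxExclusive => 180 },
);

# Comparison operator for each range facet handled by this sketch.
my %op = (
    minInclusive => '>=', minExclusive => '>',
    maxInclusive => '<=', maxExclusive => '<',
);

sub generate_module {
    my ($package, $types) = @_;

    # Fixed preamble: the base type check is written by hand, not generated.
    my $src = "package $package;\nuse strict;\nuse warnings;\n\n" . <<'BASE';
sub xsd_decimal {
    my ($class, $value) = @_;
    return ( defined $value && $value =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/ ) ? 1 : 0;
}

BASE

    # One method per derived type: base check first, then each facet.
    for my $name ( sort keys %$types ) {
        my $type   = $types->{$name};
        my @checks = ( sprintf '$class->%s($value)', $type->{base} );
        for my $facet (qw(minInclusive minExclusive maxInclusive maxExclusive)) {
            next unless exists $type->{$facet};
            push @checks, sprintf '$value %s %s', $op{$facet}, $type->{$facet};
        }
        $src .= sprintf "sub %s {\n    my (\$class, \$value) = \@_;\n    return ( %s ) ? 1 : 0;\n}\n\n",
            $name, join( ' && ', @checks );
    }
    return $src . "1;\n";
}

my $source = generate_module( 'Local::Foo', \%types );
print $source;
```

Redirecting that output to a file gives the module to ship; the user of the distribution never sees the generator or anything it needed to read the schema.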

Of course, I can do this by hand. GPX isn't that big and isn't that hard. Indeed, I've done it by hand already. My code doesn't really care, because all that stuff hides behind an interface. Maybe I'll have to rename some functions, but that's not hard.

I looked at Sam Tregar's XML::Validator::Schema. It's a bit old and has a lot of fail reports, but down in the guts somewhere I think it has most of the pieces. It knows about the basic data types, so those methods are there, and it has a way to derive types. There might be some useful stuff in SOAP::WSDL. The trick is dumping just the parts I need into a new module, including the derived types special to the XSD. I didn't find anything else though.

However, after I get done with the GPX stuff, I have other formats I have to generate, and those get trickier. I don't want to keep doing this by hand.

So, pretty please with sugar on top, tell me someone has already done this. :)

Update: Moron, you missed the entire point about dependencies. I don't want to have to create tens of thousands of XML files just so I can use an XML parser to see if a value is a number and within range. The point is that there isn't a generic module to do this in Perl. I'm not looking for a hack, and I'm not having trouble. I'm looking for the work that somebody has already done before I do it myself for the general case.

Update: here's a sample. In the linked XSD, there's a user-defined type called longitudeType:

<xsd:simpleType name="longitudeType">
  <xsd:annotation>
    <xsd:documentation>
      The longitude of the point. Decimal degrees, WGS84 datum.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:restriction base="xsd:decimal">
    <xsd:minInclusive value="-180.0"/>
    <xsd:maxExclusive value="180.0"/>
  </xsd:restriction>
</xsd:simpleType>

By hand, I turned that into a method that returns true if the scalar I pass to it fits that description:

sub longitudeType {
    &_non_null and &xsd_decimal and $_[0]->_between( $_[1], -180, 180 )
}
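For anyone who wants to run that method, here is a minimal self-contained sketch. The helper bodies are guesses, not the real code: _non_null, xsd_decimal and _between are only named above, so their implementations here, the package name Local::GpxTypes, and the choice to make _between exclude its upper bound (to match maxExclusive) are all assumptions.

```perl
use strict;
use warnings;

package Local::GpxTypes;    # hypothetical package name

# A bare "&name" call passes the caller's @_ through unchanged, so each
# helper sees the same ($class, $value) that longitudeType received.
sub _non_null   { defined $_[1] && length $_[1] }
sub xsd_decimal { $_[1] =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/ }

# Assumed semantics: lower bound inclusive, upper bound exclusive.
sub _between {
    my ($class, $value, $min, $max) = @_;
    return $value >= $min && $value < $max;
}

sub longitudeType {
    &_non_null and &xsd_decimal and $_[0]->_between( $_[1], -180, 180 );
}

package main;

# Try a few candidate values, including the excluded upper bound.
for my $candidate ( '-180.0', '12.5', '180.0', 'east', undef ) {
    my $shown = defined $candidate ? $candidate : 'undef';
    printf "%-7s %s\n", $shown,
        Local::GpxTypes->longitudeType($candidate) ? 'ok' : 'rejected';
}
```

The regex follows the lexical form of xsd:decimal (optional sign, digits, optional fraction, no exponent); it does not trim surrounding whitespace, which a schema processor would collapse before checking.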
--
brian d foy <brian@stonehenge.com>
Subscribe to The Perl Review

Replies are listed 'Best First'.
Re: Automatically creating data validation module from XSD
by agianni (Hermit) on Aug 09, 2007 at 14:52 UTC
    Have you looked at XML::LibXML::Schema? I haven't used it but it looks like it might do exactly what you're looking for. That is to say that it doesn't build validation modules, but purports to validate XML against XSD and appears to have been much more recently updated than XML::Validator::Schema.
    perl -e 'split//,q{john hurl, pest caretaker}and(map{print @_[$_]}(join(q{},map{sprintf(qq{%010u},$_)}(2**2*307*4993,5*101*641*5261,7*59*79*36997,13*17*71*45131,3**2*67*89*167*181))=~/\d{2}/g));'
Re: Automatically creating data validation module from XSD
by nferraz (Monk) on Aug 09, 2007 at 13:26 UTC
    Can you post the code you created by hand, so we can compare it with the xsd source? (I have created some xml-to-perl code generators in the past, but since they don't solve this exact problem, it would be useful to see how the target should be -- even if incomplete.)
Re: Automatically creating data validation module from XSD
by bart (Canon) on Aug 09, 2007 at 19:33 UTC
    My first thought would be to try and tackle this with XSLT. As XSD is XML, and you're trying to convert XSD/XML to another form (source code, which is just plain text), this sounds like a job for a templating system that can process XML as input – even though I've never used XSLT to convert XML to plain text.

    I feel like taking a stab at it, but just a dry spec with no examples is much too dry for me. So I concur with nferraz: can you offer us a real world XSD file and the Perl module you created from it by hand? That'll give me something to aim for.

    If it works, it can still serve as a working example to rework into a Perl script or module.

      I've made a first test XSL file:
      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns:xsd="http://www.w3.org/2001/XMLSchema"
          >
      <xsl:output method="text"/>
      <xsl:template match="xsd:simpleType">
      sub <xsl:value-of select="@name" /> {
          <xsl:for-each select="xsd:annotation/xsd:documentation"># <xsl:value-of select="normalize-space(.)" /></xsl:for-each>
          my($class, $value) = @_;
          return FALSE if _is_null($value);
          return <xsl:for-each select="xsd:restriction"><xsl:value-of select="replace(@base,':', '_')" />($value) and <xsl:apply-templates mode="restriction" select="*"/>TRUE;</xsl:for-each>
      }
      </xsl:template>
      <xsl:template mode="restriction" match="xsd:minInclusive">$value &gt;= <xsl:value-of select="@value" /> and </xsl:template>
      <xsl:template mode="restriction" match="xsd:minExclusive">$value &gt; <xsl:value-of select="@value" /> and </xsl:template>
      <xsl:template mode="restriction" match="xsd:maxInclusive">$value &lt;= <xsl:value-of select="@value" /> and </xsl:template>
      <xsl:template mode="restriction" match="xsd:maxExclusive">$value &lt; <xsl:value-of select="@value" /> and </xsl:template>
      </xsl:stylesheet>
      It converts the sample from brian's root node, after I wrapped it in an "xsd:schema" top level element, just as in the XSD file he linked to, using Saxon8, into:
      sub longitudeType {
          # The longitude of the point. Decimal degrees, WGS84 datum.
          my($class, $value) = @_;
          return FALSE if _is_null($value);
          return xsd_decimal($value) and $value >= -180.0 and $value < 180.0 and TRUE;
      }
      What do you think, brian? Is this close?
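One thing worth checking in the generated sub above: in Perl, return binds more tightly than the low-precedence and, so "return A and B" is parsed as "(return A) and B" and the range tests after the first and are never reached. The sketch below shows the difference using && instead. The function names are made up for the comparison, and is_decimal is a hypothetical stand-in for xsd_decimal.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the generated xsd_decimal check.
sub is_decimal {
    my ($value) = @_;
    return ( defined $value && $value =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/ ) ? 1 : 0;
}

# Same shape as the generated sub: "return" binds tighter than "and",
# so only is_decimal() decides the result and the range tests never run.
sub longitude_loose {
    my ($value) = @_;
    no warnings 'syntax';    # some perls warn about exactly this construct
    return is_decimal($value) and $value >= -180.0 and $value < 180.0;
}

# With && the whole condition is evaluated before return sees it.
sub longitude_strict {
    my ($value) = @_;
    return ( is_decimal($value) && $value >= -180.0 && $value < 180.0 ) ? 1 : 0;
}

for my $value ( '12.5', '500' ) {
    printf "%-5s loose=%d strict=%d\n",
        $value, longitude_loose($value), longitude_strict($value);
}
```

For 500 the loose form still reports true, because 500 is a well-formed decimal and nothing else gets tested. In the stylesheet this would presumably mean emitting && in place of and, or wrapping the whole condition in parentheses.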

      It doesn't work in XML Notepad, because of the replace (which replaces the colon with an underscore). Without it, it works in MS XML Notepad, too — except for the missing substitution, of course.

      For kicks, I've processed the original XSD file this way, and (apart from some junk from those elements that are not yet handled by the XSL file) I get this:

      This is fun.

      p.s. I used TRUE and FALSE as booleans for readability. You can always replace them with 1 and 0, but I would prefer constants.

      Update: brian asked how hard it is to extend to process other types too, in particular, fixType (enumeration). That turned out to be an addition of a few extra lines. I've also done a few extra modifications so it puts a package declaration at the top, a "1;" at the bottom, and suppression of the junk. The result is here:

        It doesn't work in XML Notepad, because of the replace

        MS has only poor support for EXSLT, and the same goes for string extensions to XPath 1.0 such as replace.

        Standard XPath 1.0 offers translate (somewhat like tr in Perl). For your purpose, just do a s/replace/translate/ and even limited renderers like XML Notepad are satisfied (tested).
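For comparison, the Perl side of that analogy might look like the sketch below: the colon-to-underscore mapping that the stylesheet applies to @base, done with tr. The function name sub_name_for is made up for the example.

```perl
use strict;
use warnings;

# Map a schema type name such as "xsd:decimal" to a Perl-friendly sub name.
sub sub_name_for {
    my ($qname) = @_;
    ( my $name = $qname ) =~ tr/:/_/;    # copy first, then transliterate the copy
    return $name;
}

print sub_name_for('xsd:decimal'), "\n";    # xsd_decimal
```

Like translate, tr maps characters one for one rather than matching patterns, which is all this substitution needs.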

Re: Automatically creating data validation module from XSD
by Zaxo (Archbishop) on Aug 10, 2007 at 04:19 UTC

    I think Tie::Constrained will do what you want. It will let you tie a scalar from your structure to some particular condition through a boolean coderef.

    An attempt to assign an invalid value (meaning it fails the condition) invokes a failure function which you can define for each type.
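To illustrate the general idea only, here is a minimal sketch built on Perl's core tie mechanism. It is not Tie::Constrained's actual interface: the class name Local::ConstrainedScalar and its constructor arguments (a test coderef and a failure coderef) are assumptions made up for the example.

```perl
use strict;
use warnings;

package Local::ConstrainedScalar;    # hypothetical, NOT Tie::Constrained

# tie my $var, 'Local::ConstrainedScalar', $test_coderef, $on_fail_coderef;
sub TIESCALAR {
    my ($class, $test, $on_fail) = @_;
    return bless { test => $test, fail => $on_fail, value => undef }, $class;
}

sub FETCH { $_[0]{value} }

# Keep the new value only if the test accepts it; otherwise hand the
# rejected value to the failure callback and leave the old value alone.
sub STORE {
    my ($self, $new) = @_;
    if ( $self->{test}->($new) ) { $self->{value} = $new }
    else                         { $self->{fail}->($new) }
    return;
}

package main;

my @rejected;
tie my $lon, 'Local::ConstrainedScalar',
    sub {
        my ($v) = @_;
        defined $v
            && $v =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/
            && $v >= -180 && $v < 180;
    },
    sub { push @rejected, $_[0] };

$lon = 42.5;    # accepted
$lon = 500;     # rejected, $lon keeps its previous value

print "lon=$lon rejected=@rejected\n";
```

The failure callback here just records the bad value; it could just as well die, warn, or substitute a default.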

    Inheritance is the preferred way to specialize Tie::Constrained to a type.

    I've discussed and demonstrated Tie::Constrained in several pm articles: To Validate Data In Lvalue Subs, Tie Me Up, Tie Me Down, Re^2: Writing general code: real world example - and doubts! (reply to you!), and in the my Dog $spot; thread.

    Part of the problem of combining attributes automatically under Tie::Constrained can be handled by the trick I showed in FunkOpera: Abstracting Perl Operators (where did I get that silly name? . . .). The skeletal FunkOpera.pm shown there overloads operators to combine coderefs to return a coderef which combines their returns in the suitable way.
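The combining step can be sketched without operator overloading. This is not the FunkOpera.pm code; it is a plain function, with made-up names (all_of, at_least, less_than, is_decimal), that takes predicate coderefs and returns a new coderef that is true only when every one of them is.

```perl
use strict;
use warnings;

# Build one predicate out of several: true only if all of them accept.
sub all_of {
    my @tests = @_;
    return sub {
        my ($value) = @_;
        for my $test (@tests) {
            return 0 unless $test->($value);
        }
        return 1;
    };
}

# Small predicate factories for range facets.
sub at_least  { my ($min) = @_; return sub { $_[0] >= $min } }
sub less_than { my ($max) = @_; return sub { $_[0] <  $max } }

# Hand-coded base type, as suggested for the built-in schema types.
sub is_decimal {
    my ($value) = @_;
    return ( defined $value && $value =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/ ) ? 1 : 0;
}

# The base check goes first so the numeric tests only see decimals.
my $is_longitude = all_of( \&is_decimal, at_least(-180), less_than(180) );

for my $value ( '-180.0', '12.5', '180.0', 'east' ) {
    printf "%-7s %s\n", $value, $is_longitude->($value) ? 'ok' : 'rejected';
}
```

A user-defined type then becomes a list of facets applied to a base predicate, which is the part that looks automatable.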

    Sorry, I've never done anything significant with xml, so I don't really understand what I can take as given in your problem. I think the base types would need to be hand encoded, but user-defined ones might be automated through some XML magic and an extension of the unfortunately named FunkOpera.pm.

    After Compline,
    Zaxo

Re: Automatically creating data validation module from XSD
by Moron (Curate) on Aug 09, 2007 at 09:18 UTC
    I wouldn't do it. If you're having trouble writing code that validates against an .xsd, what about turning the input into XSD form instead? It seems an illogical thing to do, given that actual data should not be a schema, but the effect is to reduce the validation requirement to a single fixed, generic routine of about half a page that compares the two XSD trees.
    __________________________________________________________________________________

    ^M Free your mind!