PerlMonks  

Perl Data Structure Validation Part 2

by zerohero (Monk)
on Apr 16, 2009 at 19:48 UTC ( [id://758059]=perlquestion )

zerohero has asked for the wisdom of the Perl Monks concerning the following question:

I asked a question earlier about validating perl "hierarchical data structures" and got good answers on how to do part of this. It turns out that Rx (Data::Rx) looks like a good choice (getting good error messages out seems like a feature that is currently being implemented). Kwalify looked similar and older, but the Rx site and documentation seemed a little more compelling. I'll have to try the code and find out.

There is one more aspect of data validity that validators like Rx and Kwalify do not handle: correlating things between different parts of a structure. For example, a set of keys in one place may be required to depend on a set of keys in another part of the structure.
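For concreteness, here is a tiny made-up example of the kind of cross-structure constraint I mean (the names are invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical config: every name listed under 'services' must also
# appear as a key under 'timeouts' elsewhere in the same structure.
my $cfg = {
    services => [qw(auth billing search)],
    timeouts => { auth => 30, billing => 60 },    # 'search' is missing
};

my %have_timeout = map { $_ => 1 } keys %{ $cfg->{timeouts} };
my @missing = grep { !$have_timeout{$_} } @{ $cfg->{services} };
print "no timeout for: @missing\n";    # no timeout for: search
```

A per-node schema validator sees each subtree in isolation, so it cannot express this dependency between the two branches.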

For this type of checking, I find that an algorithmic approach combined with "something like XPath" is the most readable and succinct. XPath lets you easily select sets of things within a hierarchical structure. I tend to think of XPath as "regexps for trees", although this is an imperfect analogy.

Is there a perl library that does something like "XPath" for perl structures? It would let one give a "path": a string expression describing some set of nodes. It would return those nodes and then give you set operations on them. This eliminates all the messy traversal code, collapsing it into single statements.

Replies are listed 'Best First'.
Re: Perl Data Structure Validation Part 2
by kyle (Abbot) on Apr 16, 2009 at 19:53 UTC

    This sounds a little like Data::Diver, except that it will let you specify only one thing at a time. If you want an expression language that allows wildcards or regular expressions, it might be a good base to build on.
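For reference, Data::Diver's one-path-at-a-time interface looks roughly like this (a sketch; the data structure is invented):

```perl
use strict;
use warnings;
use Data::Diver qw(Dive);

my $data = { config => { users => [ 'alice', 'bob' ] } };

# Dive takes one explicit key per level -- no wildcards or expressions.
my $users = Dive( $data, qw(config users) );
print "@$users\n";    # alice bob

# A missing path just returns nothing, without autovivifying.
my $groups = Dive( $data, qw(config groups) );
```

The no-autovivification behavior is the module's main draw; selecting *sets* of nodes would have to be layered on top.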

      RJBS just turned me on to Data::DPath, which looks pretty good. I like the tagline: "DPath is not XPath!"
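Data::DPath's interface, for the curious (a sketch; the example data is made up):

```perl
use strict;
use warnings;
use Data::DPath 'dpath';

my $data = {
    AAA => { CCC => [qw(XX YY)] },
    BBB => { CCC => [qw(ZZ)] },
};

# '//CCC' matches every CCC node anywhere in the tree -- a set of
# nodes gathered from dispersed regions, which is exactly the use
# case described above.
my @ccc = dpath('//CCC')->match($data);
print scalar(@ccc), " matches\n";    # 2 matches
```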

      Thanks for the tip. Data::Diver seems to do part of what is needed, but I doubt it's a platform to build on, because the technique it uses is somewhat limited. A better approach is to take a full path string and parse it, for full control.
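A minimal version of the "parse a string and walk" idea, in pure core Perl (no wildcards or array indices; just to show the shape):

```perl
use strict;
use warnings;

# Walk a hash-of-hashes by a '/'-separated path string.
# A real implementation would add array indices, wildcards, filters, etc.
sub path_get {
    my ( $node, $path ) = @_;
    for my $key ( grep { length } split m{/}, $path ) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $tree = { a => { b => { c => 42 } } };
print path_get( $tree, '/a/b/c' ), "\n";    # 42
```

Because the path is an ordinary string, the syntax can grow (wildcards, predicates) without changing callers.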

      I've seen that there are a couple of specs for JSON-related technologies out there (JSON Schema, JSON XPath). Once someone implements these, they'll probably be the way to go.

Re: Perl Data Structure Validation Part 2
by dHarry (Abbot) on Apr 17, 2009 at 09:51 UTC

    correlating things between structures

    Maybe Data::Validate::Structure is worth a look too. But if you want XPath-like functionality, I don't see the point of doing it in a non-XML way. Maybe you have to rethink the problem. Depending on the complexity of your data structure, generating XML out of it might be straightforward (XML is hierarchical). Then you can use all the XML tooling you like. With XML Schema you can do powerful validations (there are limitations, of course). The requirement you describe, "a set of keys is dependent on a set of keys in another part of the structure", sounds a bit tricky but could (maybe) be handled by using the key, keyref and unique constructs. They work much like the primary key/foreign key concept in an RDBMS. A small sample taken from the W3C to illustrate:

    <xs:key name="fullName">
      <xs:selector xpath=".//person"/>
      <xs:field xpath="forename"/>
      <xs:field xpath="surname"/>
    </xs:key>

    <xs:keyref name="personRef" refer="fullName">
      <xs:selector xpath=".//personPointer"/>
      <xs:field xpath="@first"/>
      <xs:field xpath="@last"/>
    </xs:keyref>

    <xs:unique name="nearlyID">
      <xs:selector xpath=".//*"/>
      <xs:field xpath="@id"/>
    </xs:unique>

    There are also other schema languages, like RELAX NG and good old DTDs (well, more old than good :).
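The "generate XML out of it" step can itself be a one-liner with something like XML::Simple (a sketch; XML::Simple and the data shown are my assumption, not from the thread):

```perl
use strict;
use warnings;
use XML::Simple qw(XMLout);

# A made-up perl structure; XMLout serializes nested hashes/arrays
# to XML, which could then be validated with XML Schema as suggested.
my $cfg = {
    services => [qw(auth billing)],
    timeouts => { auth => 30, billing => 60 },
};

my $xml = XMLout( $cfg, RootName => 'config' );
print $xml;
```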

      >> Maybe Data::Validate::Structure is worth a look too. But

      Just looked at it, thanks for the tip. Data::Validate::Structure is actually a little like a recursive descent parser, so the approach is somewhat like Data::Rx. Data::Rx seems much more thought out: it has a spec, and it is designed like a traditional recursive descent parser (so if you've seen one, you know how to extend it). Recursive descent is good for validating recursive patterns and data that is localized to a subtree, because of the model of validation (a DFS traversal through the tree). Imagine trying to compare two parts of a tree with DFS... not very natural. This is why I mention the need for post-processing with something that lets me build sets from dispersed regions of the tree (like XPath, but not XPath) and do set operations on them.

      >> when you want to use XPath like functionality I don't see the point of doing it in a non-XML way

      There are more than a few passing differences between the syntax of perl data structures and XML. The only reason to convert is if we were actually using XML, or if conversion gave us access to a particular XML tool. I'd actually turn the argument on its head: XPath, while an XML-related technology, has easy-to-see analogous operations in a non-XML world. The underpinnings of XPath are fairly abstract; it's more like a math. Note that Data::DPath is an example of this, and at the top of that module's POD the author gives about ten basic points on why you wouldn't want to use XPath itself on perl data structures (differences and structural reasons). Also, the heavyweight nature of converting everything to XML and processing it there is, IMHO, not a good fit for internal data structure processing. Another issue is the visual density of XML. Imagine changing all of your data structures in code to look like XML; having readable code trumps many other concerns (a reason I prefer the terseness of perl). XML, while it started out being human readable, doesn't really reach that goal in many practical cases (this is one reason the RELAX NG compact syntax became popular: it is about a tenth the size of an equivalent XML schema).

      I've done plenty of XML with schema checking (Trang, RELAX NG compact syntax, XML schemas).

      >> Maybe you have to rethink the problem. [...] The requirement you describe: a set of keys is dependent on a set of keys in another part of the structure sounds a bit tricky but could (maybe) be handled by using key, keyref and unique constructs.

      That's an interesting approach. The analogy would be to do left/right outer joins and look for the nulls. However, I'd argue that set operations plus a little code are more powerful: "setA is a subset of setB" is more compact and direct than the equivalent left outer join. Note that you'd also have to start jamming in all of the other SQL machinery (unions, etc.) to have even a chance of competing with the set operations that something "XPath-like" gives you.
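The set operations in question really are one-liners over hashes in Perl (a generic sketch with invented sets):

```perl
use strict;
use warnings;

my @setA = qw(a b c);
my @setB = qw(a b c d e);

my %inA = map { $_ => 1 } @setA;
my %inB = map { $_ => 1 } @setB;

# "setA is a subset of setB" in one line:
my $subset = !grep { !$inB{$_} } @setA;

# Difference and intersection are just as short:
my @b_minus_a    = grep { !$inA{$_} } @setB;    # (d e)
my @intersection = grep { $inA{$_}  } @setB;    # (a b c)

print $subset ? "A is a subset of B\n" : "A is not a subset of B\n";
```

Compare that with the SQL route: a left outer join plus an IS NULL filter just to express the difference.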

      >> There are also other schema languages like relaxng and good old DTDs (well more old than good:).

      I've tried both of these on past projects with success, when applied to the right problem (e.g. validating web service XML requests). But I'd argue this isn't the right problem for those technologies.

Node Type: perlquestion [id://758059]
Approved by kyle