|Keep It Simple, Stupid|
Data-driven Programming: fun with Perl, JSON, YAML, XML...by eyepopslikeamosquito (Chancellor)
|on Apr 19, 2015 at 08:41 UTC||Need Help??|
As part of our build and test automation, I recently wrote a short Perl script for our team to automatically build and test specified projects before checkin.
Lamentably, another team had already written a truly horrible Windows .BAT script to do just this. Since I find it intolerable to maintain code in a language lacking subroutines, local variables, and data structures, I naturally started by re-writing it in Perl.
Focusing on data rather than code, it seemed natural to start by defining a table of properties describing what I wanted the script to do. Here is a cut-down version of the data structure I came up with:
Generally, I enjoy using property tables like this in Perl. I find them easy to understand, maintain and extend. Plus, a la Pike above, focusing on the data first usually makes the coding a snap.
Basically, the program runs a specified series of "actions" (either commands or functions) in the order specified by the action table. In the normal case, all actions in the table are run. Command line arguments can further be added to specify which parts of the table you want to run. For convenience, I added a -D (dry run) option to simply print the action table, with indexes listed, and a -i option to allow a specific range of action table indices to be run. A number of further command line options were added over time as we needed them.
Initially, I started with just commands (returning zero on success, non-zero on failure). Later "action functions" were added (again returning zero on success and non-zero on failure).
As the table grew over time, it became tedious and error-prone to copy and paste table entries. For example, if there are four different directories to be built, rather than having four entries in the action table that are identical except for the directory name, I wrote a function that took a list of directories and returned an action table. None of this was planned, the script just evolved naturally over time.
Now is time to take stock, hence this meditation.
Coincidentally, around the same time as I wrote my little script, we inherited an elaborate testing framework that specified tests via XML files. To give you a feel for these, here is a short excerpt:
Now, while I personally detest using XML for these sorts of files, I felt the intent was good, namely to clearly separate the code from the data, thus allowing non-programmers to add new tests.
Seeing all that XML at first made me feel disgusted ... then uneasy because my action table was embedded in the script rather than more cleanly represented as data in a separate file.
To allow my script to be used by other teams, and by non-programmers, I need to make it easier to specify different action tables without touching the code. So I seek your advice on how to proceed:
Another concern is that when you have thousands of actions, or thousands of tests, a lot of repetition creeps into the data files. Now dealing with repetition (DRY) in a programming language is trivial -- just use a function or a variable, say -- but what is the best way of dealing with unwanted repetition in XML, JSON and YAML data files? Suggestions welcome.