In this node I said that for large config files (a condition of the OP) that I preferred XML to many other mechanisms including oodles of name=value pairs. Later on in the thread, in response to comments about the possibilities of misconfiguration with XML files, I suggested using a well-written DTD.

I suggested this because I believe that if a DTD is well-written, it can be validated by an external program and the config file can be validated against the DTD by an external program. They can also be validated by the app itself, but that may be undesirable. Some issues might be: a long startup time before a bad config file is discovered, or a poorly written app that does not check the whole config file before using it and dies partway through a run when it hits an error.

For me, though, I like the idea of being able to tell a running program (e.g. via a button or signal) to re-read a config file to cause changes to occur. It is nice if the file is externally validated before it is reread. Not that everyone has such a need, but in the development phase, it is sometimes nice. Also, I like being able to have an external description of the config file (a DTD) that I can keep in one place on a network that many apps/validators can share. It sure makes maintenance easy. (Yes, one could do that with other formats, but it is "built in" to the whole DTD concept.) I also like the idea of the whole config file being sucked into the app as a large data structure I can access "easily".
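A minimal sketch of the "validate before re-reading" idea, assuming Python and its standard XML parser. The class name `ReloadableConfig` and the helper `element_to_dict` are hypothetical, and the check here is only for well-formedness; the standard library does not validate against a DTD, so a real setup would presumably run an external validator first.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Turn an element into nested dicts; leaf elements become stripped text.
    Repeated sibling tags overwrite each other in this simple sketch."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}

class ReloadableConfig:
    """Holds the config as one data structure and swaps it only on success."""

    def __init__(self, xml_text):
        self.data = element_to_dict(ET.fromstring(xml_text))

    def reload(self, xml_text):
        """Parse the new text first; keep the old config if parsing fails."""
        try:
            candidate = element_to_dict(ET.fromstring(xml_text))
        except ET.ParseError:
            return False
        self.data = candidate
        return True
```

A signal handler or a button callback could call `reload()` with the new file contents; the running program keeps its old settings whenever the new file is rejected.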

Anyway, I would like to know what others think about configuration files.

Re: Config Files Redux
by brian_d_foy (Abbot) on Nov 23, 2006 at 20:35 UTC

    In my experience, the things I would like to do as a programmer because they make sense to me cause the most problems for the non-programmers who have to use the stuff.

    Some sort of config checker or lint is good, but XML isn't really going to do that for you. You want to check not only the format, but the meaning of the actual values (e.g. does that file really exist, can you send mail to that email address, etc). A DTD doesn't really buy you that much once you already have code to read the configuration and test it.
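As a rough illustration of checking meaning rather than format, here is a sketch in Python. The setting names `data_dir` and `admin_email` are made up for the example, and the email test is only a plausibility check, not proof that mail can be delivered.

```python
import os

def semantic_errors(settings):
    """Return problems that a pure syntax or schema check cannot see."""
    errors = []

    # Does the configured directory really exist on this machine?
    data_dir = settings.get("data_dir", "")
    if not os.path.isdir(data_dir):
        errors.append(f"data_dir does not exist: {data_dir!r}")

    # Crude plausibility test only: exactly one '@', with text on both sides.
    admin = settings.get("admin_email", "")
    if admin.count("@") != 1 or admin.startswith("@") or admin.endswith("@"):
        errors.append(f"admin_email looks malformed: {admin!r}")

    return errors
```

Checks like these work the same way whatever format the settings were read from.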

    If techies will administer the app, then you might do just fine with an XML (or XML like) format. However, every time I think that, the maintenance task gets pawned off to some non-techie who just wants to make it through the day without breaking anything. :)

    There's really no rule that fits everyone's situation, but I've found the non-technical considerations to be more important, even when I didn't like it. :)

    brian d foy
    Subscribe to The Perl Review

      Just to add to this statement - our check-in process was recently modified to include a form to fill out with evidence that all proper processes have been followed. The form is just a text area field with pre-populated XML in it that we're supposed to fill in. Then the server validates each field, and, if anything isn't filled out appropriately, it spits back "invalid" for each field that's wrong.

      I can tell you that this is seriously annoying.

      And our product has a key selling feature that involves XML. And, obviously, nearly everyone using it is a developer.

      Although, on the other hand, it may make things a bit easier when I attempt to bypass the whole thing by writing a perl client that fills all the fields in for me, using LWP.

      (I know this is straying dangerously off the original topic, but the lesson is: if you expect frequent modification, simple-to-edit becomes so much more important. And XML ain't it.)

Re: Config Files Redux
by GrandFather (Saint) on Nov 23, 2006 at 20:01 UTC

    Validation of the config file should only be a problem if it is not machine generated. If you are worried about validation then you are implying hand editing, and hand editing a large XML document is no fun at all! XML is, IMO, not suitable as a configuration file format intended for hand editing. Conventional .ini type files are much better in that regard and are generally pretty robust for parsing because their syntax is simple.

    The underlying file format is completely independent of whether a file can be reloaded or loaded into a data structure. Many of the CPAN configuration modules do exactly that with a plethora of different configuration file types.

    For persisting state, where an application both writes and reads the configuration files, XML can be useful because it provides a somewhat human-readable structure and there are standard tools around for viewing XML in a structured form. For exchanging information between different applications, with a DTD for mediation, XML is very good. For large lumps of human-editable configuration there are better solutions.

    DWIM is Perl's answer to Gödel
On good hyperlinking (was Re: Config Files Redux)
by merlyn (Sage) on Nov 23, 2006 at 18:44 UTC
    In this node I said that for large config files (a condition of the OP) that I preferred XML to many other mechanisms including oodles of name=value pairs.
    No, it's not in "this node". It was in "a previous node".

    The point of hyperlinking is to start with text that makes sense even without any hyperlinks, and then add links so that parts of the text magically take you to some place related to that text.

    Misunderstanding this principle leads us to write "click here" style links, which mean nothing and actually detract. Don't say "For more information, click here". Instead, write "For more information, see the detailed description". Not only will that read correctly in the absence of a web browser (such as when printed), it even makes a somewhat self-documenting bookmark: "the detailed description" vs "click here".

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

    update: Apparently 456bereastreet agrees with me.
Re: Config Files Redux
by Firefly258 (Beadle) on Nov 24, 2006 at 23:33 UTC
    oodles of name=value pairs

    Just how does XML make the process any simpler? consider this
    <xml>
      <host1>
        <hostname>foo</hostname>
        <function>Serve intranet content</function>
        <transport>TCP</transport>
        <port>80</port>
      </host1>
    </xml>
    as opposed to an .INI
    [Host1]
    Hostname=Foo
    Function="Serve Intranet content"
    Transport=TCP
    Port=80
    I certainly see the latter as simpler, neater, easier-to-type, consuming-less-space, more legible and less error-prone, etc.
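For comparison, a small sketch (assuming Python and only its standard library) that reads both snippets into the same nested structure. The function names `from_ini` and `from_xml` are hypothetical. Note that `configparser` lowercases key names by default and keeps the quotes around a value, so the INI loader strips them here.

```python
import configparser
import xml.etree.ElementTree as ET

INI_TEXT = """
[Host1]
Hostname=Foo
Function="Serve Intranet content"
Transport=TCP
Port=80
"""

XML_TEXT = """
<xml>
  <host1>
    <hostname>foo</hostname>
    <function>Serve intranet content</function>
    <transport>TCP</transport>
    <port>80</port>
  </host1>
</xml>
"""

def from_ini(text):
    """Parse INI text into {section: {key: value}} with lowercase names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section.lower(): {k: v.strip('"') for k, v in parser[section].items()}
        for section in parser.sections()
    }

def from_xml(text):
    """Parse the two-level XML layout into the same {section: {key: value}} shape."""
    root = ET.fromstring(text.strip())
    return {host.tag: {field.tag: field.text for field in host} for host in root}
```

Either way the application ends up with one data structure; the difference the post is arguing about lies in what a person has to type, not in what the program sees.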

    IMO, out of the box an .INI style configuration file is relatively simple to use. The way the format is structured lets humans readily glean information from the contents. Little is likely to go wrong as there is a negligible amount of markup to get in the way. Writing or using a parser for the format is quite easy, and handling exceptions is also straightforward because there aren't many of them.

    XML, on the other hand, allows for structured information like nested or hierarchical configuration data. It is portable and it's a standard, so the data is readily available for any program to use. However, since XML is structured and dependent on well-formedness, a single mispunctuation or missing tag makes it unforgiving. That, coupled with the fact that XML uses a lot more markup (which can and usually does get in the way), doesn't readily produce neat and legible information, at least from a human perspective.

    You can argue that XML might be made to work by neatly laying out the information as per a DTD or by using an XML-aware editor, but IMO writing a DTD is extra work just to get XML to work when there are better, cheaper, easier alternatives, and not all editors are clever either. Anyone editing the XML must know and adhere to the DTD. In cases where the XML depends on its metadata (DTD, XSD, XSLT, etc.) and that isn't available, everything falls apart.

    The .INI style config file has been around since the beginning of computing; it's simple, legible and, for those reasons, quite popular. There are modules in almost every language to write, manage, maintain and verify .INI files. I'd choose it over XML even if humans rarely had to manage it, because there might come a day when a person has to make changes manually, and if that were under stress, I'd like to ensure his job went as smoothly as possible.

    There are also humans to consider
Re: Config Files Redux
by Fengor (Pilgrim) on Nov 27, 2006 at 13:35 UTC
    For me, though, I like the idea of being able to tell a running program (e.g. via a button or signal) to re-read a config file to cause changes to occur. It is nice if the file is externally validated before it is reread.
    I just want to add that this only validates the syntax of the configuration file, which is not a big concern when the config file is generated, as another monk pointed out. It still leaves the application liable to choke on syntactically correct but nonsensical or false config statements (for example, a path to a file on a device that is no longer attached, such as a USB stick) or on typos in the settings.

    -- Terry Pratchett, "Reaper Man"

Re: Config Files Redux
by digger (Friar) on Nov 28, 2006 at 00:42 UTC

    If the biggest concern in this whole discussion is hand-editing of the config file by "non-techies", why not have a gui for editing the config? I understand the philosophy of allowing people to easily edit configs, but if we want to allow non technical users to make changes, maybe allowing them to directly edit a file by hand isn't the greatest idea.

    Limiting the modification of the config file to methods provided by the developer eliminates the issues of validating the config at load time, and reduces support issues in the long run. I store my config data for Perl and C# apps in XML files, and have had no problems with the files being corrupted by an end user. If the user mucks around with the config files on their own, without making a backup first, and causes problems with the application, that is their own fault. They can recreate the configuration, or pay for my time to fix the file by hand.

    just my 2 cents.
Re: Config Files Redux
by Anonymous Monk on Dec 04, 2006 at 01:44 UTC
    Some thoughts on this:
    - Using XSD instead of DTD makes actual useful validation of XML data a realistic possibility. You can have types, enumeration, ranges, and other matching mechanisms to make sure the config file's values are viable before you get into any detail.
    - No config mechanism can do full input checking without being a quite specific language (how should it pre-validate an email address or url? just try it? that's ridiculous). The best you can do is hope that it eliminates most of the cruft up front.
    - For user editable data, wasn't YAML supposed to do that? ymmv, of course... :)
Re: Config Files Redux
by traveler (Parson) on Nov 23, 2006 at 22:02 UTC
    Thanks for the comments so far. I have discovered a few things: 1) I really don't expect anyone except developers and maintainers to edit configuration files directly; I expect some program to manage "user" changes. 2) If I have a large file, I prefer XML with a DTD because it can help prevent size=blue or color=large type mistakes; that's good even if machine-generated in error. 3) Most of us agree that XML should be edited by good tools (presumably not just vim...).
      I prefer XML with a DTD because it can help prevent size=blue or color=large type mistakes...

      You've said this multiple times, but you've never explained how.

      The how is the same way you would do it for any other type of file. Someone writes a comprehensive set of rules by which to validate the configuration file. The file format doesn't matter. The format of the rules doesn't matter. Something reads and processes the configuration file and the rules and applies the rules to the processed file.

      You have to do this anyway to parse the configuration file in your application.

      Now you may have better tools to do this with an XML application than with any other format, but then your real argument for using XML is better tools, not validation.
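To make the "rules applied to the processed file" idea concrete, here is a minimal sketch in Python. The rule table `RULES` and the function `validate` are hypothetical; the point is only that the check runs on the parsed settings, so it does not care which file format they came from.

```python
# Each rule maps a setting name to a predicate over its string value.
RULES = {
    "color": lambda v: v in {"red", "blue", "green"},
    "size": lambda v: v in {"small", "medium", "large"},
}

def validate(settings, rules=RULES):
    """Apply the rules to already-parsed settings and collect error messages."""
    errors = []
    for key, value in settings.items():
        check = rules.get(key)
        if check is None:
            errors.append(f"unknown setting: {key}")
        elif not check(value):
            errors.append(f"bad value for {key}: {value!r}")
    return errors
```

With this table, `validate({"size": "blue", "color": "large"})` reports both mix-ups, which is the kind of mistake the DTD is meant to catch.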

        And tools like this exist for formats like YAML; for example you can use kwalify to produce a DTD-like schema for YAML documents.

        If the DTD is a tool, then yes, the issue is better tools. The way to do it in a DTD is:
        <!ATTLIST box color (red|blue|green) "blue">
        This allows
        <box color="red" />
        but not
        <box color="large" />
        The default is blue in this example.