Programming using data structures

I've recently done a quick and dirty implementation of quite a big project - not a 100 line script, but a full DB aware program. It still had to be quick and dirty, though, so I had a chance to notice how I program in these circumstances.

One thing I tend to do is use data structures for programming. That is, I create functions that require complex data structures as input. I then create these complex data structures for each of my classes, which then have differing behaviours as a result.

Disadvantages of data structure programming

It gets hard to choose behaviour at runtime, because your behaviour is all defined in variables at the start of your classes. You can get around this to some extent with closures, but that tends to make your code look ugly (closure embedded in complex data structure == mess of curlies and tabs)
Danger of breaking encapsulation. I have some ugly code that looks like sort {$a->{title}{order} <=> $b->{title}{order}} @objs;
Data structure tends to grow arbitrarily as more features are added, and is more difficult to refactor because of the encapsulation issue above.
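
The encapsulation problem in the second point can be sketched as follows. This is only an illustration: the class name Item and the accessors title_order and title_text are made up for the example and are not from the project described above.

```perl
use strict;
use warnings;

# Hypothetical class; Item, title_order and title_text are illustrative names.
package Item;

sub new {
    my ($class, %args) = @_;
    return bless { title => { order => $args{order}, text => $args{text} } }, $class;
}

# The accessors are the only code that knows where the values live.
sub title_order { $_[0]{title}{order} }
sub title_text  { $_[0]{title}{text} }

package main;

my @objs = (
    Item->new(order => 3, text => 'gamma'),
    Item->new(order => 1, text => 'alpha'),
    Item->new(order => 2, text => 'beta'),
);

# Reaches into the hash: breaks as soon as the layout under {title} changes.
my @direct = sort { $a->{title}{order} <=> $b->{title}{order} } @objs;

# Goes through the accessor: only title_order needs editing after a refactor.
my @wrapped = sort { $a->title_order <=> $b->title_order } @objs;

print join(', ', map { $_->title_text } @wrapped), "\n";
```

Both sorts give the same order here; the difference only shows up when the structure is refactored.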

Advantages of data structure programming

The syntax rules for perl data structures are simpler than those for perl in general. In a data structure, you have only hashrefs and arrayrefs (endpoints can be something else, of course). This limitation on your language can make it easier to see what is going on, and to write correct "code".
If you set up different behaviours using a data structure, you have only one method call to debug, and one data structure. If you use e.g. polymorphism, you may have several method calls to debug.
It is usually possible to wrap an OO layer around your data structure. When calling the method that does the work, replace $class::DATA_STRUCTURE with $class->get_data.
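
A minimal sketch of the last two points, assuming a base class with one working method and subclasses that differ only in their data. The names Report, Invoice, Receipt and describe are hypothetical; only get_data and DATA_STRUCTURE come from the text above.

```perl
use strict;
use warnings;

# Hypothetical base class: one method does all the work, driven by data.
package Report;

sub describe {
    my ($class) = @_;
    my $data = $class->get_data;    # the thin OO layer over the structure
    my @parts;
    for my $name (sort keys %{ $data->{fields} }) {
        push @parts, "$name=$data->{fields}{$name}{type}";
    }
    return join ' ', @parts;
}

# Subclass whose behaviour is fixed in a package variable.
package Invoice;
our @ISA = ('Report');
our $DATA_STRUCTURE = {
    fields => { id => { type => 'int' }, total => { type => 'float' } },
};
sub get_data { return $DATA_STRUCTURE }

# Subclass that builds its structure on each call, so it could vary at runtime.
package Receipt;
our @ISA = ('Report');
sub get_data { return { fields => { id => { type => 'int' } } } }

package main;

print Invoice->describe, "\n";
print Receipt->describe, "\n";
```

Because describe only ever calls get_data, a class can later swap the package variable for something computed without the working method changing.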

The classic example of programming with data structures would be the database schema objects used by e.g. SPOPS, Tangram or Class::DBI. These tend to look like

$sch = {
    table => $tablename,
    fields => {
        id => { 
            type         => 'int',
            auto_inc     => 1,
            # more config values
        },
        # more fields
    },
    # behavior specifiers
};

and to my mind this works quite well. An exception is Alzabo, which makes you use method calls to create the schema; and note that they have felt it necessary to include a web-based schema builder.
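
To show what "one function, one data structure" can look like with a schema of this shape, here is a rough sketch. The helper create_table_sql is hypothetical and is not the API of SPOPS, Tangram or Class::DBI; the table and field names are invented, and the auto_increment keyword assumes a MySQL-style dialect.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a schema hashref into a CREATE TABLE statement.
sub create_table_sql {
    my ($sch) = @_;
    my @cols;
    for my $name (sort keys %{ $sch->{fields} }) {
        my $f   = $sch->{fields}{$name};
        my $col = "$name $f->{type}";
        $col .= ' auto_increment' if $f->{auto_inc};    # MySQL-style, an assumption
        push @cols, $col;
    }
    return "CREATE TABLE $sch->{table} (" . join(', ', @cols) . ")";
}

# Invented example schema in the same shape as the one above.
my $sch = {
    table  => 'book',
    fields => {
        id    => { type => 'int', auto_inc => 1 },
        title => { type => 'varchar(255)' },
    },
};

print create_table_sql($sch), "\n";
# prints: CREATE TABLE book (id int auto_increment, title varchar(255))
```

Adding a table means adding another hashref, not another function.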

Your thoughts welcome.

dave hj~


Re: Programming using data structures by dragonchild (Archbishop) on Feb 20, 2003 at 15:07 UTC
I agree with derby. All programming is about how you organize your data. If you organize in structure A, you have to use algorithm A. If you want to use algorithm B, use structure B. A simplistic example of this is when you're figuring out what searching algorithm to use, depending on whether your stuff is sorted or quasi-sorted or random or whatever. (CS 101, for those who're wondering why I even care about searching algorithms.)

I have a number of ... nits ... with your Disadvantages section. (Of course, I might be missing something ...) If your behavior is all bound up in your variables, then you have a lot of switch-type statements. I'd prefer to determine what my general forms of behavior will be, then write objects for them. Why not do something like:

#### Define like:
my %sort_by = map {
    my $key = $_;    # copy $_ so each closure keeps its own key
    ( $key => sub { $a->{$key}{order} <=> $b->{$key}{order} } )
} qw(title subject author isbn);

#### Use like (the subs read $a and $b, so call them from the same package):
my @titles = sort { $sort_by{title}->() } @books;

"it is usually possible to wrap an OO layer around your data structure."

There are so many things wrong with that statement I don't know where to begin. OO isn't a "layer". That is a procedural programmer talking who doesn't understand what OO is about. That would be like me, who doesn't do functional programming (and take that as you will), saying that I can just wrap a bunch of closures around my objects and I've now got a "functional layer". You don't have re-use, you don't have decomposition, you don't have anything that makes OO programming ... OO.

I also have a bunch of nits with your Advantages, too. Worrying about the capabilities of syntax is a programmer discipline issue. Now, if that's how you enforce your discipline, that's your choice and Perl supports you. But, I hate to see someone limit themselves because they don't want to learn good discipline.

"If you use e.g. polymorphism, you may have several method calls to debug."

Again, a procedural programmer's gripe about OO. You shouldn't be debugging more than one object at a time 'cause you shouldn't be working on more than one object at a time. If you make changes in too many places between tests, of course you're going to be clueless about where the bug is. If you make one change - test, one change - test ... you'll know with 75% certainty where the bug is because you just changed that piece of code. (The other 25% is if you fixed something that exposed another flaw somewhere else.)

I'm sorry if I sound harsh, but, just as you have your opinion, I have mine. Your post is a good post. I just happen to disagree with it. :-)

------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Re: Re: Programming using data structures by dash2 (Hermit) on Feb 20, 2003 at 17:35 UTC
Feel free to disagree, I don't object.

Yes, you don't want to use switch statements. (Excessively.)

I know what OO is - a little, at least. Sometimes you want to make an object based on a data structure. It isn't necessarily true that by doing this you forgo the advantages of OO. Often, it is a transitional phase. Having done something quick and dirty, you then add the OO layer. You then hollow out the quick and dirty stuff and replace it with nice OO stuff - all without breaking your code, which has been insulated by the OO layer.

Syntax isn't just about discipline. It is also about making life easy for yourself. I have been programming for a couple of years - but I still regularly forget the semicolon at the end of a statement. Perl's syntax is complex. Using a simple subset can sometimes be good.

As for the final point... yes, of course you should change just one object at a time. And you should write unit tests for _all_ your code. Hands up everyone who does that all the time. I mentioned this was a "quick and dirty" project.

dave hj~
Re: Programming using data structures by derby (Abbot) on Feb 20, 2003 at 14:06 UTC
Data structure programming versus what? In my mind, programming = data structures + algorithms. Whether that data structure is a traditional transparent one (a la C struct or perl hashref/arrayref) or an opaque one (OO classes) doesn't really matter. Whether the data structure is complex or simplistic is really a matter of the domain. I've seen some that were too simplistic, but many more that were too complex. The same approach to complexity that one applies to algorithms can also be applied to data structures. Cohesion and coupling should be the guiding principle when conceiving both algorithms and data structures. If you do that correctly (high cohesion and low coupling), the complexity of your application should fall into a natural state.

-derby
Re: Programming using data structures by Abigail-II (Bishop) on Feb 21, 2003 at 01:37 UTC
I guess you mean programs = data structures + algorithms. But Wirth's book talks about different data structures than dash2 does. Dash2 talks about simple (although nested) structures passed into functions, while the data structures from the book are the more complex structures that keep all the data the program works on.

Abigail
Re: Programming using data structures by perrin (Chancellor) on Feb 20, 2003 at 17:50 UTC
It seems like you're mostly talking about passing complex data structures vs. passing actual objects. The tradeoffs are just as you mentioned: passing data has no encapsulation at all, while passing objects can be verbose and slower. In general, I believe that objects are the way to go.

The example you chose -- schema definitions -- is actually something else in my opinion. This is configuration, and that's why it's such a natural for this kind of definition. It could easily be done with XML or something instead.
Re: Programming using data structures by diotalevi (Canon) on Feb 20, 2003 at 17:41 UTC
Alzabo is not so different - just because you don't create your Alzabo::Create::Schema and Alzabo::Runtime::Schema objects by laying out a series of hashes and blessings doesn't mean that they aren't actually hashes. Really now - an Alzabo::?::Schema object has some Alzabo::?::Table objects which have Alzabo::?::Column objects and other things. It /is/ a data structure - just conceptualized as some objects (really just blessed hashes) instead of plain hashes. The following snippet demonstrates this nicely - Alzabo is also data (just encapsulated into objects; same concept though):

`perl -MData::Dumper -MStorable -MTie::IxHash -e 'print Dumper Storable::fd_retrieve \*STDIN' < gpmn.create.alz`

Seeking Green geeks in Minnesota
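
The general point, that a blessed hash is still a hash, can be checked without Alzabo at all. The following is only a sketch: My::Schema and My::Table are hypothetical stand-ins, not Alzabo classes, and the structure is invented for the example.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed reftype);

# Hypothetical stand-ins for schema and table objects (not Alzabo's classes).
my $table  = bless { name => 'book', columns => [ 'id', 'title' ] }, 'My::Table';
my $schema = bless { name => 'library', tables => { book => $table } }, 'My::Schema';

# blessed() gives the class name; reftype() gives the underlying storage type.
print blessed($schema), ' is a ', reftype($schema), "\n";

# Data::Dumper walks the objects exactly as it would walk plain nested hashes.
$Data::Dumper::Sortkeys = 1;
print Dumper($schema);
```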

