http://www.perlmonks.org?node_id=169802


in reply to Nested data structures... nasty?

The only major thing that irritates me about Perl's way of doing complex data structures is that the specification is implicit: you never sit down and declare a "Hash of Hashes of Arrays", for instance; you just put arrayrefs in a hash, then a ref to that hash in another hash. And you never sit down and code up a list of admissible hash keys, either... if you haven't documented what's expected to be in the hash, and where it gets inserted, your code can be pretty impenetrable.
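
For instance, a "Hash of Hashes of Arrays" just accretes one push at a time -- here's a sketch (the keys are invented for illustration, but the autovivification is real):

my %results;
push @{ $results{'alice'}{'scores'} }, 97;  # both hash levels spring into existence
push @{ $results{'bob'}{'scores'} }, 72;    # ditto for bob

Nothing ever says %results is a hash of hashes of arrays; it just ends up being one.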

On the whole, I think this way of doing things is better than the alternative: well-defined, rigid data structures described at compile time. (Yes, folks, I like LISP too -- conses are your friends. :-) What irritates me is that these "implicit" structures impose an extra documentation burden, and one that's not always easily satisfied in code. Rob Pike claims that it's better to put the complexity in the data than in the code, and I agree, but that doesn't help you if the data is undecipherable.

Update: Another possible disadvantage of Perlish (or LISPy, Haskellic, Schemish, etc.) "implicit, on the fly" data structures is that some compiler optimizations become difficult or even impossible. NOTE: I haven't actually tested these assumptions empirically; for one thing, I'm at work, and don't have the time. I'm going on my understanding of modern computer architecture and compiler optimization, which may be sorely lacking. Thou hast been warned.

For instance, if I want a 3d vector, I could write the following C:

typedef struct { float x; float y; float z; } vec3d;

Now, the compiler knows that a vec3d takes up exactly 12 bytes (assuming 4-byte floats), and can make a bunch of optimizations based on that knowledge and some information about the processor it's compiling for. For instance, it can pad the struct out to 16 bytes if addressing is faster on 16-byte boundaries. It can take a declaration like:

vec3d vertices[20];

and allocate 240 (or 320) contiguous bytes for it.

Contrast that to the implicit equivalent:

my %vert = (
    'x' => undef,  # placeholder
    'y' => undef,
    'z' => undef,
);

Perl knows nothing about the size of this hash: it has three elements now, but there's nothing stopping you from adding a dozen more in the very next statement. And since the size of the hash isn't guaranteed, the best Perl can do if you put twenty vertices in an array is allocate twenty contiguous references: that's better than nothing, but unless you're improbably careful in your memory management, the hashes those refs point to are going to be scattered all over memory, which means cache misses and subsequent stalls.
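
Here's a sketch of that twenty-vertex case (coordinates faked with rand, purely for illustration):

my @vertices = map { +{ 'x' => rand, 'y' => rand, 'z' => rand } } 1 .. 20;
# @vertices holds twenty contiguous references, but the twenty anonymous
# hashes behind them live wherever the allocator happened to drop them.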

The Perlish alternative is to create data structures in scalars with pack, which is just fugly.
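
A minimal sketch, assuming your floats are the native 4-byte kind (pack's 'f' template):

my $vert = pack 'f3', 1.5, 2.5, 3.5;   # three floats jammed into one 12-byte scalar
my ($x, $y, $z) = unpack 'f3', $vert;  # and fished back out again

The bytes are contiguous, but you've thrown away the readability that made the hash attractive in the first place.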

The "Perl is slower than C!" thesis shouldn't be any surprise to anyone, of course. I just want to point out that the flexibility of hashed (hashish? ;-) structures is sometimes a disadvantage. Again, I want to emphasize that I tend to like this way better, but I'd be a fool to pretend that it's perfect.

Update 2: Minor grammar corrections.

Update 3: I'm not picking on Common LISP in particular, just the "add stuff on the fly" way of building complex structures, which you can do in Common LISP (or any other LISP, for that matter) and which, in my experience, is the most common way of building them.

A good general rule for finding language features in Common LISP is "it's in there somewhere". :-)

--
The hell with paco, vote for Erudil!
:wq