in reply to
Re: Maintainable code is the best code
in thread Maintainable code is the best code
Orthogonality... yeah, nice features. For a few months now I've been grinding away at orthogonality in the sense of principal component analysis (singular value decomposition). Mind you, I'm not a mathematician, I just apply this stuff to physiology.
When I read your post, Masem, it occurred to me that good sets of functions are just the opposite of principal components. For those of you who are not familiar with this (intriguing) subject: principal components are a set of orthogonal axes for the data, chosen (like rotating and scaling the 3D Euclidean space) in such a way that as much of the variability as possible goes into the first axis, as much as possible of what remains into the second component, and so on. Apart from the physiological applications I'm familiar with, there are also mathematical implications, like matrix rank, finding solutions and so on.
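For readers who have not met PCA, here is a minimal sketch of the idea for 2-D data only, using the closed-form eigen-decomposition of a 2x2 covariance matrix. The function name pca_2d and the sample points are made up for illustration; real work would use a linear algebra library and handle any number of dimensions.

```python
import math

def pca_2d(points):
    """Return [(variance, (ux, uy)), ...] for 2-D points, largest variance first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Direction of the first principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))  # perpendicular to axis1
    # The eigenvalues are the variances along the two axes.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    return [(half_trace + spread, axis1), (half_trace - spread, axis2)]

# Hypothetical data lying exactly on the line y = x.
components = pca_2d([(1, 1), (2, 2), (3, 3), (4, 4)])
for variance, axis in components:
    print(f"variance {variance:.3f} along axis ({axis[0]:.3f}, {axis[1]:.3f})")
```

With points on a single line, all of the variance lands on the first axis and none is left for the second, which is the "cram as much as possible into the first component" behaviour described above.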
A good set of functions is the opposite of principal components.
You don't want as much functionality as possible crammed into a single function, with all the remaining bits scattered over a bunch of insignificant code. You want each function to have a clear and confined scope, so that each of the functions is as meaningful and concise as possible, without a lot of flags and parameters and other confusing stuff.
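A small sketch of that contrast, with entirely made-up function names and behaviour: first a single function steered by flags, then the same work split into functions with a confined scope.

```python
# Hypothetical "crammed" version: one entry point, behaviour steered by flags.
def report_crammed(values, drop_negative=False, normalize=False, as_text=False):
    if drop_negative:
        values = [v for v in values if v >= 0]
    if normalize:
        peak = max(values)
        values = [v / peak for v in values]
    if as_text:
        return ", ".join(f"{v:.2f}" for v in values)
    return values

# The same work as small functions that each do one thing.
def keep_non_negative(values):
    return [v for v in values if v >= 0]

def scale_to_peak(values):
    peak = max(values)
    return [v / peak for v in values]

def format_values(values):
    return ", ".join(f"{v:.2f}" for v in values)

data = [-1, 2, 4]
print(report_crammed(data, drop_negative=True, normalize=True, as_text=True))
print(format_values(scale_to_peak(keep_non_negative(data))))
```

Both calls give the same result, but the second reads as a sequence of named steps, and each step can be reused or tested on its own.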
I don't know the opposite of singular value decomposition in a statistical sense, but it probably is an ill-posed problem. You can recombine orthogonal axes into an infinite number of other orthogonal axes (coordinate systems), but only one set forms the principal components. The reverse, an equal division of the information over all axes, is possible in many different ways, and therefore it is hard to come up with a solution (I figure, but I don't have hard proof for this... does anyone?).
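Not a proof, but the two-dimensional case can be worked out by hand. The sketch below assumes the data have variances \lambda_1 >= \lambda_2 along their two principal axes and looks at axes rotated by an angle \theta away from them; the symbols are introduced here for illustration only.

```latex
% Variances along axes rotated by \theta away from the principal axes
\sigma_1^2(\theta) = \lambda_1 \cos^2\theta + \lambda_2 \sin^2\theta, \qquad
\sigma_2^2(\theta) = \lambda_1 \sin^2\theta + \lambda_2 \cos^2\theta

% The total variance is the same in every orthogonal coordinate system
\sigma_1^2(\theta) + \sigma_2^2(\theta) = \lambda_1 + \lambda_2

% Equal division over both axes
\sigma_1^2(\theta) - \sigma_2^2(\theta) = (\lambda_1 - \lambda_2)\cos 2\theta = 0
\quad\Longrightarrow\quad \theta = 45^\circ + k \cdot 90^\circ \quad (\lambda_1 \neq \lambda_2)
```

So in two dimensions the equal split is pinned down up to mirror images. In d dimensions a rotation has d(d-1)/2 free parameters while equal division imposes only d-1 conditions, so for d > 2 there are parameters to spare. That counting argument is at least consistent with the "many different ways" intuition, though it is a sketch rather than hard proof.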
This compares nicely with writing code. There are a lot of ways to write orthogonal functions, but it is hard to divide the functional space equally over functions.
Is there a name for this? Rural components maybe?
Jeroen
"We are not alone"(FZ)
Re: Re:{2} Maintainable code is the best code  principal components by Masem (Monsignor) on Oct 03, 2001 at 08:03 UTC 
Actually, I was also approaching orthogonality from a principal component analysis (PCA) standpoint (though for experimental data analysis).
Now to go over the heads of everyone else who has no idea what PCA is :), the programming equivalent is that you have M 'overall functions' that your software will want to do. A good refactoring down to an orthogonal set in programming should result in N small functions, with N >> M. As jeroenes indicates, this is ill-defined from a PCA standpoint, since with PCA you'd want to select a small number (< M) of components to approximate the job. However, unstated in the refactoring process is the fact that you should be thinking about the future and the past: in reality, you might have P projects, each with M_i (i = 1 to P) 'overall functions', such that the total of all functions over all projects past, present and future will be M', with M' >> N >> M.
In plain text, you should be refactoring to find an orthogonal set of functions that are reusable for other problems, including functions that might have been created already and ones that might be part of future programs. This is the same conclusion the parent thread reaches, as do numerous other texts on programming, but for those with a mathematical bent, there's some empirical basis to it as well.

Dr. Michael K. Neylon  mneylonpm@masemware.com

"You've left the lens cap of your mind on again, Pinky"  The Brain
It's not what you know, but knowing how to find it if you don't know that's important

No wonder that your first reply reminded me of PCA.
May I ask what kind of data you use PCA for?
It apparently
is getting more popular these days. I have seen it used for genetic
chimera analysis and DNA arrays as a prelude to clustering.
When I started to use it, my mentor was very sceptical about
how acceptable it would be, while the statistician who helps
me told me it is a technique about a century old, so
nobody should complain.
Anyway, I use it for clustering analysis as well, but then
for extracellularly recorded neuronal spike waveforms. So
I sample spike waveforms from an electrode that was placed in
a brain slice and turn a PCA routine loose on them. Mostly just
the first two components are enough to get your clusters.
Jeroen

Without going into too many specifics given the nature of my work, I'm using it to try to break down chemical spectra into identifiable components. When/if I get to publish this, I'll try to let folks know, though time in the peer-reviewed journal world is only an illusion... :)
For those who are curious, principal component analysis (or factor analysis, or any of a number of other names) describes a method for breaking down sets of data into key basis sets. It assumes that all experimental data is a linear combination of collected data, and thus, if your collected data is N units long with M total sets, you can use singular value decomposition to get M basis sets N units long, and a square M x M weight matrix. This is an 'exact' specification. However, we typically want only C components, with C << M. Because singular value decomposition generates M eigenvalues, we can use empirical, statistical, or other methods to determine what C is, and which of those M basis sets are the most important.
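In matrix form, the decomposition just described looks roughly like this. The symbols D, U, \Sigma, V and W are chosen here for illustration, and the sketch assumes N >= M with the M data sets stored as the columns of D.

```latex
% Exact decomposition: M basis sets (columns of U) and an M x M weight matrix W
D = U \Sigma V^{\mathsf{T}} = U W, \qquad
D \in \mathbb{R}^{N \times M},\;
U \in \mathbb{R}^{N \times M},\;
W = \Sigma V^{\mathsf{T}} \in \mathbb{R}^{M \times M}

% \Sigma holds the M singular values, ordered from most to least important
\Sigma = \operatorname{diag}(s_1, s_2, \dots, s_M), \qquad s_1 \ge s_2 \ge \dots \ge s_M \ge 0

% Keeping only the first C components (C << M) gives an approximation
D \approx U_C \Sigma_C V_C^{\mathsf{T}}, \qquad
U_C \in \mathbb{R}^{N \times C},\;
\Sigma_C \in \mathbb{R}^{C \times C},\;
V_C \in \mathbb{R}^{M \times C}
```

The choice of C is the part that needs judgment: the sizes of the s_i suggest where to cut, but they do not make the decision for you.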
Note that these basis sets may not have any actual meaning; as jeroenes indicates, the method picks these basis sets so as to minimize the variation of the data that the C-dimensional set of vectors leaves unexplained. However, there are ways to transform the data from the PCA basis set to a set of vectors that have some meaning. In my case, it's going from a basis set of spectra that represent no real substance to spectra of real substances; I can then get an idea of the composition of all the other non-basis-set data that I started with.
As jeroenes also indicated, you can use the basis sets and weights to find out where clusters of data exist, and use those to guide the selection of basis sets and transformations to understand the data better.
It's a very elegant method for large-scale data analysis and very easy to do with help from computers (though there's enough empirical analysis to be done that a human needs to guide the end decisions).

Dr. Michael K. Neylon  mneylonpm@masemware.com

Re (tilly) 3: Maintainable code is the best code  principal components by tilly (Archbishop) on Oct 04, 2001 at 23:56 UTC 
I think that principal components analysis is the wrong way to think about this problem.
First of all, the analogy does not really carry. Principal components analysis depends on having some metric for how "similar" two vectors are, which corresponds to the geometric "dot product". While many real-world situations fit, and in many more you can fairly harmlessly just make one up, I don't think that code manages to fit this description very well.
But secondly, even if the analogy did carry, the basic problem is different. Principal components analysis is about taking a complex multidimensional data structure and summarizing most of the information with a small number of numbers. The remaining information is usually considered to be "noise" or otherwise irrelevant. But a program has to remain a full description.
Instead I think a good place to start thinking about this is Larry Wall's comment about Huffman coding in Apocalypse 3. That is an extremely important comment. As I indicated in Re (tilly) 3: Looking backwards to GO forwards, there is a connection between understanding well and having mental models which are concise. And source code is just a perfectly detailed mental model of how the program works, laid down in text.
As observing the results of Perl golf will show you, shortness is not the only consideration for well-laid-out programs. However, it is an important one.
So if laying out a program for conciseness matters, what does that tell us? Well, basic information theory says a lot. In information theory, information is stated in terms of what could be said. The information in a signal is measured by how much it specifies the overall message, that is, how much it cuts down the problem space of what you could be saying. This is a definition that depends more on what you could be saying than on what you are saying. Anyway, from information theory, at perfect compression every bit will carry just as much information about the overall message as any other bit. From a human point of view, some of those bits carry more important information. (The average color of a picture of a winter scene has more visual impact than the placement of an edge of a snowflake.) But the amount of information is evenly distributed.
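To make the compression point concrete, here is a small sketch that builds a Huffman code for a string and compares the average code length with the Shannon entropy of its symbols. The function names and the sample string are made up for illustration, and only code lengths are computed, not the actual bit patterns.

```python
import heapq
import math
from collections import Counter

def entropy_bits(text):
    """Shannon entropy of the symbol distribution, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from text."""
    counts = Counter(text)
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def average_code_length(text):
    """Average number of Huffman-coded bits per symbol of text."""
    counts = Counter(text)
    lengths = huffman_code_lengths(text)
    return sum(counts[s] * lengths[s] for s in counts) / len(text)

sample = "aaaabbcd"  # symbol probabilities 1/2, 1/4, 1/8, 1/8
print(huffman_code_lengths(sample))
print(entropy_bits(sample), average_code_length(sample))
```

For this sample the probabilities are all powers of two, so the Huffman code reaches the entropy exactly: frequent symbols get short codes, rare ones get long codes, and each emitted bit ends up carrying the same amount of information.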
And so it is with programming. Well-written source code is a textual representation of a model that is good for thinking about the problem. It will therefore be fairly efficient in its representation (although the text will be inefficient in ways that reduce the amount of information a human needs to understand the code). Being efficient, functions will convey a similar amount of surprise, and the total surprise per block is likely to be fairly large.
In short, there will be a good compression in the following sense. A fixed human effort spent by a good programmer in trying to master the code should result in a relatively large portion of the system being understood. This is far from a compact textual representation. For instance, the human eye finds overall shapes easy to follow, therefore it is good to have huge amounts of text be spent in allowing instant pattern recognition of the overall code structure. (What portion of your source code is taken up with spaces whose purpose is to keep a consistent indentation/brace style?)
Of course, though, some of that code will be high-order design, and some will be minor details. In terms of how much information is passed, they may be similar. But the importance differs...

Actually, I think that principal components is a horrible way of looking at programming. Programming is, essentially, the art of instructing a being as to what to do. This being has an IQ of 0, but perfect recall, and will do actions over and over until told to stop. There is no analysis by the being as to what it's told to do.
As for a human reader, the analysis is focused on atoms, which can be viewed as roughly analogous to principal components, but they're not.
The first principal component is meant to convey the most information about the data space/solution space. The next will convey the most of whatever the first couldn't convey, and so on.
In programming, the goal is for each atom (or component) to convey only as much information as is necessary for it to be a meaningful atom. Thus, the programmer builds up larger atoms from smaller atoms. The goal is to eventually reach the 'topmost' structure, which would be the main() function in C, for example. That function is built completely of calls to language syntax and other atoms, whose names should reflect what that syntax or atom is doing. Thus, we don't have if doing what while would do, and vice versa.
In data analysis, you want to look at the smallest number of things that give you the largest amount of knowledge of your dataset. But, you're not analyzing data. You're reading algorithms, which do not compose a dataset in the same way that observed waveforms would. To understand an algorithm, you have to understand its component parts, or atoms.
Think of it this way: when you explain a task to someone else, say a child, you break it down into smaller tasks. You keep doing so until each task is comprehensible to the recipient. At that point, you have transmitted atoms. At no point have you attempted to convey as much information as possible in one task. Each task is of similar complexity, or contains a similar amount of information.
 We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

I like the idea of atoms in that it captures the point that functions should be small and simple.
But I really think it is key that a good program reflects a good conceptual model, which is going to be well-compressed. Among other details, that points out not only why you factor code, but also why you avoid repeating it.



I see what you mean.
I want to clarify a bit, though, as I didn't say that principal
components analysis was a good analogy. Rather, I said that
coding should accomplish the opposite, that is,
spreading the information over the functions, dividing it
equally among them.
However stated, the analogy goes wrong because with principal
components we talk about orthogonal vectors in space, while
with programming we have hierarchical functions.
These create subspaces of their own, and you just cannot
do a PCA on different subspaces. Is that more or less what you
meant, tilly?
/me notes with a smile on his face how everyone approaches
PCA from their own angle: Masem from the chemical spectra
point of view, tilly from an encoding point of view, while
I think more in pattern deviation schemes.

