I empathise ... It sounds like you have suffered from idiotic applications of misguided rules
No need for empathy. It was a long time ago, and we didn't suffer :) The result of the study, was that we rejected both the tool and the idea.
With respect to your statistics. One set thereof does not a case make.
I would derive two things from my reading of the numbers.
What I cannot say from those numbers is whether the complexity arises as a result of
Indeed, without inspecting the source code, I cannot even tell the accuracy of those metrics.
It could be that "HitTheEdge" contains a recursive algorithm that is extremely complicated to follow and modify, but simple in it's code representation.
Or that "GenMoves" is a huge if/then/else structure that would be better implemented as a dispatch table
Or that "Slide" uses a string eval to replace itself with an extremely complicated subroutine that the source code analyser sees simply as a big string constant.
And that's my point. You are already looking to derive further metrics from the generated metrics, but there is no way to validate the efficacy of those metrics you have, beyond inspecting the code and making a value judgement.
So you already falling into the trap of allowing the metrics to become self-serving, but the metrics themselves are not reproducible, scientific measurements, they are simply "indicator values".
When you measure the length, mass, hardness, reflectivity, temperature, elasticity, expansion coefficient etc. of a piece of steel, you are collecting a metric which can be reproduced by anyone, anywhere, any time. Even if the tools used to make the measurement are calibrated to a different scale, it is a matter of a simple piece of math, or a lookup table to convert from that scale to whichever scale is needed or preferred. This is not the case for any of the metrics in your table.
You don't say what language your program is coded in but I could (probably) take all of your methods and reduce them to a single line. It would be a very long line, but in most languages it would still run perfectly well. What affect does that have on your metrics?
Equally, we could get half a dozen monks (assuming Perl) to refactor your methods according to their own formatting and coding preferences and skill-levels. And even if they all do a bang-up job of making sure that they reproduce the function of your originals--bugs an all--and if we then used the same program to measure their code as you have used, they will all produce different sets of numbers.
And that is the crux of my distaste for such numbers. They are not metrics. They do not measure anything! They generate a number, according to some heuristic.
They do not measure anything about the correctness, efficiency or maintainability of the code that gets run, they only make some guesses, based upon the way the coder formatted his source code.
In short, they are not comparable, and you cannot perform math with them.
As proof of this, take a look at your "PieceToString" and "HitTheEdge" methods. They have an equal 'complexity' when measured by the same tool. Is this obvious, or even definable from looking at the source code? If I am given two pieces of steel 10 cms long, even without measuring them with a rule, I can easily tell they are the same length. No such comparison is possible for source code.
The tool has become the only way of comparing source code, and as it does not (and could not) adhere to any standard, all measurements are relative, not absolute. So, unless everyone agrees on which tool/language/coding standards etc. etc. to use, there is no way to compare two versions of the same thing.
That means that in order to make comparisons, you have to implement every possible (or interesting) version of the source code, before you can make any inference about whether any one is good or bad.
And even if you could code every possible implementation of a given algorithm, and could prove that they all produced exactly the same results, and you generated your numbers: What would it tell you?
Should you pick the version with the lowest complexity rating? The shortest? The longest? The one with the highest ratio of comments?
Would you make any choice based on the numbers alone? Or would you have to look at the source code?
If you admit that you would have to look at the source code, then you have just thrown your "metrics" in the bin in favour of your own value judgement.
And if you didn't, then you should publish the formulea by which you are going to juggle all those numbers in order to make your decision. It should make for interesting reading.
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.