PerlMonks  

Measuring programmer quality

by deorth (Scribe)
on Oct 25, 2007 at 06:30 UTC ( #647084=perlmeditation )

Dear Monks,

I've been thinking about this one for a few days, mostly because I've moved from a systems admin-centric role into a more pure development role. I'm trying to find a way to baseline and measure my output/performance while focusing on quality as the critical component. This is really for personal enrichment, but if it's useful to others then great.

This is a general dev question of course, but since most of my development is in Perl, I thought I'd put it to my favourite bunch of perlheads :)

What metrics does one use to measure a programmer's 'quality' or 'kwalitee' of output? I can envisage a series of normalized metrics, each of which has to reach a minimum passing value, and then a weighted mean of them all would constitute a 'grade'.

Things I'm thinking of are:

  • number of defects opened against your code over time
  • numbers of defects closed over time
  • number of lines of code added per defect fixed
  • number of lines of code *removed* per defect
  • number of lines of code made into reusable modules
  • average 'time to fix'
  • number of unit tests written per subroutine
  • ...
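As a sketch of how such a grade might hang together in Perl -- the metric names, weights, minimum threshold, and the assumption that each metric has already been normalized to a 0..1 score are all invented for illustration:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical normalized scores (0..1) and weights -- illustration only.
my %score  = ( defect_rate => 0.80, time_to_fix => 0.65, test_coverage => 0.90 );
my %weight = ( defect_rate => 0.50, time_to_fix => 0.20, test_coverage => 0.30 );
my $minimum = 0.60;    # every metric must reach this to pass at all

sub grade {
    my ( $score, $weight, $min ) = @_;
    for my $metric ( keys %$score ) {
        return 0 if $score->{$metric} < $min;    # one failing metric fails the grade
    }
    my $total = sum map { $score->{$_} * $weight->{$_} } keys %$score;
    return $total / sum values %$weight;         # weighted mean
}

printf "grade: %.2f\n", grade( \%score, \%weight, $minimum );    # grade: 0.80
```

The hard minimum is only one possible design choice; a softer scheme might cap a failing metric's score rather than zero the whole grade.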

I'm referring here of course to internal company code, and not CPAN code, which has its own well developed standards as I understand it.

How these all hang together I'm really not sure yet. Thus, I open it to the floor for discussion :)

Thanks for your time and consideration.

Re: Measuring programmer quality
by Corion (Pope) on Oct 25, 2007 at 06:37 UTC

    Beware that by introducing measures, people will optimize for the measure, not for the intended goal.

    • If you measure the number of lines of code removed per defect, people will remove lines.
    • If you penalize the number of lines added, people will not add lines of code.
    • If you penalize the average time to fix, people will quickly/immediately close bugs, either as "not a bug" or as "fixed", regardless of whether that's true.

    The underlying question of whether what people do actually makes sense in the given context (removing lines of code likely reduces complexity and hence is "good") is not easily answered by these metrics, and such metrics will pull people in the direction of gaming them.

      Yes, metrics can be and usually are gamed. Let's assume for this exercise that my colleagues and I are acting in the best spirit of perlmonks, and wish for self-knowledge and self-improvement through honest self-assessment :) (which isn't too far from the truth actually)
      I also note that most if not all of the proposed metrics measure quantity over quality.

      Personally speaking, I expect my programming output is extremely low compared to most professional developers. I take a lot of time to think out code beforehand. I'm driven to produce code which feels like ideal textbook craftsmanship. The solution should be elegant. The code should be internally consistent. Only the very highest tier should express anything like business logic or application-specific behavior. There shouldn't be any crufty special cases. Test cases on each module should be automated and thorough. It's a "Zero Defect" mentality.

      But by some people's measure (especially dividing any metric over time), this output really sucks.

      --
      [ e d @ h a l l e y . c c ]

Re: Measuring programmer quality
by Zaxo (Archbishop) on Oct 25, 2007 at 08:56 UTC

    Devise a Test suite from your project requirements. If it passes, everyone on the project passes. If not, investigate whether the test or the solution is incorrect.
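    A requirements-driven suite in Test::More might look like the sketch below; parse_date and its three requirements are hypothetical stand-ins for real project code:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical function under test, standing in for real project code.
sub parse_date {
    my ($s) = @_;
    return unless $s =~ /\A(\d{4})-(\d{2})-(\d{2})\z/;
    return { year => $1, month => $2, day => $3 };
}

# One test per requirement: if the suite passes, the project passes.
is_deeply( parse_date('2007-10-25'),
    { year => 2007, month => 10, day => 25 },
    'requirement 1: ISO dates are parsed' );
ok( !defined parse_date('25/10/2007'), 'requirement 2: non-ISO input is rejected' );
ok( !defined parse_date(''),           'requirement 3: empty input is rejected' );
```

If a test fails, Zaxo's advice applies: investigate whether the requirement (the test) or the solution is the incorrect one.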

    After Compline,
    Zaxo

Re: Measuring programmer quality
by ajt (Prior) on Oct 25, 2007 at 09:33 UTC

    When Labour came to power in the UK they introduced many tests and targets for all levels of government to improve standards. Staff were "empowered" to do anything they wanted to meet the targets, and since the targets were correlated with good service, what could go wrong?

    One police force reclassified most minor crimes as disturbances, giving them a massive reduction in reported crime. Hospitals reclassified corridors as "transit wards" and took the wheels off trolleys, declaring them beds - now there are no patients waiting on trolleys in corridors...!

    Even with the best intentions and mostly decent people, if there is pressure to conform to arbitrary standards, people conform to the standard even if it's of no benefit to the end user in any way. While some people feel very smug, the whole targets process has been a vast waste of money and has probably done more harm than good.

    Having just gone through a Lean Six Sigma training course, I can say it is vital that you measure the things the end user actually cares about, not things that may merely correlate with them. However, it is really, really hard to come up with good targets and tests, and if you get them wrong you will do a lot of harm, as the UK government has...


    --
    ajt
Re: Measuring programmer quality
by dk (Chaplain) on Oct 25, 2007 at 12:02 UTC
    Is it just me, or does it indeed sound silly, "programmer grade A" or "programmer grade 85%"? Couldn't resist, sorry :)
Re: Measuring programmer quality
by dragonchild (Archbishop) on Oct 25, 2007 at 13:30 UTC
    The only useful measures of productivity in any field are:
    1. Tasks done correctly per unit of time
    2. Money earned vs. money cost
    Essentially, how quickly does the worker do correct work and how much profit are you making on that worker's cost. Now, "profit" is a term that only the business people can define. For example, IBM has about a thousand people working in various research labs on products that won't see market, let alone turn a profit, in 30 years (if ever). But, the business folk at IBM have 70 years of being able to point to products that have come out of IBM research that have paid for said researchers 100 times over. Google's policy of 20% time is very similar.

    You seem to be more interested in the first measure - correctly finished tasks per unit of time. In programming, there is a corollary measurement that I allude to in my sig - the difficulty of changing a worker's product. If you produce code that works perfectly twice as fast, but it takes 10x longer to make a change, you have cost the company more money in maintenance than you saved in production. And, given that most applications spend 80% of their life cycle in maintenance, this can be a very significant criterion.
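    To put toy numbers on that trade-off (all figures invented): with maintenance at 80% of the life cycle, halving production time while making every change 10x slower is a large net loss:

```perl
use strict;
use warnings;

# Toy life-cycle cost model -- all numbers invented for illustration.
# Total effort = production / speedup + maintenance * slowdown.
sub lifecycle_cost {
    my ( $prod, $maint, $prod_speedup, $maint_slowdown ) = @_;
    return $prod / $prod_speedup + $maint * $maint_slowdown;
}

my $baseline    = lifecycle_cost( 20, 80, 1, 1 );     # 100 units of effort
my $fast_sloppy = lifecycle_cost( 20, 80, 2, 10 );    # 10 + 800 = 810 units

printf "baseline: %d, fast but unmaintainable: %d\n", $baseline, $fast_sloppy;
```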

    And, then there's the more ephemeral stuff. Let's say we have a team member who doesn't produce a lot of code and what they do produce isn't very good. But, they have a very good understanding of the application's architecture and have been instrumental in avoiding a number of pitfalls. What value does that person bring to the team? Personally, I like having people like that on board. But, how do you measure "pitfalls avoided through lunch discussions"?

    Ultimately, it boils down to the fact that programming isn't engineering - it's sculpting or music composition. While you can have production quotas in the arts [1], that leads to a very stagnant output with little innovation. And we can see this in our field. Take a look at the innovators of programming theory and how they tend to work. Then, look at the corporate drones and how they tend to work.

    Putting metrics on programming output isn't a bad thing. But, it's not clear how one measures maintainability, code quality (not kwalitee), and various other "unmeasurables." And, frankly, code coverage is probably more important than the number of tests. I may only need 30 tests to cover these 600 lines of code, but I may need 600 tests to cover these 30 lines over here. The rest of your proposed metrics fall into the same category. ("number of lines made reusable" - that's called refactoring and it's something everyone should do on a continuous basis!)

    As for "average time to fix" ... that kind of metric scares the bejeezus out of me. It assumes that your programmers are slackers who intend only to mooch off the company. Bugs will take however long they take to find. Fixing a bug is almost always very quick, once you've isolated the problem and generated a repeatable testcase. Managers who say "This bug has been open for 3 weeks. Fix it already!" don't understand that creating a repeatable testcase can take 99% of the time needed to actually fix a bug.

    [1] The cathedral choirmasters in the Middle Ages were required to write a new piece of music every Sunday and every feastday.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        And, then there's the more ephemeral stuff. Let's say we have a team member who doesn't produce a lot of code and what they do produce isn't very good. But, they have a very good understanding of the application's architecture and have been instrumental in avoiding a number of pitfalls. What value does that person bring to the team? Personally, I like having people like that on board. But, how do you measure "pitfalls avoided through lunch discussions"?

      This is a very key idea .. one of the reasons I love working where I am now is that I can get into a discussion with a co-worker in the kitchen, get something figured out, and get on with my job -- it's extremely efficient. The old guard method of having a weekly all-hands meeting strikes me as incredibly wasteful. Development should be done far more as an interrupt-driven process and far less as a polling process. And by interrupt-driven, I'm not talking about going into someone's cube and interrupting them -- I mean Management By Walking Around, where you talk to each of the developers and find out where they're at.

        Ultimately, it boils down to the fact that programming isn't engineering - it's sculpting or music composition.

      Yes. I'm also a musician, and (sometimes) a composer, and I know that there are days where you're on -- good stuff just comes out of your hands and you make music, or write great code. Other days, it's just not happening. And that's when you need to take a break, drink coffee, look out the window for a while.

      About twenty years ago I had a manager with a background in Tourism (really) who was bossing a crew of C developers, and he just didn't get that sometimes, developers need to stop and think, to figure out how to approach the problem. He thought that if you weren't sitting at your computer typing, you were goofing off.

      Obviously, I heartily disagree.

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Measuring programmer quality
by cmv (Chaplain) on Oct 25, 2007 at 13:30 UTC
    To me, programming is a very personal art. I want to say like a painting that is never quite finished, or a sculpture that keeps getting changed and added to, but these are not right. I have yet to come up with a good simile.

    The measure of quality that I use against my programs is along these lines:
    1.) How easy is it to modify the program to do tasks that were unimagined when it was written?
    2.) How easy is it to understand after forgetting all about everything it does?

    I was once asked by a real live artist (painter) who had no clue about computers, why they would dare call programming art. I tried to explain that there is a sort of magic in a well written program, that isn't there in most ordinary programs. The magic is like what there is in a great movie, or a book you stay up all night to finish, or what the sunlight does to the way you see things in the early morning and the late evening. When she didn't understand this, I tried using my Christmas story.

    Every couple of years when I was small, at Christmas time, one present, above all others turned out to be something very special to me for one reason or another -- it had magic. Usually, this was the thing that I gave a lot of attention to - kept it clean, handled it with care, walked around with it all day long, and in the ultimate tribute to the item - took it to bed with me on Christmas night.

    When I write a really good (perl, C, whatever) program, I don't know it right away. It's usually months or years later, when it needs to be changed to do something unimagined when it was first written. I dump myself back into the program to try and understand its DNA - how it was built, what I (or the original programmer) was thinking when designing/writing it, what the framework is. If I can understand all that in a short amount of time, and it all hangs together well, then it's pretty darned good. And if I can change it with a few quick swipes to do this new, unimagined thing - well, I usually get a little tickle in my belly, and this goofy grin on my face. Then I know that it's a winner!!

    I usually want to take it to bed with me that night.

    She seemed to understand this.

    ...my 2 cents.

    -Craig

Re: Measuring programmer quality
by bwelch (Curate) on Oct 25, 2007 at 14:23 UTC
    Those measurements seem to encourage one to develop and fix code quickly rather than carefully or efficiently, and to format code to use many lines.

    Characteristics I might use:
    • How easy is it to turn over a developer's code to another? Is it well commented? Is it clear what the code does and does not do? Is the code easily understood and adapted to other uses? Does it follow a style guide?
    • Does the code include error checking and logging? How hard is it to recover from errors?
    • How good is the boundary testing? By this I mean: does the code scale well and handle the extremes of input data? For example, a database export tool I inherited read all of a database into a hash tree and maintained all relationships in the database. The catch was that it stored the entire database in memory before exporting it to XML files. For any database with more than 50,000 users, the tool ran out of memory, making it useless for production systems.
    • Does the developer create tools and practices that benefit others?
Re: Measuring programmer quality
by kyle (Abbot) on Oct 25, 2007 at 14:24 UTC
Re: Measuring programmer quality
by talexb (Canon) on Oct 25, 2007 at 21:08 UTC

    I understand the desire of non-programmers to want to tie 'quality' or even 'kwalitee' to hard numbers. I think this is a futile quest. Or kwest.

    As a programmer (I like to call it "software engineer", la-de-dah), I have certain ideas about how to measure someone's quality. You'd probably want to look at the following:

    • How many smart questions they ask when they're given some programming job to do;
    • How quickly they produce some bare-bones functionality;
    • How good their code looks;
    • How well they back up their code with tests and documentation;
    • How easily they are able to answer questions about how their code works, and whether or not it can be updated to accommodate a change or a new feature;
    • How many bugs their code has;
    • How easily they are able to locate the bugs and fix them;
    • How well they know what's going on in their code;
    • How open they are to a code review; and
    • How proud they are of their work.
    A lot of these items are what you wrote in your list, but without the hard numbers. The thing is, if programmer A has six bugs in their code and programmer B only has two, what does that mean?
    • A is three times worse than B?
    • A's bugs are shallower?
    • B's bugs haven't all been found yet?
    • A's code is used more?
    It's way too vague -- and we haven't even talked about the severity of the bugs -- A's bugs could be simple little tweaks, and B's bugs could be show-stoppers, huge gaping holes in the design.

    I hope I meet a lot (if not all) of the criteria in the first list -- I'm proud of my code, and I believe it's high quality stuff -- one of my scripts has been in Production as version 1.4 (the third revision) for over two years -- it's a little over 100 lines, but it does exactly what it needs to do, and well.

    Conversely, a programmer who hoards information, whose code looks horrible, who cannot fix any of their own code, who refuses requests to do a code review -- that's someone I'd stay away from. These are more qualitative measurements, but I think they're just as valid.

    Oh, and one more thing -- programmers need to be able to communicate well, because without communication, nothing's going to get done on time.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Measuring programmer quality
by BrowserUk (Pope) on Oct 25, 2007 at 23:09 UTC

    When you find a way of measuring how much unnecessary code your programmers didn't write, including pointless tests that serve only to bolster other meaningless statistics; then you'll start to have something.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Measuring programmer quality
by hangon (Deacon) on Oct 26, 2007 at 01:37 UTC

    Let me tell you the story of Fred. Fred was outgoing and mobile. He roamed the offices and knew everyone. He always had time for you and an amusing story to tell. Fred had been working at the company longer than most and was pretty much an institution around the offices. Everybody knew and liked Fred.

    Fred was also the polar opposite of a workaholic. Putting more effort and creativity into avoiding work than most people put into doing their jobs, Fred had elevated slacking to an art form.

    One day the Great and Wise Veep declared that all departments must redouble efforts to increase productivity. Groans were heard throughout the hallowed halls. Undaunted, the Fearless Manager reached into his bag of tricks and produced employee log books. Complete with a list of over 50 activity codes, they were cleverly designed so employees could quickly and efficiently enter a three-character activity code to account for every ten-minute period spent at work.

    Amid the protestations, moans of frustration and even a few threats, Fearless Manager forged ahead with his plan. The ever-likable Fred volunteered to train his coworkers in how to use the log book system.

    As time passed, even the casual observer could see that the department was not accomplishing much. However, Fearless Manager now had volumes of documentation to the contrary. At one monthly meeting, Fearless Manager applauded his staff for productivity increases. Fred was named Employee of the Month for his sterling example of teamwork. Amid the celebration, Fearless Manager was summoned to see the Great and Wise Veep. So he proudly sallied forth with his files, charts and slides, never to return.

Uncrackable nuts
by Your Mother (Canon) on Oct 26, 2007 at 03:40 UTC

    I liked dragonchild's two points. I'd add that the number of defects can be completely arbitrary, because a spec can leave out all the behavior the tickets/defects represent.

    E.g., at my current gig, the specs for product features are usually sentence fragments. I kid you not. Asking for more details produces the response: "We don't have time for that right now." From that sentence fragment the dev is to intuit everything that needs to happen. My first two-word ticket there ended up taking three CGIs, a bunch of rewritten PHP, a moved/rehosted website, a patched CPAN module, some JavaScript, and a 16-hour day to catch up with what was ostensibly a "gimme."

    I do wish you could measure dev quality because I've seen a few idiots/slackers become managers and wreck dev teams or projects. I think time and money are the only real measures though. Mmmmm… No, posts on PerlMonks count too. If I were running a company there are at least 15 monks here I'd hire at any rate I could afford without so much as an interview. In fact, at least a couple of them have already responded in this thread. :)

Re: Measuring programmer quality
by sfink (Deacon) on Oct 26, 2007 at 03:46 UTC
    In my experience, the meaning of quality depends quite a bit on the situation, too. If you're the only person who touches your part of the system, then it's better to have documentation that lets you pick up where you left off than "good documentation", which implies something anyone else can read to get a good understanding of what's going on. If your code is only seen through a web interface, then usability and exception reporting are more important than flexibility and clean architecture. The list goes on and on... prototypes or production code? Small team where everyone is involved with everyone else's code, or large team with lots of specialization? Core system or tool? Are your coworkers novices or experts? Will your code be viewed by 3rd parties (partners? investors?)? Short or insanely short timelines?

    Not that having "good documentation" is ever bad, but the relative weightings of the different attributes are heavily situation-dependent.

    "Perfect" code, not that there is such a thing, is probably immune from all these considerations. But a "perfect" developer would not be -- you have to juggle things to fit your particular situation, or you're optimizing the wrong things.

    I find that the opinions of the people around me are the most reliable gauge -- the users, the QA folks, the developers I work with, and the production people who suffer with the problems (both in my code and in reality, but reported to them via my code). I know people who quickly write large bodies of code that effectively and efficiently solve the problem at hand -- but they drive me crazy because they assume that everything will be perfectly configured and nothing will ever unexpectedly go wrong. And when it does, and I bring them the problem, they brush me off with a test case demonstrating that it works just fine. Clearly I misconfigured something or something about my environment "isn't what it's supposed to be" (== matches their assumptions).

    In short, I think it's about whether you make the people around you happy, and whether you can keep them happy.

    Come to think of it, that's a lot like another profession. One with a somewhat different dress code.

Re: Measuring programmer quality
by mreece (Friar) on Oct 26, 2007 at 03:57 UTC
    monitor every committed delta. put a happy face next to the names of those whose code you agree with, and put an unhappy face next to those you don't.

    this approach is super easy, if a little time consuming, but man, look at all those smilies next to your own name!

Re: Measuring programmer quality
by g0n (Priest) on Oct 27, 2007 at 07:04 UTC
    When implementing a performance monitoring system, it's important to try and include what the suits call 'double loop learning'. In other words, you not only need to measure performance on the task (and act on that feedback), but also the performance of the performance-measuring system. What that second feedback loop will tell you is almost certainly what's been said in this thread, but as formal feedback in your own context, rather than external comment. This can be handy for justifying changing things :-)

    --------------------------------------------------------------

    "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."
    John Brunner, "The Shockwave Rider".

Re: Measuring programmer quality
by w-ber (Hermit) on Oct 27, 2007 at 19:39 UTC

    While this is comparing apples and rubber boots, do you measure the quality of a novel by the number of words or spelling errors or how well the story is decomposed into chapters? Or the average time to read by the "average reader"?

    Because the ultimate product of a programmer is the source code, it is impossible to resist the temptation to derive some metrics from it. The number of lines of code is possibly the first and also the fuzziest one: what is a line of code? Is it a statement of the programming language you use? An expression? A function call? A code block?

    In assembly language, one line of code does not do much, perhaps adding an integer to the contents of a register and saving the result to another register. In a highly domain-specific language a line of code can, for instance, parse an XML file and extract the needed data while updating a progress counter. Not only this, but in programming languages where you can overload operators such as + and *, "var := 1 + 1;" and "var := object1 + object2;" can be of completely different complexities: the former is (likely!) just summing two integers, the latter can be anything from summing boxed numbers to computing the powerset of two sets. (Yes, that has little to do with the symbol +.)

    What meaningful things does "a line of code" here measure? Should the former be counted towards "lines spent" and the latter "lines saved", because the latter hides more complexity? Or vice versa? How can you even automate counting something like this? What meaningful things does word or line count tell of a novel or a dissertation? The size on disk, perhaps.

    The number one reason why this metric is so widely used is that it's trivial to compute.
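    Part of the appeal is that a naive counter fits in a dozen lines -- and even this sketch has to make arbitrary calls (is the shebang a comment? what about POD?):

```perl
use strict;
use warnings;

# Naive SLOC counter: non-blank lines that aren't comment-only.
# Arbitrary by design: the shebang counts as a comment, while POD,
# here-docs and multi-statement lines are all miscounted.
sub count_sloc {
    my (@lines) = @_;
    my $n = 0;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;    # blank line
        next if $line =~ /^\s*#/;    # comment-only line (or shebang!)
        $n++;
    }
    return $n;
}

my @source = ( "#!/usr/bin/perl\n", "\n", "# add two numbers\n", "my \$x = 1 + 1;\n" );
print count_sloc(@source), "\n";    # prints 1
```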

    The problem is that the important things, the things that really matter in programming are inside your head. (What a lame thing to say. Of course they are.) The source code is the ultimate product, but of even more importance is how you came up with the source code. What kind of solutions did you use? How did you solve the problems? Are there other solutions? Why did you pick these ones? How did you figure out how to implement the solutions?

    Equally relevant (and something that might be possible to measure) is whether the program meets the requirements set. If you used a formal specification language or some other means to capture exactly what the requirements are, it would be possible to check whether the program meets them, and which particular requirements are not met. You can encode some of the requirements in unit tests, but not all by far.

    Personally, I use a strange combination of the quality of documentation (I usually have more documentation than source code, but this doesn't tell much), "amount of decoupling" (meaning no more than one concern per module or responsibility per class), light unit tests, and that sense of having the "right" solution. I can't make these explicit, sorry.

    --
    print "Just Another Perl Adept\n";

Re: Measuring programmer quality
by girarde (Friar) on Oct 30, 2007 at 19:57 UTC
    Be very careful in what you choose to measure, because that is what you will get.

      Here's an example of length not correlating with quality. Very, very true; girarde++

Re: Measuring programmer quality
by Anonymous Monk on Nov 01, 2007 at 13:32 UTC
    And don't forget the performance measure, as it's a key factor in your code's execution time.

Node Type: perlmeditation [id://647084]
Approved by Corion
Front-paged by lima1