Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

On moving forward and breaking compatibility

by pemungkah (Priest)
on Feb 10, 2010 at 00:25 UTC ( #822315=perlmeditation: print w/ replies, xml ) Need Help??

EDIT: Added additional example of why "good idea" upgrades may lead to bad results (thanks, crashtest).

EDIT: Added section on blind upgrading as bad practice (thanks, chromatic).

EDIT: Final clarification on making it usable vs. compensating for negligence, and on trust.

Introduction

This started as a reply to WWW::MECHANIZE Get Errors Ending Program Early? but I realized that it was better moved here. First, this is not a criticism of WWW::Mechanize nor its many contributors nor of Andy Lester, its original author. WWW::Mechanize is a great module and truly valuable. This is a meditation on good programming support practices to introduce necessary yet incompatible changes

The story

I love WWW::Mechanize to bits. It's immensely useful. It makes things so easy to do. But it has one wart in its release history, and that is the change that made autocheck on by default.

For those not familiar with the history of this module, the original version required you to check the status of your calls: you would need to say

my $mech = WWW::Mechanize->new(); $mech->get("http://my.server.com/my/path"); if (! $mech=>success) { # attempt to recover }
This was the default for many releases. It was in keeping with many of Perl's APIs: if you do something that might fail, it's your job to check it.

As time went on there was a good cultural shift in the Perl community: from "here's the gun, try to aim away from your foot" to "programmers shouldn't have to remember to keep themselves out of trouble if we can do it automatically". If there is trouble and the programmer isn't checking, you need to do something blatantly obvious to either signal the failure or encourage the programmer to catch the error. This is a great idea, and a great goal. Programmers need all the assistance their support code can give them. Letting me know I have a problem is a good thing.

However, there are two parts to this. One is providing tools which help out this way - this is perhaps the simpler part of the problem. The other, and more difficult, one is getting these tools in the hands of the programmers in a non-disruptive manner. The 1.49_01 release of WWW::Mechanize is an example of doing the first part very well, but not the second.

For release 1.49_01, the Changes file says

THINGS THAT MAY BREAK YOUR CODE
The autocheck argument to the constructor is now ON by default, unless WWW::Mechanize is being subclassed. There are so many new programmers whose ->get() calls fail unchecked that I'm now putting on the seat belts for them.
An excellent idea. A step in a good direction. Yes, this is classed this under "THINGS THAT MAY BREAK YOUR CODE". However, and this is important: 1.49_01 was a point release. Generally speaking, point releases introduce small improvements and backward-compatible changes. Developers can install point releases and not worry about things breaking.

This was a good change, in that it worked right, and it added significant value ... for new WWW::Mechanize users. For people already using WWW::Mechanize at $PREVIOUS_WORKPLACE, it was a serious problem.

What happened?

Code that had been working properly for several years prior to this change suddenly broke. Remember that WWW::Mechanize first shipped in 2002, and this change happened in 2008. This was the first change that didn't make your code die with a real Perl error if you ignored an incompatible change. Worse, the way in which it broke did not point out what the problem was. It was diagnosed accurately (as in, what happened was shown properly):
Error GETing http://your-uri: reason
but not completely: there was no indication why the failure was happening now, or how it needed to be fixed. It required careful reading of the man page to spot that you had to make a change to your new to turn off autocheck unless you were subclassing, in which case things worked the way they had before. Many folks using this module had it installed by someone else and did not see the Changes file that pointed out the potential problem.

For us, there was a period of many months in which we'd get yet another phone call saying "my XXX script that gets stuff from the Web is broken" and we'd have to explain again about autocheck and how to fix it.

Why was this a surprise?

Perhaps we'd become complacent. Previous changes to WWW::Mechanize that might be code-breakers were generally making things that you should have been doing anyway mandatory, or removal of deprecated features that were already generating deprecation warnings.

One change, in handling headers, was done in the switch from the 0.xx version to the 1.00 version. This was a change that required code fixes not related to deprecations, and was not unexpected because of the version bump.

All of these produced errors that made diagnosing the failure pretty simple: no such method, or no such variable (in the case of the header change).

Automatic autocheck was a surprise and a problem because

  1. there were no deprecation warnings leading up to the change: warn "a future version of WWW::Mechanize will require you to turn off automatic status checks in new()", for example.
  2. the version number did not change: the last release without autocheck on was 1.34. The release with it on was 1.49_01. Note that the last API-breaking change similar to this was the 1.00 change in 2004. This led people to expect it would be just another release with no significant functional changes.
  3. the resulting error message did not make it clear what the problem you now had was, and what was needed to fix it.
  4. the documentation did not emphasize that there was a change that would cause a common usage of the module to fail, and that a specific change was needed to prevent this.
  5. there was no easy way to globally restore the old function; it was "scan your code, find the new calls, and fix them".

How could it have been better?

There are several possible ways that this could have worked better, any one of which would have made this change less disruptive:
  • Move the version number up by one. This is an implicit signal that there have been significant changes to the program, and that the API may have changed in incompatible ways.
  • Modify the code in new to accurately diagnose the problem:
    Error GETing http://cantreachthis.com: Page not found (You didn't specify 'autocheck' in new(); use 'autocheck => 1' to if y +ou want to check status yourself.)
  • Have new simply die if the object isn't a subclass of WWW::Mechanize and autocheck wasn't provided in new
  • Add a big HEY YOU - IMPORTANT API CHANGE section at the top of the man page, specifically calling out the change, its effect, likely symptoms, and the needed fix.
  • Provide a global configuration setting or environment variable to allow old code to continue to work until it's fixed.
Doing more than one, or even all of them would have been great: the more possible ways you can prevent a problem, the better.

So why wasn't one of these chosen? There are a number of possible reasons. Sometimes people just make mistakes. Sometimes they forget that others don't have the benefit of their point of view - if you're developing code that you know has had a particular change, you tend to forget that other people weren't there when you made it. It's very difficult sometimes to remember that other people who'll be using your code don't work in the same context that you do, and that you need to make extra efforts to ensure that everyone's starting at the same point when thinking about a problem.

What to learn

If you're working closely with code, it's easy to forget that you know that there's been an incompatible change, that you know how to spot it and fix it, that you know where you documented it. For someone who's not familiar with the change, all this may be mysterious and inexplicable and hard to spot. You need to think ahead about your user base and what they'll need to be up to speed with your change as effortlessly as possible.

  • communicate it
  • signal it
  • explain it
  • mitigate it

Make sure that all of your documentation emphasizes the change: changelog, manpage, checkin comments, error messages (yes, error messages are documentation too!).

Put the upcoming change out on your blog, on Twitter, on Perlmonks, on any relevant mailing list. Do it more than once.

Do everything you can to spot incompatible code in your API and diagnose the error as early as possible (it's better to die in new than it is to wait until an error condition arises because something wasn't done much earlier).

Provide a backdoor such as an environment variable to re-enable the old behavior - it's perfectly reasonable to make this something that emphasizes that the code needs to be fixed rather than worked around:

if ($ENV{WWW_MECH_IGNORING_AUTOCHECK_REQUIREMENT}) { ... }

In short, make it obvious that there's a problem and what it is, and make it obvious what needs to be done to fix it both short-term and long-term; if you can, give people a way to fall back to old behavior temporarily while they get changes in place for the new.

In the long run, making the extra effort to help your users avoid problems will generate a lot of good will, save time, and do a lot for your reputation as a careful and just plain good programmer.

Remember: think before you unconditionally make "good" old code break. If there's no way around it (you've found a really bad bug, or there's a security issue associated with "the standard way" of using your module), you may have to do it, but you absolutely have to communicate that old code will unconditionally break, and show how to fix it.

Well, just upgrading is stupid.

Testing your code and being sure you haven't broken it by upgrading something - absolutely the right thing to do. But this doesn't always happen: what if your ISP decides they need feature X in the new WWW::Mechanize so they upgrade? What if it's late, or you're in a hurry, or distracted, or pressured ... or yeah, just not as smart as you might be? Does this mean you should be on your own if your software ends up broken?

No. As a developer, I have a responsibility to make it easy for a reasonable person to extricate him- or herself from a problem if I can. Everyone has their own version of "reasonable", but mine is the programmer called out of bed at 2AM. Most people are at about half their normal competence when they've just woken up. I prefer to engineer for my stuff to be safe in this situation so that I can't do something stupid to myself. This maybe isn't so much altruism as self-interest. As a developer who wants to write code that is more usable (in the Don Norman sense), this is the difference between "You got this error and here's why and what will fix it" vs. "You got this error" (good luck finding out why and how to fix it) or worse, "You got an error" (good luck figuring out what, why, and how to fix it).

It's also about trust. If it's obvious that the effort went in to try to make the code usable, then the code feels, and is, more trustworthy. The person using the code thinks, "If I make an error, I can trust this code to tell me what it is and and at least give me an idea how to fix it". This is what you want: trustable and usable code. Not code which cossets people to the point that it tries to fix every possible thing that could be wrong, but code that makes the small extra effort to be usable.

Final thoughts

A final example to consider: let's say that your favorite Linux distro decided that there was a fancy new tape driver that was loads more dependable and faster, but which made all your old backups unreadable. Would you want the newest version of the OS to install that without telling you in no uncertain terms that upgrading would make all your tapes unreadable? Would you feel there was a problem with the usability of the code? Would you trust it less?

If on the other hand there was a compatibility mode that would let you read your old tapes, but would only let you write new ones in the improved and upgraded format, you'd feel like you could trust the person who made the decision to have made a good one - because he or she made a good decision in not potentially making valuable data inaccessible without a big investment in time and effort.

Plaudits

I love WWW::Mechanize. I have nothing but the greatest respect for all of its contributors. You've saved me a lot of time and work and I'm grateful.

I hope that you'll excuse my using the consequences of this one choice as an opportunity to talk about good code support practices.

Comment on On moving forward and breaking compatibility
Select or Download Code
Re: On moving forward and breaking compatibility
by zentara (Archbishop) on Feb 10, 2010 at 11:12 UTC
    I'm sure this node will be referred to many times, as SOPW nodes come in about Mech not working, so its good.

    My thoughts on this are 2.

    First, html has been changing so fast, with all this javascript and flash, that I wouldn't expect any script older than a year to actually work. Just look at the history of the attempts at YouTube video downloader scripts. Each attempt works for awhile, until some invisible smart guy upstairs figures out a way to defeat the scripts, to force you to use a browser to get the video. Why? So they can force you to view their ads. I still don't think anyone has found a way to reliably trigger the little flash cws file, which they now send you as a loader for the real video, containing a well hidden, dynamic url. I wonder how Gtk2::Webkit::Mechanize handles it?, since it has a browser built-in.

    Second, always check the examples that come with new modules, for the skeletons of your new programs.

    So things are moving fast forward right now, and we the scripters need to stay agile.... not locked into old methods good 5 or 10 years ago.


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku
      They probably execute the flash ... I heard the xbmc folks do that as well (for hulu and such)

      Note that the change to WWW::Mechanize happened 18 months ago:

      1.49_01 Sat Sep 27 23:50:04 CDT 2008

      ... and I think, it went mostly without problems, or was shadowed by larger problems due to LWP changing its callback infrastructure :).

Re: On moving forward and breaking compatibility
by crashtest (Curate) on Feb 10, 2010 at 21:43 UTC

    I think you make some valid points, but I disagree with you on your interpretation of module version numbers. For me, even a "point" release could indicate a change in functionality that might break my script. Those are my expectations. Clearly, yours are quite different.

    Considering that my point of view wasn't based on anything but a "feel" for how CPAN modules are released, I decided to consult the Perl documentation.

    From perlmodlib:

    Take care when changing a released module. Always strive to remain compatible with previous released versions. Otherwise try to add a mechanism to revert to the old behavior if people rely on it. Document incompatible changes.

    OK so far. Reading further, from perlmodstyle:

    Modules which are "stable" should not break backwards compatibility without at least a long transition phase and a major change in version number.
    (emphasis mine)... but also:
    Version numbers should indicate at least major and minor releases, and possibly sub-minor releases. A major release is one in which most of the functionality has changed, or in which major new functionality is added. A minor release is one in which a small amount of functionality has been added or changed. Sub-minor version numbers are usually used for changes which do not affect functionality, such as documentation patches.
    (Emphasis mine, again).

    The way I read the documentation, I don't see any indication that WWW::Mechanize broke any rules of conduct here.

    Most of your consternation seems to stem from the strong expectation that going from version 1.34 to 1.49_01 would preserve all functionality. I think that expectation is misguided. Perhaps other Monks disagree. Depending on the needs of a production environment, I would not upgrade modules to new "point" releases without moderate testing. God knows I've seen enough bug-fix releases (in non-CPAN software) that broke perfectly working functionality in other areas.

    I like your ideas on preparing and communicating functionality changes. Perhaps your node could be a general-purpose tutorial for module maintenance.

    Anyway, those are my $0.02.

      I'd argue that "all of my scripts that use Mechanize are now broken" is from the user's POV a very major change.

      Automatically checking the status of calls for errors is a great thing for new programs, but every old program following established good practice for using the module is now broken, and there is no alternative except to either downgrade the module (assuming you have the permissions and resources available to do so) or rewrite code that was already working just fine and they broke it without warning me first ARRGH!

      The WWW::Mechanize change is a "perfect storm" situation: a module that's so good you want to use it any time you need to access something via HTTP, so you do. It ends up in many, many programs. You really depend on it. A change that will make new programs much safer and better goes in, but it breaks old ones. Chaos ensues.

      A change that unconditionally breaks an existing codebase with no workaround is never minor. You should make a choice to do this only when there is no other possible alternative - a security problem, a major bug. This change was introduced because it was a "good idea" and "better coding practice". Yes, it absolutely is a good idea, and this is good coding practice. It should definitely have gone in.

      But it is bad maintenance practice.

        But it is bad maintenance practice.

        Upgrading to a new dependency without testing code with the new dependency is a worse maintenance practice.

Re: On moving forward and breaking compatibility
by blakew (Monk) on Feb 12, 2010 at 02:51 UTC
    If I may (this is a Meditation after all), your post reminds me of why I love open source. Something breaks from one version to the next in a mature piece of software, people notice and can find out why (it's not always fun, but you can). You, pemungkah, can come here and write an insightful (inciteful? :P) post and we can all learn something, both tangible and intangible. To top it off, I'm 99.9% sure something will be done as a direct result to make the lives better of countless programmers who depend on said module.

    Contrast with commercial software, something I interface with regularly on a daily basis at $work. One such piece of..software released a new minor version without, apparently, any sort of regression testing whatsoever (imagine that, software shipping with tests! Hah!). Now about half of my scripts fail, and with errors so cryptic and arbitrary you'd swear it was a joke. I would kill for something like

    Error GETing http://cantreachthis.com: Page not found

    My, how direct and straightforward - even though it doesn't take into consideration all the context you've provided, it still gives enough information to figure out what went wrong. And if not all future Google searches will surely lead people to your post, since you did the honors of figuring out exactly what happened.

    Now imagine trying to load a .csv file and getting an error saying "Invalid SQL syntax." or converting between data formats and getting an error message related to a missing "locator" and you get the picture. You can try in despair to find out the answer, but what mostly winds up happening is a bunch of hackish workarounds with lots of colorful commentary in the source.

    Anyways, cheers to you pemungkah and all the monks who remind me why programming is fun and why I gravitated to it.

Re: On moving forward and breaking compatibility
by talexb (Canon) on Feb 12, 2010 at 18:11 UTC

    Interesting. I wanted to mention that IO::Socket made a similar change which was wrapped above and below with NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE NOTE The documented change was As of VERSION 1.18 all IO::Socket objects have autoflush turned on by default. This was not the case with earlier releases.

    The current version's 1.31, and 1.20 (the earliest release on CPAN) was July 23, 1998, so 1.18 was probably about 15 years ago -- but that message is still there. Nice.

    However, with all due respect to Andy Lester, I agree with your point -- I think it's wrong for a module to break existing code, notwithstanding that seat belts are indeed an excellent idea.

    Alex / talexb / Toronto

    Team website: Forex Chart Monkey, Forex Technical Analysis and Pickpocket Prevention

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://822315]
Approved by jettero
Front-paged by afoken
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (9)
As of 2014-08-21 22:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (144 votes), past polls