In several of my nodes, I've used bisection to figure out when certain bugs/features were fixed/introduced. I find this useful because it'll often uncover the discussion that went into a feature or the detailed reasons behind a bug. I thought I'd write about this process, since I don't always document how I ran each bisect.
First, what is bisection? Basically, it's a (partly automated) binary search to find the exact commit where some behavior changed.
A simplified example: Say you want to find out in which Perl release s///r was introduced. So, you run perl -e 's///r' on the oldest Perl you've got, say 5.6, and on the newest, 5.30 - the former dies, the latter runs, confirming the change happened somewhere between those releases. So then you divide the search space into two, and run the same test on 5.18 - it works, so you now know the change must have happened between 5.6 and 5.18, and you don't need to test the other half of the search space. Divide the search space again, and run the test on 5.12 - it fails, so you know the change must have happened between 5.12 and 5.18. Repeat the process again, and eventually you'll find out that Perl 5.12 dies, but 5.14 works - so you've now confirmed that s///r was introduced in release 5.14, and you didn't have to test all 13 Perl releases from 5.6 to 5.30! (Technically, the change happened in a development version between those two releases, but for this example we've only looked at non-development releases as a stand-in for commits.)
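To make the halving concrete, here is a minimal bash sketch of the search in the simplified example. It is only an illustration: the works function is a hypothetical stand-in for actually running perl -e 's///r' on each release, and it simply hardcodes the answer from the example (success from 5.14 onwards).

```shell
#!/usr/bin/env bash
# Stable releases standing in for commits, oldest to newest (13 in total).
releases=(5.6 5.8 5.10 5.12 5.14 5.16 5.18 5.20 5.22 5.24 5.26 5.28 5.30)

# Hypothetical stand-in for running "perl -e 's///r'" on a given release:
# returns success (0) from 5.14 onwards and failure before that.
works() {
    local minor=${1#5.}
    (( minor >= 14 ))
}

lo=0                            # index of a release known to fail (5.6)
hi=$(( ${#releases[@]} - 1 ))   # index of a release known to work (5.30)
tests=0
while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    tests=$(( tests + 1 ))
    if works "${releases[mid]}"; then
        hi=$mid                 # change happened at or before this release
    else
        lo=$mid                 # change happened after this release
    fi
done
first_good=${releases[hi]}
echo "first working release: $first_good (after $tests tests)"
```

This tests 5.18, then 5.12, then 5.14, and prints "first working release: 5.14 (after 3 tests)", on top of the two initial runs on 5.6 and 5.30.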
A real bisect is different from this example in that it works on the granularity of git commits, and it's mostly automated - the Perl source code includes scripts to assist you in running a git-bisect. And instead of using pre-built Perls, the bisection process will check out each commit of Perl and build it from scratch (even applying patches as needed to get older versions of Perl to compile). It can take half an hour to several hours to run a bisect, but once you set it running, you can go and do something else in the meantime.
To do this yourself, you'll need:
- A Linux machine - that's what I've always done this on. I suspect it's possible on Windows too, but that's beyond the scope of this node.
- A copy of the Perl 5 source tree checked out using git.
- The libraries and headers needed to build Perl, e.g. on Debian/Ubuntu you can do: sudo apt-get install build-essential and sudo apt-get build-dep perl.
- I strongly recommend having a perlbrew environment set up.
Then, the steps are as follows:
Boil down your code to the smallest piece that reproduces the issue - sometimes that'll just be a short oneliner, sometimes you need a script. The bisect scripts don't care about what your test code outputs, only about its exit code: nonzero (e.g. from a die) indicates failure, zero indicates success. Sometimes, you might be looking for a change in Perl's output - in that case, you may need to use a bit of trickery and run an external Perl; for that you can use $^X to get the name of the binary (like in this example at the bottom, or this recent slightly more complex example, where I'm inspecting the output of the Perl debugger, and for that I need to interact with environment variables and files).
Note that it's best if your code does not make use of any non-core modules, since including CPAN modules in a bisect is possible, but more tricky and slower. And of course this test needs to be reliable - anything with random behavior (like hash ordering!) will give you very misleading results or cause the bisect process to fail. If you write a script, it's best to put it outside the source tree, since that will be checked out many times by the bisection process.
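As a rough sketch of the "external Perl" trick, the following writes a small test script to a location outside the source tree. The file name and the behavior being checked (the output of print 6*7) are made up for illustration; a real test would check whatever output you are actually investigating.

```shell
# Write the test script outside the source tree.
# The path and the checked behavior are illustrative assumptions.
script="${TMPDIR:-/tmp}/bisect-test.pl"
cat > "$script" <<'EOF'
use warnings;
use strict;

# Run the perl under test ($^X) as a child process and capture its output.
# No "-Ilib" is needed for the child here because it loads no modules.
my $output = `$^X -e "print 6*7"`;
die "child perl failed: \$? = $?" if $? != 0;

# die (nonzero exit) if the output is not what we expect;
# falling off the end of the script exits with zero.
die "unexpected output: '$output'" unless $output eq '42';
EOF
echo "wrote $script"
```

You can try the script against any installed Perl with perl "$script" before handing it to the bisect.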
Use perlbrew to check if this really is a version-dependent issue, and on which releases things worked and on which they didn't (this step is optional but I strongly recommend it). So for example, I have Perl 5.6 through the latest release installed, and with a simple perlbrew exec perl script.pl or perlbrew exec perl -e '...' I can test on all versions and narrow down the range of versions for the bisect. Sometimes, what you're looking for may be a bug introduced in some version and fixed in a later version - like in this example, where I had to break the search range into two, because the bisect process will only work if there is only one transition from failing to succeeding or succeeding to failing. Also, doing this step will help you avoid false positives: for example, if your test script uses a construct such as // that was added in 5.10 and will fail in older versions, this step will show you that much more clearly than the bisect would.
At this point I'll also often just check the perldeltas to see if the change in behavior is easy to find there.
Run the bisect: The documentation is contained in the file Porting/bisect-runner.pl (you can also read it with perldoc FILENAME), but the script you'll be running is Porting/bisect.pl. The most common options I use are:
- --start=... and --end=... to indicate which git tags to use as the beginning and end of the range for the search - you could just take these from the above perlbrew run, but it also doesn't hurt to expand the range a little, because the binary search will still be fairly efficient. (To get a list of tags, just use git tag.)
- --expect-fail - Normally, the bisection script will look for the commit where the test code goes from succeeding to failing. If you want to look for the opposite, i.e. the commit where the test code went from failing to succeeding, you need to supply this option.
- --target=miniperl - Sometimes, you might just be testing Perl's syntax, independently of any modules. In such cases, you can speed up the bisection a bit by specifying this option to stop the build once miniperl has been built (basically a stripped-down precursor version of perl).
- If you want to run a oneliner, it's enough to use the option -e 'code here'.
- If you want to run a script, you need to use bisect.pl ... -- ./perl -Ilib /path/to/script.pl - The "./perl" is because you explicitly want to run the perl built by the bisect process, and "-Ilib" because that perl was built with the default @INC for your system, but it hasn't been installed there, so the libraries are still in the lib directory of the source tree.
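Putting these options together, a sketch of the two kinds of invocation might look like the following. The tags v5.12.0 and v5.14.0 are my assumption, picked to match the simplified s///r example, and /path/to/script.pl is a placeholder. The commands are only assembled and printed here so you can review them first.

```shell
#!/usr/bin/env bash
# Oneliner variant: when did s///r start to compile?
# The tags are assumptions matching the simplified example.
oneliner_cmd=(
    perl Porting/bisect.pl
    --start=v5.12.0      # tag where the test still fails
    --end=v5.14.0        # tag where the test succeeds
    --expect-fail        # look for the failing -> succeeding transition
    --target=miniperl    # pure syntax test, so miniperl is enough
    -e 's///r'
)

# Script variant: everything after "--" is the command run for each commit.
script_cmd=(
    perl Porting/bisect.pl
    --start=v5.12.0
    --end=v5.14.0
    --expect-fail
    -- ./perl -Ilib /path/to/script.pl
)

# Print the commands for review instead of running them.
printf '%s ' "${oneliner_cmd[@]}"; echo
printf '%s ' "${script_cmd[@]}"; echo

# To start a bisect, run this from the root of the perl5 checkout:
#   "${oneliner_cmd[@]}"
```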
The process can take up to an hour or two. Once you've got the commit, you can pull it up on GitHub via the URL https://github.com/Perl/perl5/commit/COMMIT. Read the log message and code changes to get an idea if the commit is in fact related to the issue, because once in a while the bisect can return a misleading result - for example here: The commit found by the bisect was actually one where an optimization was changed that exposed the underlying bug.
Look for any related bug reports: most of the time commit messages will mention them; sometimes they'll be mentioned in the code changes. Note that the bug reports have recently been migrated away from rt.perl.org to GitHub, which changed their numbering: At the moment, when you see bugs mentioned in the notation #1234, that'll usually be referring to the RT bug number, although I suspect that'll start changing now with the move to GitHub. On GitHub, you can find the old issue numbers using the search term RT1234$. Another thing you can do to find more information is look in the P5P archives for the commits or bug numbers to see if there is additional discussion there.
Use git tag --contains COMMIT to check which Perl releases the commits are contained in. If it's an old release, I'll also sometimes look at perlhist to get the release date to tell someone how many years their Perl is outdated ;-)
In case you want to try this yourself, here are the nodes where I did document the bisect commands, particularly the first two of these links are (relatively) simple examples:
Updates: Minor edits for clarification. Added a bit of info to point 2 and the first paragraph. Added another link.
Update 2021: I now have a talk about this topic as well (audio currently only in German, slides in English): https://github.com/haukex/bisectalk