PerlMonks
Hacking perl by robin (Chaplain)
on Oct 20, 2005 at 23:29 UTC
So, you know everything there is to know about Perl. You stifle a yawn as you flick through the latest obfuscations – how obvious they are! It's gratifying to be an expert, and your patient, only slightly condescending, help is appreciated by the less experienced. Life is good. But sometimes, in a reflective moment, you miss your younger days as an intrepid explorer in unfamiliar terrain, where there was always a new mystery to unravel, a new landscape to be discovered. You could go and learn Ruby or something, but that feels vaguely disloyal after so many years nestled happily in the bosom of Perl. And besides, you don't really like the idea of being patronised by experts on rubymonks or whatever the hell they have.
This is the node you've been waiting for. There is a way! You can plunge into an unfamiliar world of mystery and at the same time enhance your reputation as a Perl guru. I'm talking about perl: the guts, the source, the motherlode.
Update: Fixed a few typos; added a “Testing” section, as suggested by hossman.
You've probably heard the stories of strong men who went there and never returned, or returned mumbling and broken and won’t tell what they saw. One explorer survived long enough to describe “an interconnected mass of livers and pancreas and lungs and little sharp pointy things and the occasional exploding kidney”. Don’t worry about it. You’ll be fine.
The mistake that people often make is that they try to understand it. (Some people make the same mistake with life, or the state of the world, or The Prisoner.) If you can avoid that, you'll be all right. Just crack it open and get stuck in. The slogan of the hour is “HACK FIRST, THINK LATER”. The trick is just to mess with things. Don’t waste too much time worrying about what's going to happen: try it and see.
People often say that a good way to get started is to try and fix a bug that somebody’s reported. There’s some truth in that, but debugging isn’t most people’s idea of a good time.
So that's not what we'll do; instead we’ll add a new feature: lexical typeglobs. It’s always bothered me a little that you can’t say my *foo. Why can’t you? There’s no good reason. It’s not very useful, I admit, but it would be kind of cool. I did it using the HACK FIRST methodology, and wrote down what I was doing as I went along. I was really surprised at how easy it turned out to be: the final patch only changes 38 lines of code.
If you’re the kind of person who always reads the guidebook before going to a new place, you might like to glance at these: perlhack, perlguts. It’s not compulsory though.
The first thing to do is to get a copy of the source. You can hack on whichever version you like, but I decided to use the latest “bleadperl”. If you want to follow along, you should get it too. This is an interactive tutorial: it’s not designed for reading in the bath, and it probably won’t make much sense if you’re not actually tinkering with the source while you read. So find a disk with a reasonable amount of free space, make a directory for the source to go in, and grab it. I did it like this: and you should do something similar. Now build it, to make sure it’s working before you start.
You can vary this as you like, but the -Doptimize='-g' -Dusedevel options are essential. The -Dusedevel tells it that, yes, you really want to build a development version; and -Doptimize='-g' turns on debugging mode, which we’re going to make good use of later on.
In case you haven’t looked at perlhack, I’ll quickly explain the rough structure of the source. Perl code is tokenised by a rather hairy routine called yylex that lives in toke.c; then it’s parsed using the bison grammar that lives in perly.y. The grammar uses the routines in op.c to build an optree. The optree is then executed by a one-line loop that lives in run.c, which dispatches each op to the appropriate routine. The ops themselves are implemented by functions in the files pp*.c.
Back to the problem at hand! Our first task is to persuade perl to recognise the new construct. A quick one-liner shows that the parser doesn’t even recognise the syntax. So we crack open the grammar (in perly.y, remember), and start grepping for ‘my’. Soon we find this: and you don't need a degree in rocket science to see that this is the bit we’re interested in. Down at the end of the file, the symbols ‘scalar’, ‘hsh’ and ‘ary’ are defined, like this: and there’s a similar entry for globs, though it seems to be called “star” rather than “glob”. (The perl source is full of little things that don’t quite make sense – that’s part of its charm.) Here it is:
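The pipeline described above ends in a remarkably small loop. Each op records the function that implements it and the op that should run next, and run.c just keeps calling the current op's function, using its return value as the next op. Here is a toy C model of that idea. It is only a sketch: the struct layout, the stack and the toy_pp_const/toy_pp_add/toy_runops names are simplified stand-ins, not perl's real definitions.

```c
#include <stddef.h>

/* Toy model of perl's op dispatch; not perl's real structs. */
typedef struct op OP;
typedef OP *(*pp_func)(OP *op);

struct op {
    pp_func pp;    /* plays the role of op_ppaddr: the function that runs this op */
    OP     *next;  /* plays the role of op_next: the op to run afterwards */
    long    arg;   /* toy operand, used only by toy_pp_const */
};

static long stack[16];   /* toy argument stack */
static int  sp = 0;      /* number of values currently on the stack */

/* Push the op's constant, then hand back the next op. */
static OP *toy_pp_const(OP *op) {
    stack[sp++] = op->arg;
    return op->next;
}

/* Pop two values, push their sum, then hand back the next op. */
static OP *toy_pp_add(OP *op) {
    long b = stack[--sp];
    long a = stack[--sp];
    stack[sp++] = a + b;
    return op->next;
}

/* The whole "interpreter": keep running ops until one returns NULL. */
static void toy_runops(OP *op) {
    while (op != NULL)
        op = op->pp(op);
}
```

For example, chaining three ops (push 2, push 3, add) and passing the first one to toy_runops leaves 5 on the stack. Adding a new kind of op means writing a new pp function; the loop itself doesn't change.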
So let’s add another clause to the definition of ‘myterm’, like this:
Time to check it out! First we have to rebuild the parser using the new grammar, then rebuild perl itself:
Try the one-liner again:
Great! It’s been parsed okay, and it’s now being rejected during compilation. The compiler is housed in op.c, and we’ll need to write some code to compile our new construct. But before we can do that, we need to decide what we’re going to compile it to. Let’s have a quick peek at what perl does with other ‘my’ declarations:
Okay, so they get compiled to special ops matching /pad.v/ (padsv, padav and padhv). (In case you don’t know, there’s a special perl guts shorthand for different types of value. The most important ones are: a scalar is an SV, an array is an AV, a hash is an HV and a glob is a GV. Oh yeah, and a reference is an RV. Pretty simple really.) Looks like we ought to make a padgv op!
The ops are all defined in a file called opcode.pl, which auto-generates the relevant header files. If we were worried about backwards compatibility, we’d add the new op at the end; but this is just for fun so we’re not really fussed about compatibility, and we’ll add it at the logical place in the file:
Now run opcode.pl, which updates opcode.h, opnames.h and pp_proto.h on your behalf. We’ve got a new op, but the compiler isn’t going to use it unless we tell it how. So crack open op.c, and squint at the Perl_newGVREF() function. Looking at Perl_new[SAH]VREF for comparison, it’s fairly obvious what we have to do:
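The reason opcode.pl generates the headers, rather than leaving you to edit them by hand, is that several tables (op numbers, names, descriptions, pp function pointers) must stay in exactly the same order. In plain C, the same "one list, many tables" idea is usually written as an X-macro. The sketch below is only an analogy. The op list is a trimmed, hypothetical subset, the description strings are illustrative, and putting padgv right after padhv is an assumption about the "logical place". It also shows the side effect of inserting in the middle: every op after the insertion point gets a new number.

```c
#include <stddef.h>

/* One list of ops; every table below is generated from it.
   Trimmed, hypothetical subset: X(name, description). */
#define OP_LIST(X)                  \
    X(const,  "constant item")      \
    X(padsv,  "private variable")   \
    X(padav,  "private array")      \
    X(padhv,  "private hash")       \
    X(padgv,  "private glob")       \
    X(padany, "private value")      \
    X(rv2gv,  "ref-to-glob cast")

/* Op numbers, in list order (compare opnames.h). */
#define AS_ENUM(name, desc) OP_##name,
typedef enum { OP_LIST(AS_ENUM) OP_max } opcode;
#undef AS_ENUM

/* Op names and descriptions, in the same order (compare opcode.h). */
#define AS_NAME(name, desc) #name,
static const char *const op_name[] = { OP_LIST(AS_NAME) };
#undef AS_NAME

#define AS_DESC(name, desc) desc,
static const char *const op_desc[] = { OP_LIST(AS_DESC) };
#undef AS_DESC
```

Because the enum and both arrays are expanded from the same list, op_name[OP_padgv] is always "padgv". The price of inserting padgv before padany is that padany and rv2gv move up by one.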
We’re going to need to implement the new op at some point, but for now let’s just whack in a placeholder:
Wahey! Let’s check out what we’ve got so far. Run ‘make’ to rebuild it, then:
Oh dear. :-( The error is to be expected – we still haven’t looked into that – but we have OP_RV2GV instead of our shiny new OP_PADGV. What’s that about? Time to wheel out the old debugger.
Hmm, so we’ve got an OP_CONST instead of the OP_PADANY we were expecting. It’s time to find out where those PADANYs are coming from:
Ah! It’s tokeniser magic. (You didn’t think the tokeniser just tokenised, did you? Oh no.) That means it’s time to dive into toke.c and see if we can grok what’s happening there. These newOP() calls are both in a function called S_pending_ident(), which gets called right from the top of the main lexer routine Perl_yylex():
That means that PL_pending_ident must be getting set for $foo, @foo and %foo, but not for *foo. A quick grep through the file reveals that we’re quite right – when a ‘*’ is encountered, something called force_ident() gets called instead. Let’s try changing it:
We try rebuilding perl. It builds miniperl okay, but then it dies with a load of syntax errors during the build process. The most telling-looking one is the second one:
Hmm, let’s see what the tokeniser is up to:
I can’t tell what’s wrong just looking at that, so let’s try comparing it to something similar that does work:
That’s interesting! There’s definitely something different there. The second block of the ‘*’ run doesn’t appear at all in this one. Maybe we got something wrong in the tokeniser change? Sure enough, another look at toke.c shows that the code for ‘%’ is setting PL_tokenbuf[0] = '%', which we weren’t doing. So let’s try a slightly improved change, copying the structure of the ‘%’ code a bit more faithfully:
Right, let’s try another make. This one fails too, but in a much more interesting way:
Looking at line 1158 of Config_heavy.pl, we find:
It looks like "use strict" is now affecting globs! Indeed:
Also, it seems to be confusing globs with hashes. What's this "%foo" all about? Let's see where the error is coming from:
A quick look in gv.c tells us that it's coming from Perl_gv_fetchpvn_flags(). Who's calling that? It's debugger time again!
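Putting together what we've found in toke.c so far: the sigil ends up in PL_tokenbuf[0] with the identifier after it, PL_pending_ident is set, and the next call to yylex deals with the pending identifier (via S_pending_ident()) before doing anything else. Below is a toy two-phase lexer in C that follows the same pattern, with `*` accepted alongside `$`, `@` and `%`. Everything in it (toy_lex, tokbuf, pending_ident, TOK_IDENT) is a hypothetical stand-in and far simpler than the real toke.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy two-phase lexer; names are hypothetical, not toke.c's. */
enum { TOK_EOF = 0, TOK_IDENT = 256 };

static char        tokbuf[64];         /* sigil in [0], identifier after it */
static int         pending_ident = 0;  /* set when tokbuf holds an identifier */
static const char *cur;                /* current position in the source */

static int toy_lex(void) {
    /* Like the check at the top of yylex: finish a pending identifier first. */
    if (pending_ident) {
        pending_ident = 0;
        return TOK_IDENT;              /* caller reads tokbuf, e.g. "*foo" */
    }
    while (*cur == ' ')
        cur++;
    if (*cur == '\0')
        return TOK_EOF;
    if (*cur == '$' || *cur == '@' || *cur == '%' || *cur == '*') {
        size_t j = 1;
        int sigil = (unsigned char)*cur++;
        tokbuf[0] = (char)sigil;       /* cf. PL_tokenbuf[0] = '%' in toke.c */
        while ((isalnum((unsigned char)*cur) || *cur == '_') && j < sizeof tokbuf - 1)
            tokbuf[j++] = *cur++;
        tokbuf[j] = '\0';
        if (j > 1)                     /* a bare sigil has no identifier to finish */
            pending_ident = 1;
        return sigil;                  /* phase one: hand back the sigil itself */
    }
    return (unsigned char)*cur++;      /* anything else is a one-character token */
}
```

Given the input `*foo;`, successive calls return `*`, then TOK_IDENT with tokbuf holding `*foo`, then `;`, then TOK_EOF. Leave out the tokbuf[0] assignment and the second phase sees a stale or empty first byte, which is the kind of mismatch the comparison above uncovered.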
Aha! So S_pending_ident() is calling gv_fetchpv(). There it is, right at the end, like this:
Ah! It's assuming that anything that's not a scalar or an array must be a hash. But we've added a new possibility, so let’s tell it about that:
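The bug is easy to reproduce in miniature. The lookup type is chosen by a chain of tests on the sigil in PL_tokenbuf[0], and anything that isn't `$` or `@` falls through to "hash". That fits the stray `%foo` in the strict error. In this sketch the enum values are toy stand-ins for perl's SVt_* constants, and the two functions show the chain before and after adding a `*` case.

```c
/* Toy stand-ins for perl's SVt_* type codes (names and values are not perl's). */
typedef enum { T_SCALAR, T_ARRAY, T_HASH, T_GLOB } vartype;

/* Before: every sigil other than '$' and '@' is treated as a hash,
   so "*foo" gets looked up as if it were "%foo". */
static vartype type_before(const char *tokbuf) {
    return tokbuf[0] == '$' ? T_SCALAR
         : tokbuf[0] == '@' ? T_ARRAY
         :                    T_HASH;
}

/* After: '*' gets its own case ahead of the catch-all. */
static vartype type_after(const char *tokbuf) {
    return tokbuf[0] == '$' ? T_SCALAR
         : tokbuf[0] == '@' ? T_ARRAY
         : tokbuf[0] == '*' ? T_GLOB
         :                    T_HASH;
}
```

With the catch-all last, any new sigil needs its own explicit test. Otherwise the default silently absorbs it.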
Now we try another ‘make perl’, and everything builds as normal. Phew! Even better, the new op is being used in the right place:
Still got that pesky error though... I wonder where that one's coming from:
opmini.c is just an autogenerated copy of op.c that’s used to build miniperl. I don’t know why there’s a separate file for this – probably those hysterical raisins again. Anyway, this looks like the right bit, here:
I guess we need to tell it about our new op. Rebuild once again, and:
It’s executing our new op! Now we’re talking! I guess that means that we ought to implement the thing. We don’t really know what we’re doing here, so let’s just try something really simple, a kind of ultra stripped-down version of pp_padsv, and see what happens.
It builds okay again, so let’s try and do something useful with it:
Hmm, another error. Where's it coming from?
Aha! This is the default clause in a big switch statement, in a function called Perl_mod(). I guess we need to tell this about our new op too:
Right, now let’s try to use it again:
Hot damn! It seems to be working. Let’s try some more experiments:
No way! This is great. Hmm, I wonder what a new glob is called:
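Perl_mod() follows a common C pattern. A switch over the op type lists every op that may be modified, and the default clause reports an error, so a brand-new op is rejected until someone adds a case for it. Below is a toy version of that shape. The op names, the function name toy_mod and its parameters are hypothetical; only the message format imitates perl's familiar `Can't modify %s in %s` diagnostic.

```c
#include <stdio.h>

/* Toy op types (hypothetical subset). */
typedef enum { OP_CONST, OP_PADSV, OP_PADAV, OP_PADHV, OP_PADGV } optype;

/* Return 1 if an op of this type may be modified (assigned to, etc.).
   Otherwise write a diagnostic into err and return 0. */
static int toy_mod(optype type, const char *what, const char *context,
                   char *err, size_t errlen) {
    switch (type) {
    case OP_PADSV:
    case OP_PADAV:
    case OP_PADHV:
    case OP_PADGV:   /* the new case; without it, padgv hits the default */
        return 1;
    default:
        snprintf(err, errlen, "Can't modify %s in %s", what, context);
        return 0;
    }
}
```

The default-is-an-error design is conservative: an op nobody has thought about can't accidentally become an lvalue. That is exactly why the new op had to be added explicitly.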
Oh dear, not so good. I guess we need to initialise the new glob somehow. It looks like we can make a new glob using newGVgen(), but we have to pass a package name, and of course a lexical glob doesn’t live in a package. Let's use the bogus package “lexical”, so lexical globs are easy to spot.
The other problem is that newGVgen() returns a pointer to the GV, and there doesn't seem to be any sensible way to copy this GV into the pad entry. (Confession: I made a false start here. I tried copying the GV into the pad sv using sv_setsv(), but it doesn't seem to copy all the relevant fields. So then I tried the following.) This calls for a slight change of strategy. We'll use the pad entry as a reference to the glob. Like this:
If you’re wondering what all these oddly-named functions are doing, have another squint at perlguts. (Okay, I admit it! I didn’t get this right first time either. I forgot the SvROK_on(), which caused segfaults during global destruction (of all the bizarre places), and it took a lengthy session in the debugger before I figured out what I’d done wrong.) Anyhow, let’s give it a whirl:
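Why does a forgotten SvROK_on() matter? Turning a value into a reference involves two separate pieces of state: the pointer to the referent, and the flag that says the pointer is valid. Code that later tears values down trusts the flag. The toy refcounted value below shows that pairing; its struct, flag and function names are all hypothetical, loosely inspired by an SV rather than copied from perl.

```c
#include <stdlib.h>

/* Toy refcounted value, loosely inspired by an SV; not perl's layout. */
typedef struct tval {
    unsigned     refcnt;
    unsigned     flags;
    struct tval *rv;       /* referent; only meaningful when F_ROK is set */
} tval;

enum { F_ROK = 1u };       /* "this value is a reference" (cf. SvROK) */

static int live = 0;       /* number of toy values currently allocated */

static tval *tval_new(void) {
    tval *v = calloc(1, sizeof *v);
    if (v == NULL)
        abort();
    v->refcnt = 1;
    live++;
    return v;
}

/* Make v a reference to target: take a count on the target,
   store the pointer AND set the flag. */
static void tval_set_rv(tval *v, tval *target) {
    target->refcnt++;
    v->rv = target;
    v->flags |= F_ROK;
}

/* Drop one reference. When the last one goes, release the referent too,
   but only if the flag says v really is a reference. */
static void tval_dec(tval *v) {
    if (--v->refcnt > 0)
        return;
    if (v->flags & F_ROK)
        tval_dec(v->rv);
    free(v);
    live--;
}
```

In this toy, a pointer stored without its flag just leaks the referent. In a real interpreter the consequences can be much stranger, such as the segfault during global destruction described above.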
Great! Let's try a closure:
That prints “Hmm23”, which is cool! Look at this though:
Oh dear! Our supposedly lexical globs are being created as real package variables. That leads to stupendous memory leaks – for example, the loop: will keep on growing until the computer runs out of memory. We'd better do something about that. How about deleting the entry from the %lexical:: stash as soon as it's been created? It's only a one-line addition:
Okay... let’s try it out. The examples above still seem to work. What about this?
How cool is that? You can see the memory being reused – the same addresses keep coming back again and again. It looks like we have a working Perl interpreter with a shiny new feature!
Testing
Every time you add a new feature, or fix a bug, you ought to add some regression tests to make sure that it keeps on working in the future. Even though we don’t expect this particular patch to be maintained in the future, it’s good practice.
Perl’s test suite lives in (the subdirectories of) the directory t; the my operation is tested in t/op/my.t. We need to decide whether to add our tests to the existing file, or make a new one. If you look at the op tests, you'll notice that the fundamental ones are all coded by hand – they don’t use Test::More. That's because Test::More is complicated enough that, if a really fundamental feature gets broken, Test::More itself will almost certainly stop working. It might even stop working in such a way that all the tests appear to have passed, which would be very bad! On the other hand, we can be pretty sure that Test::More doesn’t make use of lexical typeglobs, so there’s no reason we can’t use it in our tests. That settles it then: we’ll make a new file, say t/op/my_glob.t:
It could certainly be more thorough, but this covers the essentials.
Tidying up
If you run ‘make test’ at this point, there are a handful of test failures. That's not a real problem – they all come, in one way or another, from the fact that we've added a new opcode.
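These failures follow a pattern. Somewhere else in the tree, a list or a number was written against the old op list, and inserting padgv in the middle shifts everything after it. A small consistency check makes the effect visible. The op list, the helper names first_mismatch and op_number, and the ordering are a hypothetical, trimmed illustration, not the real tables.

```c
#include <stddef.h>
#include <string.h>

/* Toy op list after the patch (trimmed; order is illustrative). */
static const char *const op_names[] = {
    "const", "padsv", "padav", "padhv", "padgv", "padany", "rv2gv",
};
#define N_OPS (sizeof op_names / sizeof op_names[0])

/* Return the first position where a separately maintained list
   (e.g. a test's expectations) disagrees with op_names, or -1 if the
   two lists match exactly. If one list is a prefix of the other, the
   position where the shorter one ends is returned. */
static int first_mismatch(const char *const *expected, size_t n) {
    size_t i;
    for (i = 0; i < n && i < N_OPS; i++)
        if (strcmp(expected[i], op_names[i]) != 0)
            return (int)i;
    return n == N_OPS ? -1 : (int)i;
}

/* Look up an op's number by name; -1 if unknown. Code that hard-codes
   these numbers breaks whenever an op is inserted before them. */
static int op_number(const char *name) {
    for (size_t i = 0; i < N_OPS; i++)
        if (strcmp(op_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

Checked against a list written before the patch, the first disagreement is exactly where padgv went in, and every op from there on has a number one higher than before.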
The Opcode module whines that it doesn’t know about this crazy padgv thingy, but it’s easy to make it happy:
In a similar vein, the test for Safe uses a list of tests that has to match up with the list of ops.
One of the tests for B::Concise fails because it's looking for a specific opcode number, which we've changed by inserting a new op into the middle of the list. That's easy to fix too:
Now the tests all pass! There’s one more thing. If you’re making a non-standard change to perl, like we've just done, you're supposed to register it as a local patch. You do that by adding a line to patchlevel.h, like so:
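The local-patch list is an ordinary C array of strings. The sketch below is modeled on the layout of patchlevel.h, but its exact contents vary between perl versions, so check the copy in your tree; the patch string here is hypothetical. The list is bracketed by NULLs, so each patch can be added as its own line starting with a comma. The count is derived from the array's size rather than maintained by hand.

```c
#include <stddef.h>

/* NULL-bracketed list of locally applied patches (entry is hypothetical). */
static const char *const local_patches[] = {
    NULL
    ,"LEXGLOB - lexical typeglobs (my *foo)"
    ,NULL
};

/* Number of real entries: total size minus the two NULL sentinels. */
#define LOCAL_PATCH_COUNT \
    ((int)(sizeof(local_patches) / sizeof(local_patches[0]) - 2))

/* Walk the list between the sentinels, as a -V style report might. */
static int count_patches(void) {
    int n = 0;
    for (const char *const *p = local_patches + 1; *p != NULL; p++)
        n++;
    return n;
}
```

Leading commas mean that adding or removing a patch touches exactly one line, which keeps diffs of this file small and conflict-free.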
Now when we run ./perl -V, we get:
The whole patch is here. It only changes 38 lines of code.