|Think about Loose Coupling|
A sad tale of disappointment, confusion, betrayal and enlightenment. Also an annoying bug that you should never run into, but that could cost you a lot of hair-pulling if you do. Perl is only tangentially related to this problem, so if you are looking for Perl code, skip this node.
The Gathering Storm
The task seemed simple enough. A new Unix box had come in and I needed to install some standard (but non-core) modules. No problem.
Say what? The local system is up and on the net. A quick check in a browser confirms that we can see Google (and therefore, by definition, the whole Internet ;-). Maybe the CPAN module is fubared?
Okaaayyy. That's odd. I was able to get to external sites by name just a second ago. Maybe ftp.perl.org's DNS is hosed at the moment?
So. I can look up the IP address by name. I can ftp to the address. But I cannot ftp to the name. Another quick check shows that I can ftp to plenty of other FTP archives by name, just not ftp.perl.org. Odd.
It's a clean OS install, so I go look on another machine that's been in use for a while. Same symptoms. A different (Unix) OS. Same symptoms. I try directly from my desktop (Win*). Works just fine.
Golf It Down
Well, when you've got one working and one not working, you've got a solution if you can just find it. The simplest test case is pretty obvious. I can either ftp to ftp.perl.org or I cannot. I pop two windows, one local to my Win box, one on the Unix box:
WHAT?!? But that just worked! I check to make sure nothing else has crashed in the meantime.
Right. Fine. No problem.
After the Nervous Breakdown
It looks like something goes wrong when the Unix box looks up ftp.perl.org, and it not only screws up the requesting machine but also temporarily screws up every other machine that uses the same DNS server. Sounds like a DNS server bug, but what is triggering it?
New test case: packet-sniff the DNS traffic during each scenario to see WTF is going on. What immediately pops out is that the Unix box (the one causing the problem) has twice as many exchanges with the DNS server as the Windows box. And its first request is not for the same information the Windows box asks for.
The Unix box first asks for an AAAA (IPv6 address) record. And the nameserver barfs up a lung.
It turns out that if I force the Windows box to request the AAAA record first, its behaviour matches the Unix box. So the problem is being triggered by the Unix boxes defaulting to trying IPv6 first ... so much for progressive thinking.
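To picture the two kinds of question concretely, here is a minimal Python sketch (not something used in the original investigation) that builds raw DNS query packets following the RFC 1035 message layout. The only field that differs between the two is QTYPE: 28 for AAAA (RFC 3596) versus 1 for A. The helper names and the fixed query ID are illustrative assumptions, and the code only builds bytes; it never touches the network.

```python
import struct

# Record type codes: A comes from RFC 1035, AAAA from RFC 3596.
QTYPE_A = 1
QTYPE_AAAA = 28
QCLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # zero-length root label terminates the name


def build_query(name: str, qtype: int, query_id: int = 0x1234) -> bytes:
    """Build a standard recursive query carrying a single question."""
    flags = 0x0100  # QR=0 (query), OPCODE=0, RD=1 (recursion desired)
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


# An IPv6-first client sends the AAAA question before the A question.
for qtype in (QTYPE_AAAA, QTYPE_A):
    print(qtype, build_query("ftp.perl.org", qtype).hex())
```

The two packets are byte-for-byte identical except for the last four bytes. A client that asks the AAAA question first and then falls back to the A question makes two round trips where an IPv4-only client makes one, which would be consistent with the doubled traffic in the sniffer trace.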
I check to see if some other IPv6 sites have the same behaviour. They don't. So what's so funky about ftp.perl.org's AAAA record?
Notice anything missing? That's right, there's no actual IP address there. And ftp.cpan.ddns.develooper.com. has no AAAA record at all. Apparently this causes our particular DNS server to freak out and refuse to even try to look up the domain until that entry falls out of the cache.
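If you want to check a server's reply yourself, the distinction that matters is the one RFC 2308 draws between NXDOMAIN (RCODE 3: the name does not exist at all) and NODATA (RCODE NOERROR with no relevant answer: the name exists but has no records of the requested type). A reply that carries only a CNAME alias and no AAAA record should count as NODATA. The Python sketch below is illustrative only: it classifies a raw response by its header RCODE and the types in its answer section. The function names are hypothetical, it does no bounds checking, and the sample response is synthetic (modelled loosely on the names in this node, not captured from the network described here).

```python
import struct

TYPE_A, TYPE_CNAME, TYPE_AAAA = 1, 5, 28  # RFC 1035 / RFC 3596 type codes
RCODE_NOERROR, RCODE_NXDOMAIN = 0, 3


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[off]
        if length == 0:                  # root label: end of name
            return off + 1
        if (length & 0xC0) == 0xC0:      # compression pointer: 2 bytes, ends name
            return off + 2
        off += 1 + length                # ordinary label: length byte + text


def classify_response(msg: bytes, qtype: int) -> str:
    """Label a response as ANSWER, NODATA, NXDOMAIN or 'RCODE n'."""
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", msg[:12])
    rcode = flags & 0x000F
    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"                # "this name does not exist at all"
    if rcode != RCODE_NOERROR:
        return f"RCODE {rcode}"          # e.g. 2 = SERVFAIL
    off = 12
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4    # skip QTYPE and QCLASS
    seen = []
    for _ in range(ancount):
        off = skip_name(msg, off)
        rtype, _cls, _ttl, rdlength = struct.unpack("!HHIH", msg[off:off + 10])
        seen.append(rtype)
        off += 10 + rdlength             # skip fixed fields plus RDATA
    # A CNAME on its own is not an answer to an AAAA question.
    return "ANSWER" if qtype in seen else "NODATA"


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    labels = name.rstrip(".").split(".")
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


# Synthetic response: NOERROR, one AAAA question, one answer that is only a CNAME.
target = encode_name("ftp.cpan.ddns.develooper.com.")
response = (
    struct.pack("!6H", 0x1234, 0x8180, 1, 1, 0, 0)
    + encode_name("ftp.perl.org") + struct.pack("!HH", TYPE_AAAA, 1)
    + b"\xc0\x0c"                        # pointer back to the question name at offset 12
    + struct.pack("!HHIH", TYPE_CNAME, 1, 300, len(target)) + target
)
print(classify_response(response, TYPE_AAAA))  # NODATA
```

A well-behaved server reports NODATA here, and a caching resolver should remember only that one name/type pair. One plausible reading of the symptoms above, though not something the trace in this node proves, is a server answering with NXDOMAIN or another error where NODATA belongs, so the failure gets cached against the whole name.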
Always the Last to Know
Thus armed, I cast my Google net and found CERT Vulnerability #714121 from 2003:
Some DNS servers respond with an inappropriate error message if queried for nonexistent AAAA records, which can lead to possible denial of service.
And indeed, pointing the systems at an external DNS server (not running our out-of-date software) made the problem disappear. Check and mate.
The travails of getting the corporate DNS updated are a horror story not fit for the polite company of the Monastery.
The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon