Re: Tao Perl Ching - The Scripture of the Way of Perl
by Stevie-O (Friar) on Dec 17, 2004 at 22:05 UTC
Since nobody has replied, I'll start things off with a real-life example that fits your statements (even though it has little to do with Perl, beyond my use of it for log analysis purposes) -- perhaps others will then share their experiences:
"The harder one tries, the more resistance one will create for oneself."
I work on a small device that connects to cash registers & vending machines, and allows payment to be made via RFID keychain tags. The devices communicate with our system over the Internet. They first go through a router (called a 'gate') that serves basically as a NAT bridge between their proprietary wireless radios and an Ethernet/modem uplink.
The original designers of this system had the devices communicate with this 'gate' over a hacked-up version of PPP (the changes were made to allow multiple devices to operate over a single shared link). This created a minor problem when the gate had to be rebooted, because all of the devices had to be rebooted as well (since the PPP links would be invalidated).
As time passed, things changed, and it was decided that we needed to have the devices automatically detect this event, and reboot themselves accordingly. This was accomplished by periodically sending PPP echo-request packets, and rebooting if no reply came back within a certain time threshold.
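
For readers who want to picture the mechanism, here is a minimal sketch of that kind of echo watchdog. It is not the actual device firmware; the names `EchoWatchdog`, `run_once`, `send_echo_request` and `reboot`, and the 30-second threshold used in the usage note, are all made up for illustration.

```python
import time


class EchoWatchdog:
    """Tracks the last echo reply and reports when the link looks dead.

    Illustrative only: the class name and timeout handling are assumptions,
    not taken from the real devices.
    """

    def __init__(self, timeout_s, now=time.monotonic):
        self.timeout_s = timeout_s
        self._now = now                # injectable clock, handy for testing
        self.last_reply = now()        # treat startup as a fresh reply

    def on_echo_reply(self):
        # Call this whenever an echo-reply arrives from the peer.
        self.last_reply = self._now()

    def should_reboot(self):
        # True once no reply has been seen for longer than the threshold.
        return self._now() - self.last_reply > self.timeout_s


def run_once(watchdog, send_echo_request, reboot):
    """One polling step: reboot if the link looks dead, otherwise probe again.

    Returns False if a reboot was triggered, True otherwise.
    """
    if watchdog.should_reboot():
        reboot()
        return False
    send_echo_request()
    return True
```

A device loop would call `run_once` on a timer (say, with `EchoWatchdog(30.0)`) and call `on_echo_reply` from its receive path. Note that the check only proves that something answered the echo, which matters for what follows.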
A few months later, we discovered that some of the devices would sometimes come up and be unable to communicate -- even though they could ping. I spent many days poring over hundreds of megabytes of debugging output logs, trying to determine just what was going wrong -- nothing could connect, yet the PPP echoes were being replied to, so the gate link was valid (or so I thought).
It turned out that the problem stemmed from the original designers' decision to force PPP to fit their system -- as part of the initial PPP negotiation, the devices could get confused and start sending packets to other devices (instead of the gate)! The catch is, this could only happen if one device was rebooted several minutes before another, and that did not happen until the 'automatic reboot' policy was put in place.
I'd known all along that PPP was totally wrong for the design, but it was not until this day that I understood *how* wrong it was. I couldn't fix the problem by hacking on PPP any further. What was I to do?
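
A toy model may make the trap clearer. It assumes (the post does not spell this out) that any peer on the shared link, not just the gate, will answer an echo-request, while only the gate can actually pass traffic upstream. The `Peer` class and both helper functions are hypothetical.

```python
class Peer:
    """Something a device may end up talking to on the shared link."""

    def __init__(self, name, is_gate):
        self.name = name
        self.is_gate = is_gate

    def echo_reply(self):
        # Assumption: every peer answers echo-requests, gate or not.
        return True

    def forward(self, payload):
        # Only the gate has an uplink, so only it can deliver traffic.
        return self.is_gate


def link_looks_alive(peer):
    # What an echo-based watchdog sees.
    return peer.echo_reply()


def can_communicate(peer):
    # What the application actually needs.
    return peer.forward(b"payment request")


gate = Peer("gate", is_gate=True)
neighbour = Peer("device-2", is_gate=False)

# Talking to the gate: the echo check and real traffic agree.
healthy = (link_looks_alive(gate), can_communicate(gate))

# Talking to another device by mistake: echoes succeed, traffic goes nowhere.
confused = (link_looks_alive(neighbour), can_communicate(neighbour))
```

Under those assumptions the watchdog never fires for the confused device, which matches the symptom: it could ping, yet nothing could connect.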
"One whose needs are simple will find them fulfilled."
I switched to a protocol designed for a system exactly like ours: a shared broadcast medium, where everybody sees everyone else's packets -- the Ethernet protocol.
I threw DHCP clients onto the devices, a DHCP server onto the gate, and used the Ethernet portion of the IP stack to handle everything else (incl. ARP).
And in three days, I solved a problem that had plagued us for over six months.