PerlMonks  


Not Perl. When the idea of using Perl for the project was mooted, I asked to see some examples of Perl code, and they gave me three shortish (Perl 4) scripts to look at. I spent no more than 5 minutes reading them and rejected the idea out of hand as write-only code. I'm not sure whether they were particularly bad examples of the Perl 4 art form. I just knew that I could make some sense of the source code for most languages within a few minutes, and this stuff looked like nothing I had ever seen before. The longer I stared at it, the less sense it made. Now that I understand a little about how Perl works, I know that I would much, much, much rather have done the project in Perl (Perl 5, that is; I still know nothing about Perl 4) than in the mish-mash of shell script (csh), awk and REXX that we ended up using. But I had to make a decision, quickly, and I said no.

The project was big: 600 servers servicing 40,000 PCs, and we were in the pilot phase of the project under EU tender rules. It was critical that we met the target dates and all of the agreed (and exhaustively documented) pilot-phase criteria. Any failure, no matter how insignificant to the overall project goals, and even if understood and forgiven by the client, risked the whole project being scrapped and our having to re-enter the complex tender procedure. Having the client accept and sign off any failures or omissions was not enough to prevent that. It would also have been necessary to have all the competitors we had beaten at the tendering stage sign off on those failures.

The theory is that if we won the tender on price but then came up short of the specification, the tender process could have been unfair. If our competitors had known that they would not have to meet certain aspects of the specification, they might have been able to reduce their price and might therefore have won over us. It was critical that we did not give our competitors that possibility.

The application, running on 3 of the 600 servers for the pilot, was pushing out updates to approximately 1000 desktop PCs located across 8 buildings with 4 to 10 floors each. The criteria called for "unattended operation" and "95% successful completion". Basically, a diskette was delivered by internal mail to each PC owner, who was instructed to power down their (LAN-connected) PC and leave the diskette in drive A. At the allotted time (usually the early hours of Sunday morning), the server was given a list of Ethernet addresses and sent a Wake-on-LAN sequence to each, causing the machines to boot from the floppy, which initiated the upgrade. The servers would serve the updates on demand and log all the details of the transactions extensively to a central logging server.
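For readers who have not met Wake-on-LAN: the post does not say exactly what sequence the server sent, but the commonly documented "magic packet" payload is 6 bytes of 0xFF followed by the target Ethernet address repeated 16 times (102 bytes in all), usually sent as a UDP broadcast. A minimal sketch of building that payload in shell follows. The function name `magic_packet` and the example address are made up for illustration, and this is not a reconstruction of the software we used.

```shell
#!/bin/sh
# magic_packet MAC
# Writes a standard 102-byte Wake-on-LAN "magic packet" payload to stdout.
# Hypothetical helper for illustration; actually sending it (e.g. as a UDP
# broadcast) is left to whatever network tool is available.
magic_packet() {
    esc=''
    # Convert each hex byte of the address (aa:bb:... or aa-bb-...)
    # into a printf octal escape such as \252.
    for byte in $(printf '%s' "$1" | tr ':-' '  '); do
        esc="$esc$(printf '\\%03o' "0x$byte")"
    done
    # Synchronisation stream: six bytes of 0xFF.
    printf '\377\377\377\377\377\377'
    # Followed by the target address, sixteen times over.
    i=0
    while [ "$i" -lt 16 ]; do
        printf "$esc"
        i=$((i + 1))
    done
}

# Example: show the size of the payload for a made-up address.
magic_packet de:ad:be:ef:12:34 | wc -c
```

The payload is binary, so it should be piped straight to the sending tool or a file rather than captured in a shell variable.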

The problem was that machines would fail to boot because the floppy was missing or corrupt; or the user had initiated shutdown, switched off the screen and gone home, leaving an application asking for close or save confirmation; or because of a myriad of other trivial but show-stopping "human errors". The contract allowed support personnel to intervene manually to rectify such human errors. But with so many machines spread across so many floors of so many buildings, and with the only allowed monitoring point being what was effectively a tail -f on the central logging server, it became almost impossible to keep track of how machines were progressing, which ones were stalled, and where those machines were located.

The suppliers of the software we were using proposed a 3-month, six-figure (in dollars) solution to the problem, but we only had 10 days.

My (our) solution was to insert tee between the logging daemon and the file, and then build an application that used pipes, sort, awk and ANSI escape sequences to display a real-time, blow-by-blow "event log" of the transactions as they progressed. Sorting the display by timestamp placed the machines taking the longest at the top, so when one failed to complete any given step in a timely manner, it would rise to the top of the display.
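To give a flavour of the idea, here is a minimal sketch in sh and awk. It is not the original code: the log format (epoch seconds, Ethernet address, step name), the function names `latest_events` and `stalled_first`, and the 60-second limit are all assumptions made for illustration. It keeps the latest event per machine, sorts so that the machine with the oldest activity comes first, and uses an ANSI escape sequence to paint overdue machines red.

```shell
#!/bin/sh
# Assumed (hypothetical) log line format, one event per line:
#   <epoch-seconds> <ethernet-address> <step>

# latest_events: keep only the most recent event seen for each address.
latest_events() {
    awk '{ if (!($2 in ts) || $1 >= ts[$2]) { ts[$2] = $1; line[$2] = $0 } }
         END { for (m in line) print line[m] }'
}

# stalled_first NOW LIMIT
# Lists machines with the oldest last activity first; any machine idle for
# more than LIMIT seconds (relative to NOW) is shown in red.
stalled_first() {
    latest_events | sort -n -k1,1 | awk -v now="$1" -v limit="$2" '
        { idle = now - $1
          if (idle > limit)
              printf "\033[31m%s %s idle=%ds\033[0m\n", $2, $3, idle
          else
              printf "%s %s idle=%ds\n", $2, $3, idle }'
}

# Example run over a few made-up events, with "now" fixed at 1100.
printf '%s\n' \
    '1000 00:a0:c9:00:00:01 boot' \
    '1005 00:a0:c9:00:00:02 boot' \
    '1060 00:a0:c9:00:00:02 download' \
    '1090 00:a0:c9:00:00:03 boot' |
    stalled_first 1100 60
```

Because sort emits nothing until it has read all of its input, a sketch like this has to be re-run periodically over the file that tee writes (for example, clearing the screen and running `stalled_first "$(date +%s)" 60 < logfile` every few seconds) rather than being attached directly to the live stream.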

Even armed with the failing machine's Ethernet address, we still had the problem of locating the physical machine. The inventory tag records were years out of date: machines had been moved, NICs swapped, and a myriad of other unrecorded changes made to the officially recorded positions.

The solution to that proved to be remarkably simple. We discovered that updating the router tables to block inbound TCP packets from the failing machines caused a network quality monitor DD, installed years before, to repeatedly display an error message on the machine. Part of that message contained some "\a"s. It is surprising how far those simple beeps travelled in an empty office building at 2 or 3 in the morning!

We positioned a support guy every couple of floors in each building, and they had only to listen for the beeps, track them to their source and then do whatever it took to get that machine up and running again.

It didn't help with every failure (hung machines, failures to boot and so on), but it allowed us to find and fix enough of the failures to comply with the 95% rule without incurring a huge and untimely redevelopment cost, or breaking the non-interactivity clauses of the unattended-operation requirement.

The whole thing was a shell pipeline of logapp | tee logfile | sort ... | awk. An amazingly simple and powerful concept.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Perl and Pipes. Share your story. by BrowserUk
in thread Perl and Pipes. Share your story. by DigitalKitty
