

Why write a logfile?

When processing data programmatically, writing a log of what's being done, and to which data, is extremely helpful. It gives you a blow-by-blow description of what your code's doing, verifying (or falsifying) your assumptions at every step of the process. If it's detailed enough, you can use it to recover from small disasters ("Crap! I didn't mean to delete that record!"), and it's essential in tracking down errors in large, multi-step data grinders. The cost of keeping a text file around (which, once your program's run is complete, can easily be gzipped...) is negligible compared to the extra debugging muscle.

davorg covered some of this ground quite well in his excellent Data Munging with Perl book; I'm going to expand on the material there, probably repeat some of it, and digress a little bit into good and bad methods of "Defensive Programming".

What to log?

The quick, and usually wrong, answer is "everything!" It's great to know exactly what your code's doing during development, but in production, logging every byte of every packet you process isn't just silly, it defeats the purpose. Remember, most of your logs are going to be mind-numbingly normal, and most of the time, if you're looking at the logs you're looking for problems. Having to sort through megabytes of chaff makes your job much harder.

davorg suggests tuneable levels of logging, and I strongly concur. I'd suggest making the logging level a command-line parameter (the -v switch seems to be popular for this purpose), rather than davorg's suggested environment variable, if only because it's more visible to the user.
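A minimal sketch of a -v switch that sets the logging level, written in Python for brevity (the idea is language-independent). The mapping from switch count to level and the names `level_from_verbosity`, `parse_args`, and `configure_logging` are assumptions made for illustration, not anything from davorg's book.

```python
import argparse
import logging

def level_from_verbosity(count):
    """Map the number of -v switches to a logging level (assumed mapping)."""
    if count >= 2:
        return logging.DEBUG
    if count == 1:
        return logging.INFO
    return logging.WARNING

def parse_args(argv=None):
    """Parse the command line; -v may be repeated (-v, -vv)."""
    parser = argparse.ArgumentParser(description="hypothetical data munger")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="repeat for more detail")
    return parser.parse_args(argv)

def configure_logging(argv=None):
    """Set the root logger's level from the command line and return the args."""
    args = parse_args(argv)
    logging.basicConfig(level=level_from_verbosity(args.verbose))
    return args
```

Calling `configure_logging()` once at startup is enough; with no switch only warnings and errors appear, with -v informational messages too, and with -vv everything.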

In general, it's probably better to log too much data than too little. Definitely log key data: if you're munging sendmail logs looking for mail to abuse, root, or postmaster, then log the To: address of each mail sent, and whether you processed or discarded the record.
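As a rough illustration of logging the key datum and the decision together, here is a Python sketch. The logger name, the `handle` function, and the `to=... action=...` layout are all hypothetical; only the three watched mailboxes come from the text above.

```python
import logging

# The mailboxes named above; everything else here is an illustrative assumption.
WATCHED = {"abuse", "root", "postmaster"}
logger = logging.getLogger("mail_filter_example")

def handle(to_address):
    """Log the To: address and whether the record was processed or discarded."""
    local_part = to_address.split("@", 1)[0].lower()
    keep = local_part in WATCHED
    logger.info("to=%s action=%s", to_address,
                "processed" if keep else "discarded")
    return keep
```

Every record produces exactly one line, so a later count of "processed" plus "discarded" lines should equal the number of input records.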

If you're going to be counting on the logs as backups (this may make more sense than doing "real" backups if the "real" backup would entail a database dump, for instance, using much more space than a flat-text log), you will of course have to log all the data necessary to rebuild a useful system. This might mean logging everything, or you may be able to skip "nice-to-have" data that aren't strictly required for functional operation.

Log all error conditions! Log any unexpected input! Make these log entries immediately obvious! (I tend to prefix any such entries with ***ERROR or ***WARNING, depending on severity.) Check all important assumptions made about the data, and as many unimportant ones as is practical.
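One way to get the ***ERROR and ***WARNING prefixes without typing them at every call site is a custom formatter. This is a sketch using Python's standard logging module; `StarPrefixFormatter`, `make_logger`, `check_record`, and the record fields are names invented for the example.

```python
import io
import logging

class StarPrefixFormatter(logging.Formatter):
    """Prefix warnings and errors with *** so they stand out when scanning."""
    def format(self, record):
        message = super().format(record)
        if record.levelno >= logging.ERROR:
            return f"***ERROR {message}"
        if record.levelno >= logging.WARNING:
            return f"***WARNING {message}"
        return message

def make_logger(stream):
    """Build a logger that writes prefixed lines to the given stream."""
    logger = logging.getLogger("munger_example")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(StarPrefixFormatter("%(message)s"))
    logger.addHandler(handler)
    return logger

def check_record(logger, record):
    """Check assumptions about a record; return True if it is usable."""
    if "id" not in record:
        logger.error("record has no id, fields: %s", ", ".join(sorted(record)))
        return False
    if not record.get("email"):
        logger.warning("record %s has no email", record["id"])
    logger.info("processed record %s", record["id"])
    return True

def demo():
    """Run three hypothetical records through the checks and return the log."""
    buffer = io.StringIO()
    logger = make_logger(buffer)
    check_record(logger, {"name": "no id here"})
    check_record(logger, {"id": 7})
    check_record(logger, {"id": 8, "email": "a@example.com"})
    return buffer.getvalue()
```

A grep for lines beginning with `***` then pulls every problem out of an otherwise normal log.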

What not to log?

Any sensitive data. Credit card numbers, passwords, that sort of thing. This may include data that are subject to a privacy policy, like users' email addresses.
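A sketch of scrubbing a record before it reaches the log, in Python. The field names in `SENSITIVE_KEYS` and the card-number pattern are assumptions; a real system would need a list and patterns matched to its own data and privacy policy.

```python
import re

# Assumed field names; adjust to whatever your records actually contain.
SENSITIVE_KEYS = {"password", "card_number"}

# Rough pattern for 13 to 16 digit runs with optional space or dash separators.
CARD_LIKE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def scrub(record):
    """Return a copy of record that is safe to write to a log."""
    clean = {}
    for key, value in record.items():
        if key in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Catch card-like numbers hiding in free-text fields.
            clean[key] = CARD_LIKE.sub("[REDACTED]", value)
        else:
            clean[key] = value
    return clean
```

Scrubbing at the point where the log line is built, rather than afterwards, means the sensitive value never touches the disk.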

How to log

The easiest way is to just write to STDERR. There's no guarantee that anything will get saved: you'll have to redirect the output from the shell. This is sometimes useful, and often catastrophic (when you realize five minutes into a job that can't be backed out that all the output's going to the console, for instance). On the other hand, any logging's better than no logging at all, and writing to STDERR is often good enough, especially if you're just writing a quick hack.
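For the quick-hack case, a Python sketch of a one-function STDERR logger. The name `log_line` and the timestamp format are arbitrary choices for illustration.

```python
import sys
import time

def log_line(message, stream=None):
    """Write one timestamped line to stream (STDERR by default) and return it."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    line = f"{stamp} {message}"
    target = stream if stream is not None else sys.stderr
    print(line, file=target, flush=True)
    return line
```

Nothing is saved unless the shell is told to save it: assuming the script is called job.py, running it as `python job.py 2>> job.log` appends STDERR to job.log instead of letting it scroll off the console.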

The second-easiest way is to pull something from CPAN and use it instead of rolling your own logging code. A quick search found:

Whew! If you choose the third way (write your own logging routines), don't bother posting it to CPAN.

Log formats

Keep in mind that your logfiles should be easy for both humans and computers to read. I usually find that I identify a problem by scanning the logs by hand, then write a script to parse them, identifying each instance of a problem and fixing (or at least hiding) it programmatically. Try to separate input records with an empty line, for instance. Good coding style usually makes good logging style, too; read perlstyle.
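To make the "empty line between records" idea concrete, here is a Python sketch of one possible layout: `key: value` lines, one block per input record, blocks separated by a blank line. The layout and the names `format_entry` and `parse_log` are assumptions for the example, and it assumes values never contain newlines.

```python
def format_entry(fields):
    """Render one input record as 'key: value' lines plus a blank line."""
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return "\n".join(lines) + "\n\n"

def parse_log(text):
    """Split a log on blank lines and turn each block back into a dict."""
    entries = []
    for block in text.split("\n\n"):
        if not block.strip():
            continue  # skip the empty tail after the last separator
        entry = {}
        for line in block.strip().splitlines():
            key, _, value = line.partition(": ")
            entry[key] = value
        entries.append(entry)
    return entries
```

The same file reads comfortably in a text editor, and the parser is short enough to rewrite from memory when a problem needs hunting down.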

While we're on the subject, the more standard your logs can be, the better, since tools written to work on one set can be applied to another. (Suggestions?)

I don't like XML logs; I find them too difficult to read in a standard text editor (which, if I'm logging in remotely, is probably all I have).

What other resources are available?

LogReport is a general online resource for logging software, knowledge, and practices.

A note about defensive programming

Defensive programming doesn't mean "recover seamlessly from all input errors without a peep"; it means "recognize the errors and do what's appropriate". If you don't know that your input's screwed, and your "defensive" program is substituting in reasonable defaults, you could have a big problem (bogus results, broken upstream software making it into production, etc.) that doesn't make itself known until it's too late to fix (or at least to fix easily). On the other hand, if your input's broken in a consistent and harmless manner, you don't want kilobytes of benign errors drowning out other, more significant problems in your logs. What to do?

In general, I'd lean towards more logging, rather than less, and more fine-grained control over logging verbosity. If you know that you're going to be encountering the same sort of brain-damaged input over and over (Word-generated HTML, for instance), a simple "got bogus input foo, ignoring" the first time you see it is probably okay.
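A sketch of the "complain the first time, then keep quiet" approach in Python. The class name `OncePerKind`, its methods, and the idea of printing a count summary at the end are illustrative assumptions layered on top of the suggestion above.

```python
import logging

class OncePerKind:
    """Log the first occurrence of each kind of bogus input, count the rest."""

    def __init__(self, logger):
        self.logger = logger
        self.counts = {}

    def bogus(self, kind, detail):
        """Record one bogus input; return True if it was actually logged."""
        seen = self.counts.get(kind, 0)
        self.counts[kind] = seen + 1
        if seen == 0:
            self.logger.warning("got bogus input %s (%s), ignoring", kind, detail)
            return True
        return False

    def summary(self):
        """Log how many times each kind was seen, once the run is over."""
        for kind, count in sorted(self.counts.items()):
            self.logger.info("%s: %d occurrence(s) ignored", kind, count)
```

The counts keep the information that the per-occurrence messages would have carried, so a sudden jump in one kind of bad input is still visible without drowning the log.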

Update: added readmore tag. Thanks trs80!

Update 2002 Aug. 9: added Resources section

F o x t r o t U n i f o r m
Found a typo in this node? /msg me

In reply to Defensive Programming and Audit Trails by FoxtrotUniform
