Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Sorry, no code here, just a description. The code is propritary, and very specific to the problem. It does not make much sense without knowing the product and its source code.


After a large clinical test of one of our products, we got back about 1800 log files. The task: make sense of those logs. What were the product's problems, what errors that should never happen did happen, how did the users (ab-)use the product, short: How can we improve the product?

The log files are strictly line-oriented, roughly similar to logfiles generated by the Apache webserver. Each line starts with a time stamp, followed by a log source (which component caused the log line), followed by a log message. The three parts are separated by spaces. Log messages from one component are almost always further divided into a named log level, a source code reference (source file and line), a log message from a set of about 200 fixed messages, and some variable text. Other components also use fixed messages, but variable data might be mixed in (think of printf("Foo is %.2f cm, bar is %.2f N\n",foo,bar)).


Perl to the rescue:

A first script extracts the fixed log messages from the source code, and generates a perl module that contains the fixed messages and a function to mark the fixed message in a log line.

A second script uses the module generated by the first script, DBI, DBD::SQLite, and File::Find to read all log files, extract the device serial number from the log files' path, splits each line into timestamp, source, and raw message. The raw message is then handled differently for the different sources, to extract fixed messages and variable parts, log level, and source reference, if present. All of the data is then inserted into an SQLite database containing a single, not normalised table, using columns for serial number, file name, line number, timestamp, source, raw message, fixed message, variable part, log level, source reference.

A third script uses DBI, DBD::SQLite, and Text::CSV_XS reads a list of named SQL SELECT statements from a text file (example below), then creates a CSV file for each named SELECT statement, runs the query against the SQLite database, and writes the data fetched into the CSV file. Existing CSV files won't be overwritten.

Exampe for the query file (from memory):

-- Comments start with "--", "#", or ";". -- Section names are used as file names. [logfiles-per-device] SELECT serial, COUNT(*) FROM ( SELECT DISTINCT serial, filename FROM logs ) GROUP BY serial ORDER BY serial ; [errors-per-device] SELECT serial, fixedmsg, COUNT(*) FROM logs WHERE loglevel='Error' GROUP BY serial, fixedmsg ORDER BY fixedmsg, serial ; -- -- and so on

The workflow is quite simple. Scripts 1 and 2 are executed once to create the database, then someone edits the query file and runs the third script. CSV files are opened in Excel. We make some educated guesses about what happened, add a few new queries to the query file, run the third script again, and look at the new output, again in Excel. Wash, rinse, repeat.


Update:

Why CSV and not Excel? I could have used Spreadsheet::WriteExcel or Excel::Writer::XLSX to generate native Excel files. But we may need to draw some graphs from the log data, and gnuplot needs CSV input. Plus, writing CSV files from Text::CSV_XS is much easier than writing Excel files.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to How to quickly make sense of 1800 log files by afoken

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (6)
As of 2024-03-28 23:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found