How to quickly make sense of 1800 log files

by afoken (Chancellor)
on Apr 08, 2017 at 09:10 UTC

Sorry, no code here, just a description. The code is proprietary and very specific to the problem; it does not make much sense without knowing the product and its source code.


After a large clinical test of one of our products, we got back about 1800 log files. The task: make sense of those logs. What were the product's problems, which errors that should never happen did happen anyway, how did the users (ab-)use the product? In short: how can we improve the product?

The log files are strictly line-oriented, roughly similar to logfiles generated by the Apache webserver. Each line starts with a time stamp, followed by a log source (which component caused the log line), followed by a log message. The three parts are separated by spaces. Log messages from one component are almost always further divided into a named log level, a source code reference (source file and line), a log message from a set of about 200 fixed messages, and some variable text. Other components also use fixed messages, but variable data might be mixed in (think of printf("Foo is %.2f cm, bar is %.2f N\n",foo,bar)).
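Purely for illustration, a line of that shape could be split with a regular expression roughly like this. The timestamp format, the component name, and the message text below are made up, not taken from the real product:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log line; the real timestamp format and component names differ.
my $line = '2017-04-08 09:10:11 CTRL Error pump.c:1234 PUMP_STALLED rpm=0.00';

# Top-level split: timestamp, source, raw message (separated by spaces).
if (my ($timestamp, $source, $raw) = $line =~ /^(\S+ \S+)\s+(\S+)\s+(.*)$/) {
    # For the main component: log level, source reference, fixed message,
    # and whatever variable text follows.
    if (my ($level, $srcref, $fixed, $variable) =
            $raw =~ /^(\w+)\s+(\S+:\d+)\s+(\w+)\s*(.*)$/) {
        print "$source [$level] $srcref: $fixed ($variable)\n";
    }
}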


Perl to the rescue:

A first script extracts the fixed log messages from the source code and generates a Perl module that contains the fixed messages and a function to mark the fixed message in a log line.
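The real extractor is proprietary, but a minimal sketch of the idea might look like this. The LOG() macro name, the src/ directory, and the .c/.h extensions are assumptions; the generated module exports the message list and a function that finds a fixed message contained in a raw log message:

#!/usr/bin/perl
# Sketch only: assumes fixed messages appear as the first string argument
# of a hypothetical LOG(...) macro in C sources under src/.
use strict;
use warnings;
use File::Find;

my %fixed;    # fixed message text => 1

find(sub {
    return unless /\.[ch]$/;
    open my $fh, '<', $_ or die "$File::Find::name: $!";
    while (my $line = <$fh>) {
        $fixed{$1} = 1 while $line =~ /\bLOG\s*\(\s*"([^"]+)"/g;
    }
    close $fh;
}, 'src');

# Generate the module (assumes no message contains unbalanced braces).
open my $out, '>', 'FixedMessages.pm' or die "FixedMessages.pm: $!";
print {$out} "package FixedMessages;\nuse strict;\nuse warnings;\n";
print {$out} "our \@MESSAGES = (\n";
printf {$out} "    q{%s},\n", $_ for sort keys %fixed;
print {$out} ");\n";
print {$out} <<'END_OF_MODULE';
# Longer fixed messages are tried first, so the longest match wins.
my $re = join '|', map { quotemeta } sort { length $b <=> length $a } @MESSAGES;
$re = qr/($re)/;
sub find_fixed_message {
    my ($raw) = @_;
    return $raw =~ $re ? $1 : undef;
}
1;
END_OF_MODULE
close $out;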

A second script uses the module generated by the first script, DBI, DBD::SQLite, and File::Find to read all log files, extract the device serial number from each log file's path, and split each line into timestamp, source, and raw message. The raw message is then handled differently for the different sources, to extract fixed messages and variable parts, log level, and source reference, if present. All of the data is then inserted into an SQLite database containing a single, not normalised table, using columns for serial number, file name, line number, timestamp, source, raw message, fixed message, variable part, log level, and source reference.
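Again, only a rough sketch rather than the actual importer. The logs/ directory, the .log extension, the serial number pattern, and the timestamp layout are assumptions; the table columns follow the description above:

#!/usr/bin/perl
# Sketch of the importer; directory layout, file extension, serial number
# pattern, and timestamp format are assumptions.
use strict;
use warnings;
use DBI;
use File::Find;
use FixedMessages;    # the module generated by the first script (see above)

my $dbh = DBI->connect('dbi:SQLite:dbname=logs.sqlite', '', '',
                       { RaiseError => 1, AutoCommit => 0 });

$dbh->do(<<'SQL');
CREATE TABLE IF NOT EXISTS logs (
    serial    TEXT, filename TEXT, lineno   INTEGER,
    timestamp TEXT, source   TEXT, rawmsg   TEXT,
    fixedmsg  TEXT, varpart  TEXT, loglevel TEXT, srcref TEXT
)
SQL

my $ins = $dbh->prepare('INSERT INTO logs VALUES (?,?,?,?,?,?,?,?,?,?)');

find(sub {
    return unless -f && /\.log$/;
    my $file = $File::Find::name;

    # Assumption: the serial number is part of the path, e.g. .../SN12345/...
    my ($serial) = $file =~ m{/(SN\d+)/};
    $serial //= 'unknown';

    open my $fh, '<', $_ or die "$file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($ts, $source, $raw) = $line =~ /^(\S+ \S+)\s+(\S+)\s+(.*)$/
            or next;

        # Log level and source reference are only present for some sources.
        my ($level, $srcref, $var) = ('', '', $raw);
        if ($raw =~ /^(\w+)\s+(\S+:\d+)\s+(.*)$/) {
            ($level, $srcref, $var) = ($1, $2, $3);
        }
        my $fixed = FixedMessages::find_fixed_message($var) // '';

        $ins->execute($serial, $file, $., $ts, $source, $raw,
                      $fixed, $var, $level, $srcref);
    }
    close $fh;
}, 'logs');

$dbh->commit;
$dbh->disconnect;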

A third script uses DBI, DBD::SQLite, and Text::CSV_XS to read a list of named SQL SELECT statements from a text file (example below). For each named SELECT statement, it creates a CSV file, runs the query against the SQLite database, and writes the fetched data into the CSV file. Existing CSV files won't be overwritten.

Example query file (from memory):

-- Comments start with "--", "#", or ";".
-- Section names are used as file names.

[logfiles-per-device]
SELECT serial, COUNT(*)
FROM ( SELECT DISTINCT serial, filename FROM logs )
GROUP BY serial
ORDER BY serial ;

[errors-per-device]
SELECT serial, fixedmsg, COUNT(*)
FROM logs
WHERE loglevel='Error'
GROUP BY serial, fixedmsg
ORDER BY fixedmsg, serial ;

--
-- and so on
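A rough sketch of the third script, again not the original code. The database file name (logs.sqlite), the query file name (queries.txt), and the details of the comment handling are assumptions:

#!/usr/bin/perl
# Sketch of the query runner; file names and comment handling are assumptions.
use strict;
use warnings;
use DBI;
use Text::CSV_XS;

my $dbh = DBI->connect('dbi:SQLite:dbname=logs.sqlite', '', '',
                       { RaiseError => 1 });

# Parse the query file into name => SQL pairs.
my (%query, $section);
open my $qf, '<', 'queries.txt' or die "queries.txt: $!";
while (my $line = <$qf>) {
    next if $line =~ /^\s*(?:--|#|;)/;          # comment lines
    if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {      # [section-name]
        $section = $1;
    } elsif (defined $section) {
        $query{$section} .= $line;
    }
}
close $qf;

my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" });

for my $name (sort keys %query) {
    my $file = "$name.csv";
    if (-e $file) {                             # never overwrite output
        warn "$file already exists, skipping\n";
        next;
    }
    my $sth = $dbh->prepare($query{$name});
    $sth->execute;

    open my $out, '>', $file or die "$file: $!";
    $csv->print($out, $sth->{NAME});            # header row: column names
    while (my $row = $sth->fetchrow_arrayref) {
        $csv->print($out, $row);
    }
    close $out;
}
$dbh->disconnect;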

The workflow is quite simple. Scripts 1 and 2 are executed once to create the database, then someone edits the query file and runs the third script. CSV files are opened in Excel. We make some educated guesses about what happened, add a few new queries to the query file, run the third script again, and look at the new output, again in Excel. Wash, rinse, repeat.


Update:

Why CSV and not Excel? I could have used Spreadsheet::WriteExcel or Excel::Writer::XLSX to generate native Excel files. But we may need to draw some graphs from the log data, and gnuplot needs CSV input. Plus, writing CSV files with Text::CSV_XS is much easier than writing Excel files.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re: How to quickly make sense of 1800 log files
by huck (Prior) on Apr 08, 2017 at 11:43 UTC

    I've enjoyed doing projects like this, though I did most of them in SAS. These early periods I thought of as "going fishing": you are not going out after a certain fish, but just to see what you can catch.

    One thing I would always start with was something often called a crosstab, though I would use SUMMARY for it. Let's assume you index all the fixed messages, giving each a number, just to simplify the resulting table. You then create a series of metrics: what percent of the time msg1 exists, msg2 exists, etc. Then you start crossing them: how many times did msg1 occur together with msg2, etc. And you create a new statistic, the expected cell percent: if msg1 occurs 10% of the time and msg2 occurs 10% of the time, you would expect the cell occurrence to be 1% (0.10*0.10 = 0.01). Create this for all the crosses of msgN*msgM; if you want to optimize, this will be a triangular matrix.
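    A rough Perl illustration of that observed-versus-expected idea (the per-file lists of message indices are made-up toy data, not real log content):

    #!/usr/bin/perl
    # Toy crosstab: observed vs. expected co-occurrence of message indices.
    use strict;
    use warnings;

    # One entry per log file: which message indices were seen in it (toy data).
    my @files = ( [1, 2, 5], [1, 3], [2, 5], [1, 2], [4] );
    my $n = @files;

    my (%seen, %pair);
    for my $msgs (@files) {
        my %u;
        my @uniq = grep { !$u{$_}++ } @$msgs;       # count each message once per file
        $seen{$_}++ for @uniq;
        for my $i (0 .. $#uniq) {
            for my $j ($i + 1 .. $#uniq) {
                my ($lo, $hi) = sort { $a <=> $b } ($uniq[$i], $uniq[$j]);
                $pair{"$lo,$hi"}++;
            }
        }
    }

    # Expected cell fraction = P(msgA) * P(msgB); compare with what was observed.
    for my $key (sort keys %pair) {
        my ($m1, $m2) = split /,/, $key;
        my $observed  = $pair{$key} / $n;
        my $expected  = ($seen{$m1} / $n) * ($seen{$m2} / $n);
        printf "msg%s x msg%s: observed %.2f, expected %.2f\n",
               $m1, $m2, $observed, $expected;
    }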

    Now you need a visualization method. Just printing the table is useful; you can create bands that highlight odd cell frequencies, for instance.

    But my favorite tool for this was plain old PROC CHART, which just produced simple bar charts, in this case two-way bar charts. I would express the bands with different characters: with three bands, close to expected might be a *, way above expected a +, and way below expected a -. Each row msgN gets its own little bar chart of all the msgMs found with it. This becomes an output you can quickly scan to find interesting combinations.

    Once this tool is created you will find it useful for other crosses: fixed by variable parts, etc. You can parametrize the tool, first selecting for a specific message or variable range before creating the crosstabs, in effect creating n-way crosstabs. Next you need a way to quickly enter the parameters and view the output. Back then XEDIT was my weapon of choice; I had tools that allowed me to make CICS-like form entry screens in XEDIT to fill in the blanks, and a function key would run the program (locally under VM/CMS or via submit to MVS) and create the chart. These days I'd go with HTML forms for input and display, but any GUI is up to this task; even a command-line version works fine, much like a lot of IDEs.

    My favorite database was everything 25,000 households bought over a two-week period, from gum to houses, with other demographic info like region, income bands, etc. After getting the data read in I spent a night playing with the tool, partially to ensure I had read the data in correctly, and partially just to "go fishing". Talking with the primary researcher the next day, some of my findings were expected, like middle-income people not eating as many veggies as high- and low-income people: high-income people like to eat good food, low-income people are optimizing where they spend their money and veggies are cheap, but middle-income people like quick-to-access junk food. In-house it became known as the junk-food project after that.

    The same generalized tool becomes useful for many database types: stock/futures indicators, patient/drug interactions and effects, inventory patterns, CICS transactions, investments, ... The idea is a quick-to-use tool that shows outliers which lead you to further in-depth analysis; something your users can use easily to deep-dive into masses of data. I'd also keep a history of the runs, so someone could look at what they ran last Tuesday without having to remember all the parameters that went with the output they remember; they could just mark the run on a list and rerun it.

    Perl had yet to be born at the time of the junk-food project. SQL was just a baby (barely past System R), and IMS was still the king of databases, with data admins and red tape. SAS had the tools, the power, and a user-definable database structure to fit the bill, and it ran on VM/CMS and MVS without needing to get change control involved. Now, with easy access to simple-to-define SQLite databases, Perl modules, and HTML as an input/output option, these make perfect sense for this kind of project.

    Have fun fishing
