http://www.perlmonks.org?node_id=1017042

navinc has asked for the wisdom of the Perl Monks concerning the following question:

I have about 30,000 log files which need to be prepped for ingest into a log processor. Here's what the format of the files looks like:

PCName: Foo1

Command1:dfie

Command2:dfo

Command3:dfum

PCName: Foo2

Command1:dfie

Command2:dfo

Command3:dfum

The log processor needs the PCName to appear as the line before each Command. Hence the output of the scripts should be

PCName: Foo1
Command1:dfie

PCName: Foo1
Command2:dfo

PCName: Foo1
Command3:dfum

PCName: Foo2
Command1:dfie

PCName: Foo2
Command2:dfo

PCName: Foo2
Command3:dfum

Any ideas on how to do this in an efficient manner?

The list of commands varies from a couple to dozens for some PCs. Also, for brevity I've left out the fact that the results of the commands are also present.

I've tried the solution put forth at http://docstore.mik.ua/orelly/perl/cookbook/ch09_11.htm: divide the file into sections by PCName, split each of those into sections by Command, and then modify the command sections to add the PCName prefix.

The files are up to 1MB each, so slurping is fine. However, I want to ensure that the performance holds up. I also wanted to check whether there is an easier way to do this using some Perl modules.

Thanks.

Replies are listed 'Best First'.
Re: File Find/Replace with the replacement coming from part of earlier matched string
by tmharish (Friar) on Feb 05, 2013 at 06:25 UTC

    A multi-line regex with look-ahead, run over input that might be large, can really slow things down.

    There is no need to hold onto chunks of anything - all you need is the PC name and you are fine up to when you see the next PC name.

    The following chunk of code does that:

    use strict ;
    use warnings ;

    my @log_data = <DATA> ;
    my $current_pc_name ;

    foreach my $log_line ( @log_data ) {
        chomp( $log_line );
        next unless( $log_line ) ;    # skip blank lines

        # Split on the first colon only, so a value containing ':' stays whole.
        my ( $left, $right ) = split( /:/, $log_line, 2 ) ;
        if( $left eq 'PCName' ) {
            # Remember the PC name until the next PCName line.
            $current_pc_name = $right ;
            next ;
        }

        unless( $current_pc_name ) {
            die( "Command $log_line assigned to no PC!!" ) ;
        }

        # Emit the remembered PC name before every other line.
        print "PCName:$current_pc_name\n" ;
        print "$log_line\n\n" ;
    }

    __DATA__
    PCName: Foo1

    Command1:dfie

    Command2:dfo

    Command3:dfum

    PCName: Foo2

    Command1:dfie

    Command2:dfo

    Command3:dfum

    OUTPUT:

    PCName: Foo1
    Command1:dfie

    PCName: Foo1
    Command2:dfo

    PCName: Foo1
    Command3:dfum

    PCName: Foo2
    Command1:dfie

    PCName: Foo2
    Command2:dfo

    PCName: Foo2
    Command3:dfum

Re: File Find/Replace with the replacement coming from part of earlier matched string
by LanX (Saint) on Feb 05, 2013 at 03:36 UTC
    What did you try?

    I would read line by line and try to match the headlines with a regex in an if statement.

    if match headline
        then capture headline in variable
    else if empty line
        then print empty line
    else
        print captured headline + current line

    hmm ... almost Python! :)

    Cheers Rolf

Re: File Find/Replace with the replacement coming from part of earlier matched string
by sundialsvc4 (Abbot) on Feb 05, 2013 at 14:18 UTC

    This is an absolutely classic case for a “finite-state machine (FSM)” algorithm. (I’ll leave it to you to look that up on Wikipedia.) There are four different “kinds” of input lines:

    1. PCName: $1
    2. Command$1: $2
    3. blank line
    4. end-of-file
    ... and a suitable but arbitrary number of “states.”

    The logic of an FSM is very powerful because it moves from one “state” to another based on the current input (one of the four types listed above) and the current state. At each state, it can call a particular subroutine or piece of logic. This approach is also very flexible and easily modified, and it is clear because the logic of deciding what kind of input you have is separated from the state-driven logic that does something about it.
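
    As a rough illustration of that separation, here is a minimal sketch of such a state machine in Perl. The state names (START, IN_PC, DONE), the helper names classify and run_fsm, and the rule that any other non-blank line counts as a command line are all assumptions made for this example, not something taken from the original post.

```perl
use strict;
use warnings;

# Decide what kind of input a line is. Returns (event, payload).
# Assumption: any non-blank line that is not a PCName line is a command line.
sub classify {
    my ($line) = @_;
    return ('eof')          unless defined $line;
    return ('blank')        if $line =~ /^\s*$/;
    return ( 'pcname', $1 ) if $line =~ /^PCName:\s*(.*)$/;
    return ( 'command', $line );
}

# Actions: each takes the shared context plus the payload and
# returns the name of the next state.
my $set_pc = sub { my ( $ctx, $name ) = @_; $ctx->{pc} = $name; return 'IN_PC' };
my $emit   = sub {
    my ( $ctx, $line ) = @_;
    push @{ $ctx->{out} }, "PCName: $ctx->{pc}", $line, '';
    return 'IN_PC';
};

# Transition table: current state x event => action.
my %transition = (
    START => {
        pcname  => $set_pc,
        command => sub { die "command before any PCName line\n" },
        blank   => sub { 'START' },
        eof     => sub { 'DONE' },
    },
    IN_PC => {
        pcname  => $set_pc,
        command => $emit,
        blank   => sub { 'IN_PC' },
        eof     => sub { 'DONE' },
    },
);

# Drive the machine over a list of chomped lines; returns the output lines.
sub run_fsm {
    my (@lines) = @_;
    my %ctx   = ( pc => undef, out => [] );
    my $state = 'START';
    for my $line ( @lines, undef ) {    # the trailing undef stands for end-of-file
        my ( $event, $payload ) = classify($line);
        $state = $transition{$state}{$event}->( \%ctx, $payload );
        last if $state eq 'DONE';
    }
    return @{ $ctx{out} };
}

my @input = ( 'PCName: Foo1', '', 'Command1:dfie', '', 'Command2:dfo' );
print "$_\n" for run_fsm(@input);
```

    For a format this simple, two states are enough and the table is arguably more machinery than a plain loop needs. It starts to pay off if more line types or states get added later, for instance for the command results the question mentions.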

Re: File Find/Replace with the replacement coming from part of earlier matched string
by 7stud (Deacon) on Feb 05, 2013 at 05:37 UTC
    3 commands for every PC name?

    "if empty line then print empty line"

    You don't want to do that.

Re: File Find/Replace with the replacement coming from part of earlier matched string
by Kenosis (Priest) on Feb 05, 2013 at 13:51 UTC

    One option is to create a directory in your log files' directory for the prepped log files, so the originals aren't altered. Given that the log files' format is as shown, consider the following, which writes prepped files into a directory called prepped:

    use strict;
    use warnings;

    my $pcName;
    local $/ = '';    # paragraph mode

    for my $file (<*.log>) {
        open my $fhIN,  '<', $file or die $!;
        open my $fhOUT, '>', 'prepped/' . $file or die $!;

        while (<$fhIN>) {
            chomp;
            print $fhOUT $_ if /\n/;    # first two file lines
            next if /^(PCName:\s+.+)/ and $pcName = $1;    # pc name & next
            print $fhOUT "\n\n$pcName\n$_";    # print pc name & command
        }

        close $fhIN;
        close $fhOUT;
    }

    Sample file output from your data:

    PCName: Foo1
    Command1:dfie

    PCName: Foo1
    Command2:dfo

    PCName: Foo1
    Command3:dfum

    PCName: Foo2
    Command1:dfie

    PCName: Foo2
    Command2:dfo

    PCName: Foo2
    Command3:dfum

    The fileglob operator <*.log> is used to get the list of files in the log directory; you may need to change the file extension.
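
    The script above assumes the prepped directory already exists. A small sketch of one way to create it on demand and pair each log file with its output path follows. The helper name prepped_targets and its three parameters are hypothetical, and bsd_glob from File::Glob is swapped in for the <*.log> operator because it does not split the pattern on whitespace.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Glob ':bsd_glob';
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: create $out_dir if needed, then return a hash
# mapping each file matching $pattern in $in_dir to its path in $out_dir.
sub prepped_targets {
    my ( $in_dir, $out_dir, $pattern ) = @_;

    # make_path croaks if the directory cannot be created.
    make_path($out_dir) unless -d $out_dir;

    my %target;
    for my $file ( bsd_glob( File::Spec->catfile( $in_dir, $pattern ) ) ) {
        $target{$file} = File::Spec->catfile( $out_dir, basename($file) );
    }
    return %target;
}

# Demo when run as a script: list input => output pairs for the current directory.
unless (caller) {
    my %target = prepped_targets( '.', 'prepped', '*.log' );
    print "$_ -> $target{$_}\n" for sort keys %target;
}
```

    Each key can then be opened for reading and its value for writing, in place of the hard-coded 'prepped/' . $file.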

    Hope this helps!