File Find/Replace with the replacement coming from part of earlier matched string

navinc has asked for the wisdom of the Perl Monks concerning the following question:

I have about 30,000 log files which need to be prepped for ingest into a log processor. Here's what the format of the files looks like:

PCName: Foo1
Command1:dfie

Command2:dfo

Command3:dfum

PCName: Foo2

Command1:dfie

Command2:dfo

Command3:dfum

The log processor needs the PCName to appear on the line before each Command. Hence the output of the script should be:

PCName: Foo1
Command1:dfie

PCName: Foo1
Command2:dfo

PCName: Foo1
Command3:dfum

PCName: Foo2
Command1:dfie

PCName: Foo2
Command2:dfo

PCName: Foo2
Command3:dfum

Any ideas on how to do this in an efficient manner?

The list of commands varies from a couple to dozens for some PCs. Also, for brevity, I've left out the fact that the results of the commands are also present.

I've tried to use the solution put forth here http://docstore.mik.ua/orelly/perl/cookbook/ch09_11.htm to divide the file into sections by PCName and then further into sections divided by Commands and then modify the command sections to add the PCName prefix.

The files are up to 1MB max, so slurping is fine. However, I want to ensure that the performance holds up. I also wanted to check if there is an easier way to do this using some Perl modules.

Thanks.


Replies are listed 'Best First'.
Re: File Find/Replace with the replacement coming from part of earlier matched string by tmharish (Friar) on Feb 05, 2013 at 06:25 UTC
RegEx on multiple lines that might be large with look-ahead can really slow things down. There is no need to hold onto chunks of anything: all you need is the PC name, and you are fine up to when you see the next PC name. The following chunk of code does that:

    use strict ;
    use warnings ;

    my @log_data = <DATA> ;

    my $current_pc_name ;
    foreach my $log_line ( @log_data ) {
        chomp( $log_line );
        next unless( $log_line ) ;    # skip blank lines

        my ( $left, $right ) = split( /\:/, $log_line ) ;
        if( $left eq 'PCName' ) {
            # remember the PC name until the next PCName line
            $current_pc_name = $right ;
            next ;
        }

        unless( $current_pc_name ) {
            die( "Command $log_line assigned to no PC!!" ) ;
        }

        print "PCName:$current_pc_name\n" ;
        print "$log_line\n\n" ;
    }

    __DATA__
    PCName: Foo1
    Command1:dfie

    Command2:dfo

    Command3:dfum

    PCName: Foo2

    Command1:dfie

    Command2:dfo

    Command3:dfum

OUTPUT:

    PCName: Foo1
    Command1:dfie

    PCName: Foo1
    Command2:dfo

    PCName: Foo1
    Command3:dfum

    PCName: Foo2
    Command1:dfie

    PCName: Foo2
    Command2:dfo

    PCName: Foo2
    Command3:dfum
Re: File Find/Replace with the replacement coming from part of earlier matched string by LanX (Saint) on Feb 05, 2013 at 03:36 UTC
What did you try?

I would read line by line and try to match the headlines with a regex in an if statement.

    if match headline then
        capture headline in variable
    else if empty line then
        print empty line
    else
        print captured headline + current line

hmm ... almost Python! :)

Cheers Rolf
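The pseudocode above could be turned into Perl along these lines. This is only a sketch: `prefix_commands` is a hypothetical helper name, and it assumes that headlines start with `PCName:` and that every other non-blank line should get the headline in front of it (so command result lines, if present, would be prefixed too). Unlike the pseudocode, it skips blank input lines and emits its own blank separator, since echoing them would leave a doubled blank line wherever a `PCName` line is itself followed by a blank line.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole log text, returns the rewritten text.
sub prefix_commands {
    my ($text) = @_;
    my $pc_name;    # most recent "PCName: ..." headline, kept verbatim
    my @blocks;     # one "headline + line" block per non-blank, non-headline line

    for my $line (split /\n/, $text) {
        if ($line =~ /^PCName:/) {
            $pc_name = $line;          # capture the headline, print nothing yet
        }
        elsif ($line =~ /^\s*$/) {
            next;                      # drop blank lines from the input
        }
        else {
            die "Line '$line' appears before any PCName\n"
                unless defined $pc_name;
            push @blocks, "$pc_name\n$line\n";
        }
    }

    # Blocks are separated by exactly one blank line.
    return join "\n", @blocks;
}
```

A file up to 1MB can be slurped into a string and passed to the helper in one go; reading line by line from a filehandle would work just as well, since only the current headline has to be remembered.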
Re: File Find/Replace with the replacement coming from part of earlier matched string by sundialsvc4 (Abbot) on Feb 05, 2013 at 14:18 UTC
This is an absolutely classic case for a “finite-state machine (FSM)” algorithm. (I’ll leave that for you to ~~Google~~ WikiPedia.) There are four different “kinds” of input-lines:

`PCName: $1`
`Command$1: $2`
blank line
end-of-file

... and a suitable but arbitrary number of “states.” The logic of an FSM is very powerful because it moves from one “state” to another based on the current input (one of the four types listed above) and the current state. At each state, it can call a particular subroutine or piece of logic. This approach is also very flexible, easily modified, and it’s also clear because the logic of deciding what kind of input you have is clearly separated from the state-driven logic that does something about it.
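A minimal sketch of what such a state machine might look like in Perl follows. Everything here is an assumption made for illustration: the state names (`START`, `HAVE_PC`, `DONE`), the helper names `classify` and `run_fsm`, and the choice to die on any line that is not one of the four kinds. Real logs that include command results would need at least one more input kind.

```perl
use strict;
use warnings;

# Decide which of the four kinds of input a line is (undef means end-of-file).
sub classify {
    my ($line) = @_;
    return 'eof'     if !defined $line;
    return 'pcname'  if $line =~ /^PCName:/;
    return 'command' if $line =~ /^Command\d+:/;
    return 'blank'   if $line =~ /^\s*$/;
    die "Unrecognised line: $line\n";
}

# Shared action: remember the PCName line and move to the HAVE_PC state.
my $remember_pc = sub {
    my ($ctx, $line) = @_;
    $ctx->{pc} = $line;
    return 'HAVE_PC';
};

# Transition table: current state + input kind => action returning next state.
my %transition = (
    START => {
        pcname  => $remember_pc,
        blank   => sub { 'START' },
        command => sub { die "Command before any PCName: $_[1]\n" },
        eof     => sub { 'DONE' },
    },
    HAVE_PC => {
        pcname  => $remember_pc,
        blank   => sub { 'HAVE_PC' },
        command => sub {
            my ($ctx, $line) = @_;
            push @{ $ctx->{out} }, "$ctx->{pc}\n$line\n";
            return 'HAVE_PC';
        },
        eof     => sub { 'DONE' },
    },
);

# Feed every line, then an explicit end-of-file marker, through the table.
sub run_fsm {
    my ($text) = @_;
    my %ctx   = (pc => undef, out => []);
    my $state = 'START';
    for my $line ((split /\n/, $text), undef) {
        my $kind = classify($line);
        $state = $transition{$state}{$kind}->(\%ctx, $line);
    }
    return join "\n", @{ $ctx{out} };
}
```

The classification in `classify` stays separate from the state-driven actions in `%transition`, which is the separation described above. For a format this small the table is arguably more machinery than the problem needs, but adding a new kind of line or a new state only means adding entries to the table.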
Re: File Find/Replace with the replacement coming from part of earlier matched string by 7stud (Deacon) on Feb 05, 2013 at 05:37 UTC
3 commands for every pc name?

    if empty line then print empty line

You don't want to do that.
Re: File Find/Replace with the replacement coming from part of earlier matched string by Kenosis (Priest) on Feb 05, 2013 at 13:51 UTC
One option is to create a directory in your log files' directory for the prepped log files, so the originals aren't altered. Given that the log files' format is as shown, consider the following, which writes prepped files into a directory called `prepped`:

    use strict;
    use warnings;

    my $pcName;
    local $/ = '';    # paragraph mode

    for my $file (<*.log>) {
        open my $fhIN,  '<', $file or die $!;
        open my $fhOUT, '>', 'prepped/' . $file or die $!;

        while (<$fhIN>) {
            chomp;
            print $fhOUT $_ if /\n/;    # first two file lines
            next if /^(PCName:\s+.+)/ and $pcName = $1;    # pc name & next
            print $fhOUT "\n\n$pcName\n$_";    # print pc name & command
        }

        close $fhIN;
        close $fhOUT;
    }

Sample file output from your data:

    PCName: Foo1
    Command1:dfie

    PCName: Foo1
    Command2:dfo

    PCName: Foo1
    Command3:dfum

    PCName: Foo2
    Command1:dfie

    PCName: Foo2
    Command2:dfo

    PCName: Foo2
    Command3:dfum

The fileglob operator `<*.log>` is used to get the list of files in the log directory; you may need to change the file extension.

Hope this helps!
