Re: Not able to capture information

This idea didn't work out as well as I thought it would, but I will post for entertainment value. There are a lot of ways to skin these cats...

#!/usr/bin/perl -w
use strict;

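# Read one record at a time: a record ends at a newline followed by "[".
my @data = do{local $/ = "\n["; (<DATA>)};

# Flatten newlines, drop "[", and turn "]" into a " ==" separator.
@data = map{ s/\n/ /g; s/\[//g; s/\]/ ==/g;  $_}@data;

print join "\n", @data;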

=prints
2012/02/16 00:08:34 == 29 == ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52. 
2012/02/16 00:08:34 == 39 == ERRORMSG Invalid User 
2012/02/16 00:14:52 == 105 == ERRORMSG missing conversion rate 
2012/02/16 00:14:52 == 29 == ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77. 
=cut

__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
Please check
Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.

Update:
I suppose the first two little regexes in the map could be replaced with a single tr:
@data = map{ tr/\n[/ /d; s/\]/ ==/g; $_}@data;
tr is faster than a regex because it is "lighter weight", meaning "dumber": it maps or deletes single characters and cannot turn one character into two, which is why the s/\]/ ==/g substitution has to stay. But in this case performance does not appear to be a significant factor - or at least it is not mentioned in the requirements.
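
A quick way to check that the tr form does the same job as the two substitutions is to run both over the same record and compare. This is only a sketch: the sub names with_regex and with_tr and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the original two substitutions plus the bracket rewrite.
sub with_regex {
    my ($rec) = @_;
    $rec =~ s/\n/ /g;
    $rec =~ s/\[//g;
    $rec =~ s/\]/ ==/g;
    return $rec;
}

# Hypothetical helper: tr does the first two steps in one pass.
# With /d, "\n" maps to " " and "[" (no counterpart in the
# replacement list) is deleted.
sub with_tr {
    my ($rec) = @_;
    $rec =~ tr/\n[/ /d;
    $rec =~ s/\]/ ==/g;
    return $rec;
}

# Made-up record, shaped like what reading with $/ = "\n[" returns.
my $sample = "[2012/02/16 00:08:34] [39] ERRORMSG Invalid User\n[";
print with_regex($sample) eq with_tr($sample) ? "same\n" : "different\n";
```

The /d flag is what makes this work: without it, tr would pad the replacement list and turn "[" into a space instead of removing it.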

My personal advice on parsing very regular, program-generated things like log files is to keep the regex complexity as low as possible - make it just as complicated as it needs to be and no more. If you are validating "user input", then the regex has to be stricter and therefore more complex.

Re^2: Not able to capture information by oko1 (Deacon) on Feb 17, 2012 at 07:52 UTC
My 2 cents on your 2 cents: validating user input is very simple. Never try to "enumerate badness"; just define what is valid and reject everything else.
`my $in; { print "Input 'foo': "; chomp($in=<STDIN>); redo unless $in =~ /^foo$/; }`
-- I hate storms, but calms undermine my spirits. -- Bernard Moitessier, "The Long Way"
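
The "define what is valid" idea can also be pulled into a sub so it can be tried without a terminal. This is a sketch under assumptions: the name is_valid_id and the rule "one to three digits" are invented for illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical rule: an ID is valid only if it is one to three digits.
# \A and \z anchor to the very start and end of the string, so a
# trailing newline or any other extra character is rejected.
sub is_valid_id {
    my ($in) = @_;
    my $ok = defined($in) && $in =~ /\A\d{1,3}\z/;
    return $ok ? 1 : 0;
}

for my $candidate ('29', '105', '1050', '29; rm -rf', '') {
    printf "%-14s %s\n", "'$candidate'", is_valid_id($candidate) ? 'ok' : 'rejected';
}
```

Note that `$` would also accept a string ending in a newline, while `\z` does not; which one is appropriate depends on whether the input has already been chomped.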
Re^3: Not able to capture information by Marshall (Canon) on Feb 17, 2012 at 09:02 UTC
I don't think that we need to get into a big discussion in the context of this thread. Part of what I'm saying is that with:
`[2012/02/16 00:08:34] [29] ERRORMSG unknown error`
there is no reason or need to parse the date/time format with some huge regex, e.g.:
`m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]/`
If the line begins with "[", it is a date/time, and there is no reason to parse or otherwise try to understand it. Maybe this changes to YYYY-MM-DD or YYYY.MM.DD instead of YYYY/MM/DD? In the context of this re-formatting program, it shouldn't matter.

Basically, if a complex regex is not essential to the program's operation, don't write one. Here all that is needed is to understand that square brackets at the start of a line signify a "new record". Past that, the parser shouldn't care about the format between the square brackets, because it doesn't need to in order to do its job!

Maybe we are actually in agreement here? `^[...]` starts a new "message line" and that is all we need to know - that is considered "valid input" no matter what is between the `[...]`.
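
A minimal sketch of that "leading bracket starts a new record" rule, reading line by line instead of changing $/. The sub name join_records is hypothetical, and the second sample record uses a dash-separated date purely to show that the contents of the brackets are never inspected.

```perl
use strict;
use warnings;

# Sketch: join continuation lines onto the record that a leading "["
# started. Nothing inside the brackets is parsed.
sub join_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp(my $text = $line);
        if ($text =~ /^\[/ or !@records) {
            push @records, $text;        # leading "[" means a new record
        }
        else {
            $records[-1] .= " $text";    # anything else continues the previous one
        }
    }
    return @records;
}

my @records = join_records(
    "[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table\n",
    "Please check\n",
    "Valueprice.pm line 52.\n",
    "[2012-02-16 00:08:34] [39] ERRORMSG Invalid User\n",   # made-up date style
);
print "$_\n" for @records;
```

The only pattern involved is /^\[/, so a change in the timestamp layout would not require touching the parser.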

