PerlMonks  

Re: Not able to capture information

by Marshall (Canon)
on Feb 17, 2012 at 06:51 UTC ([id://954418])


in reply to Not able to capture information

This idea didn't work out as well as I thought it would, but I will post for entertainment value. There are a lot of ways to skin these cats...
#!/usr/bin/perl -w
use strict;

my @data = do{local $/ = "\n["; (<DATA>)};
@data = map{ s/\n/ /g; s/\[//g; s/\]/ ==/g; $_}@data;
print join "\n", @data;

=prints
2012/02/16 00:08:34 == 29 == ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52.
2012/02/16 00:08:34 == 39 == ERRORMSG Invalid User
2012/02/16 00:14:52 == 105 == ERRORMSG missing conversion rate
2012/02/16 00:14:52 == 29 == ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
=cut

__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
Update:
I suppose the first two little regexes in the map could be replaced with a single tr///:
@data = map{ tr/\n[/ /d;  s/\]/ ==/g;  $_}@data;
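tr/// is faster than a regex substitution because it is "lighter weight", meaning "dumber": it only translates or deletes single characters, so it cannot substitute one character with two. That is why the "]" to " ==" step still needs s///. In this case, though, performance appears not to be a significant factor, or at least it is not mentioned in the requirements.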
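A small sketch checking that the tr/// version of the map gives the same records as the two-substitution version. The two raw records are hypothetical, shortened from the __DATA__ above and shaped the way <DATA> returns them with $/ = "\n[" (the separator stays on the end of each record, so later records have lost their leading "[").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical raw records as returned by <DATA> with $/ = "\n[".
my @raw = (
    "[2012/02/16 00:08:34] [29] ERRORMSG unknown error\n[",
    "2012/02/16 00:08:34] [39] ERRORMSG Invalid User\n",
);

# Two substitutions: newline -> space, then delete every '['.
my @v1 = map { my $s = $_; $s =~ s/\n/ /g; $s =~ s/\[//g; $s =~ s/\]/ ==/g; $s } @raw;

# One tr///: "\n" maps to " ", and '[' has no partner in the
# replacement list, so /d deletes it. tr/// cannot turn ']' into the
# three characters ' ==', so that step still needs s///.
my @v2 = map { my $s = $_; $s =~ tr/\n[/ /d; $s =~ s/\]/ ==/g; $s } @raw;

print "$_\n" for @v2;
print "same result\n" if join("\n", @v1) eq join("\n", @v2);
```

Both versions leave a trailing space where each record's newline used to be, just as the original program does.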

My personal advice on parsing very regular, program-generated text such as log files is to keep the regex complexity as low as possible: make it just as complicated as it needs to be and no more. If you are validating "user input", the complexity level has to be higher.
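As an illustration of "just as complicated as it needs to be", here is a sketch of a hypothetical parse_line() helper that splits a log line into its two bracketed fields and the message without making any assumption about what is inside the brackets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "[a] [b] message" into (a, b, message).
# [^\]]* means "anything up to the next ']'", so the regex assumes
# nothing about the date format or the width of the number.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\[([^\]]*)\]\s*\[([^\]]*)\]\s*(.*)/;
}

my @f = parse_line("[2012/02/16 00:08:34] [29] ERRORMSG unknown error");
print join(" | ", @f), "\n";
```

In list context the match returns the three captures, or an empty list if the line does not start with two bracketed fields.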

Re^2: Not able to capture information
by oko1 (Deacon) on Feb 17, 2012 at 07:52 UTC

    My 2 cents on your 2 cents: validating user input is very simple. Never try to "enumerate badness"; just define what is valid and reject everything else.

    my $in;
    {
        print "Input 'foo': ";
        chomp($in = <STDIN>);
        redo unless $in =~ /^foo$/;    # test $in; a bare /^foo$/ would test $_
    }
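A sketch of the same "define what is valid, reject everything else" idea applied to a field rather than a fixed word. The rule in the hypothetical is_valid_username() (1 to 16 characters, a leading letter, then letters, digits or "_") is an assumption for illustration. It also shows why \z is stricter than $: in Perl, $ also matches just before a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical whitelist rule: 1-16 chars, starts with a letter, then
# only ASCII letters, digits or '_'. Anything else is rejected; there
# is no list of "bad" characters to maintain.
sub is_valid_username {
    my ($s) = @_;
    return defined $s && $s =~ /\A[A-Za-z][A-Za-z0-9_]{0,15}\z/;
}

for my $try ("alice", "bob_42", "9lives", "eve; rm -rf /", "carol\n") {
    (my $shown = $try) =~ s/\n/\\n/g;    # make the newline visible
    print "$shown: ", (is_valid_username($try) ? "accepted" : "rejected"), "\n";
}

# '$' also matches before a trailing newline, so /^foo$/ accepts
# "foo\n"; \z only matches at the very end of the string.
print "foo\\n matches /^foo\$/\n"   if "foo\n" =~ /^foo$/;
print "foo\\n fails /\\Afoo\\z/\n" unless "foo\n" =~ /\Afoo\z/;
```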
    -- 
    I hate storms, but calms undermine my spirits.
     -- Bernard Moitessier, "The Long Way"
      I don't think that we need to get into a big discussion in the context of this thread.

      Part of what I'm saying is that with:
      [2012/02/16 00:08:34] [29] ERRORMSG unknown error

      There is no reason or need to parse the date/time format with some huge regex, e.g.:
       m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]/

      If the line begins with "[", it is a date/time, and there is no reason to parse or otherwise try to understand it. What if the format changes to YYYY-MM-DD or YYYY.MM.DD instead of YYYY/MM/DD? In the context of this reformatting program, it shouldn't matter.
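A quick sketch of the point about a changing date format: the strict date/time regex quoted above (written here as a qr// with its closing delimiter) stops matching as soon as the separators change, while the only check the reformatter needs, "the line starts with [", keeps working. The YYYY-MM-DD line is a hypothetical future format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $strict = qr/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]/;
my $loose  = qr/^\[/;    # a line starting with '[' begins a new record

# Same entry with today's format and a hypothetical YYYY-MM-DD format.
my @lines = (
    "[2012/02/16 00:08:34] [29] ERRORMSG unknown error",
    "[2012-02-16 00:08:34] [29] ERRORMSG unknown error",
);

for my $line (@lines) {
    my $s = $line =~ $strict ? "yes" : "no";
    my $l = $line =~ $loose  ? "yes" : "no";
    print "strict=$s loose=$l  $line\n";
}
```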

      Basically, if a complex regex is not essential to the program's operation, don't write one. Here all that is needed is to recognize that square brackets at the start of a line signify a "new record". Beyond that, the parser shouldn't care about the format between the square brackets, because it doesn't need to in order to do its job!

      Maybe we are actually in agreement here?
      ^[...] starts a new "message line", and that is all we need to know: that is considered "valid input" no matter what is between the [...].
