
P::RD and grammar

by TStanley (Canon)
on Nov 12, 2013 at 20:19 UTC
TStanley has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a log file that comes from a price checker scanner, looking specifically for scanning transactions. A typical transaction is below:

Oct 31, 2013 10:40:05 AM PCKLog log INFO: recv - <2>747599306525<3><5><4>
Oct 31, 2013 10:40:05 AM PCKLog log INFO: connect PLU server =, PLU port = 31415
Oct 31, 2013 10:40:05 AM PCKLog log INFO: connected.....
Oct 31, 2013 10:40:05 AM PCKLog log INFO: PLU send - 12 bytes
Oct 31, 2013 10:40:05 AM PCKLog log INFO: PLU send - 747599306525
Oct 31, 2013 10:40:06 AM PCKLog log INFO: PLU recv - 124 bytes
Oct 31, 2013 10:40:06 AM PCKLog log INFO: PLU recv - <?xml version="1.0" encoding="utf-8"?><PLU><desc>GHIRARDELLI MINT</desc><dept>110</dept><prc1>600</prc1><deal>2</deal></PLU>
Oct 31, 2013 10:40:06 AM PCKLog log INFO: disconnected.....
Oct 31, 2013 10:40:06 AM PCKLog log INFO: send - <2>\x0B\x1B[1F\x1B[08;08TMBheader\x1B[2002F\x1B[000;24CGHIRARDELLI MINT\x1B[6F\x1B[000;36C2/$6.00<3>8<4>
Oct 31, 2013 10:40:11 AM PCKLog log INFO: send - <2>\x0B\x1B[1F\x1B[008;08T\x1B[1002J<3>8<4>

Using Parse::RecDescent, I have developed the following code:

#!C:\Perl\bin\perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

BEGIN{
    $::RD_AUTOACTION=q{ [@item[1..$#item]] };
}

my $grammar = q{
    transaction: date|scan|date|connectPLU|date|connected|date|sending
                |date|sent|date|receiving|date|received|date|disconnected
                |date|display1|date|display2
    date: /([Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec]\\\\s+\\\\d{1,2}, \\\\d{4}) (\\\\d{1,2}:\\\\d{1,2}):\\\\d{1,2} ([AM|PM]) PCKLog log/
        { print"$item[0]: $item[1] - $item[2]$item[3]\\\\n"; }
    | scan: /INFO: (\\\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}): recv - \\<\\d\\>(\\d+)\\<\\d\\>\\<\\d\\>\\<\\d\\>/
    | connectPLU: /INFO: connect PLU server = \\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}, PLU port = \\d+/
    | connected: /INFO: connected\\.\\.\\.\\.\\./
    | sending: /INFO: PLU send - \\d{1,2} bytes/
    | sent: /INFO: PLU send - (\\d+)/
    | receiving: /INFO: PLU recv - \\d{1,3} bytes/
    | received: /INFO: PLU recv - \\<\\?xml .*\\?\\>\\<PLU\\>\\<desc\\>(.*)\\<\\/desc\\>\\<dept\\>\\d{1,4}\\<\\/dept\\>\\<prc1\\>(\\d{1,4})\\<\\/prc1\\>\\<deal\\>\\d{1,3}\\<\\/deal\\>\\<\\/PLU\\>/
    | disconnected: /INFO: disconnected\\.\\.\\.\\.\\./
    | display1: /INFO: (\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}): send - .*/
    | display2: /INFO: (\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}): send - .*/
};

my $parser = new Parse::RecDescent($grammar) or die "Bad grammar: $!\\n";

my($INFILE,$storelog);
my @log;
$storelog = "SingleTrans.txt";
open $INFILE,"<",$storelog or die "Can't open $storelog: $!\\n";
@log=<$INFILE>;
close $INFILE;

my $tree=$parser->transaction(@log);
print Dumper($tree);

What I am getting with the above code is just the word 'date' printed out when I dump the $tree variable. I went through most of the resources here on PM, as well as some other stuff I found on the web.

What I need to pull from the scan log is the date and time of the transaction, the IP address of the scanner making the request, the PLU being sent, and the response that comes back, which is the product description. Based on the transaction listed above, the extracted information would be:

Date: Oct 31, 2013 - 10:40
Scanner:
PLU: 747599306525
Description: GHIRARDELLI MINT

Any help would be greatly appreciated as always.

People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell

Replies are listed 'Best First'.
Re: P::RD and grammar (Parse::RecDescent)
by toolic (Bishop) on Nov 12, 2013 at 21:03 UTC
    I've never used it before, but it looks fun! I think you should start small, then build up your grammar. I got rid of the square brackets around your month names. Square brackets denote regex character classes, if I understand this module correctly:
    my $grammar = q{
        transaction: date
        date: /(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},/
    };

    # My output...
    # $VAR1 = 'Oct 31,';
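To make the character-class point concrete, here is a small sketch in plain Perl (no Parse::RecDescent involved) comparing the bracketed pattern from the question with the parenthesised alternation. The variable names `$text`, `$class` and `$alt` are just for illustration.

```perl
use strict;
use warnings;

my $text = 'Oct 31, 2013';

# Square brackets form a character class: this matches exactly ONE
# character out of the set J, a, n, |, F, e, b, ... and so on.
my ($class) = $text =~ /([Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec])/;

# Parentheses with | form an alternation: this matches a whole month name.
my ($alt) = $text =~ /(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/;

print "character class matched: '$class'\n";   # 'O'
print "alternation matched:     '$alt'\n";     # 'Oct'
```

The same reasoning applies to `[AM|PM]` in the original grammar, which matches a single character rather than "AM" or "PM".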

    UPDATE: and a little more readable:

    my $grammar = q{
        transaction: date
        date: /(
                (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)   # Month
                \s+ \d{1,2} , \s+ \d{4}                             # Day, Year
               )
               \s+ (\d{1,2} : \d{1,2}) : \d{1,2} \s+ (A|P)M        # Time
               \s+ PCKLog \s+ log
              /x
    };
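As one possible next step in building the grammar up, the sketch below adds a second rule and has `transaction` require `date` followed by `scan`. In Parse::RecDescent, items separated by whitespace form a sequence, whereas `|` separates alternative productions, so `date|scan` would be satisfied by a date alone. The `scan` rule and the returned hash keys are assumptions of mine, written against the sample line exactly as posted (which shows no scanner IP), so they would need adjusting for the real log.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical two-rule grammar: a date, then the scanner's "recv" entry.
my $grammar = q{
    transaction: date scan
        { $return = { date => $item{date}, plu => $item{scan} } }

    date: /(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d{1,2},\s+\d{4}\s+\d{1,2}:\d{2}:\d{2}\s+[AP]M/
          'PCKLog' 'log'
        { $return = $item[1] }

    scan: 'INFO:' 'recv' '-' /<\d>/ /\d+/ /(?:<\d>)+/
        { $return = $item[5] }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# The start rule takes the text to parse as a single string.
my $line = 'Oct 31, 2013 10:40:05 AM PCKLog log INFO: recv - <2>747599306525<3><5><4>';

my $result = $parser->transaction($line);
defined $result or die "No match\n";

print "Date: $result->{date}\n";
print "PLU: $result->{plu}\n";
```

Because the start rule expects one string, a file read into an array would need to be joined (or slurped into a scalar) before being handed to the parser.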
