PerlMonks  

Re: Regex Help - Large regex example, and larger Parse::RecDescent attempt

by imp (Priest)
on Dec 22, 2006 at 06:39 UTC (#591270)


in reply to Regex Help pulling Data from a string

The appropriate solution to this problem depends on how precise the pattern matching needs to be. How much post-extraction processing you are willing to do matters as well: do you need '58bn5904', or are you content with 'd:\data\58bn5904.dat'?
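To illustrate the post-extraction route, here is a minimal sketch that matches loosely and tidies the captured paths afterwards. The helper name path_to_basename and the loose pattern are assumptions made for illustration, not part of the original question; File::Basename is a core module.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# The log contains Windows-style paths, so ask File::Basename to split on
# backslashes even when this script runs on another platform.
fileparse_set_fstype('MSWin32');

# Hypothetical helper: strip the directory and the extension from a path.
sub path_to_basename {
    my ($path) = @_;
    my ($name) = fileparse($path, qr/\.[^.]*/);
    return $name;
}

my $line = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) '
         . 'Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';

# Loose match: grab the whitespace-delimited paths and clean them up later.
if (my ($log_path, $sent_path) = $line =~ /^(\S+) .* Sent file (\S+) successfully/) {
    my $summary = join ',', path_to_basename($log_path), path_to_basename($sent_path);
    print "$summary\n";
}
```

The trade-off is that the pattern stays short, but malformed lines are more likely to slip through than with a strict pattern.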

To give you an idea of how ugly the regex could become:

use strict;
use warnings;

# Example line:
# e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
# Desired:
# beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes

my $re_date = qr<
    (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)
    \s
    \d{1,2}                                             # Day of month
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) # Month
    \d{2}                                               # Two digit year
    \s
    \d{2}:\d{2}:\d{2}
>x;

my $pattern = qr<
    e:\\logfiles\\(.*?)     # Capture(1) filename
    \s
    \[\d+\]                 # Bracketed number
    \s
    ($re_date)              # Capture(2) date
    \s - \s
    \(\d+\)                 # number in parens
    \s
    Sent \s file \s
    d:\\data\\(.*?)\.dat    # Capture(3) file basename
    \s
    successfully
    \s
    \(
        [0-9.]+ \s [A-Z]b /sec
        [ ] - [ ]
        (\d+ \s bytes)      # Capture(4) bytes text
    \)
>x;

while (my $line = <DATA>) {
    if ($line =~ /$pattern/) {
        my ($logfile, $date, $file_basename, $bytes) = ($1,$2,$3,$4);
        printf "(%s) (%s) (%s) (%s)\n", $logfile,$date,$file_basename, $bytes;
    }
}
__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
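If the positional captures feel hard to follow, one possible variation is to use named captures, which need Perl 5.10 or later. This is only a sketch: the capture names (logfile, date, basename, bytes) and the helper parse_line are made up for illustration, and the date sub-pattern is deliberately looser than the one above.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical capture names; each (?<name>...) fills $+{name} on a match.
my $named = qr{
    e:\\logfiles\\ (?<logfile>.*?) \.log      # log name without extension
    \s \[\d+\] \s
    (?<date>\w{3} \s \d{1,2}[A-Za-z]{3}\d{2} \s \d{2}:\d{2}:\d{2})
    \s - \s \(\d+\) \s Sent \s file \s
    d:\\data\\ (?<basename>.*?) \.dat         # file name without extension
    \s successfully \s
    \( [0-9.]+ \s [A-Z]b/sec \s - \s
    (?<bytes>\d+ \s bytes)
    \)
}x;

# Hypothetical helper: returns a hash reference of the fields, or nothing
# when the line does not match.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $named;
    my %fields = %+;    # copy the captures before another match clobbers them
    return \%fields;
}

my $line = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) '
         . 'Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';

if (my $rec = parse_line($line)) {
    say join ',', @{$rec}{qw(logfile date basename bytes)};
}
```

Matching the `.log` suffix outside the capture means the first field comes back as 'beardstownbase', which is what the desired output asks for.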
I have been meaning to learn Parse::RecDescent for ages, so tonight I took some time to try to solve your problem with it. It is likely the wrong tool for this job, and definitely a poor implementation. I would welcome any feedback from people with stronger parse-fu.
use strict;
use warnings;
use Parse::RecDescent;
$::RD_HINT=5;

my $grammar = <<'GRAMMAR';
{
    use strict;
    use warnings;
}

logfile : 'e:\\logfiles\\' /[-A-Za-z0-9_.]+/
    { $item[2] }

date : m{
        (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
        \s
        \d\d
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
        \d\d
    }x

time : /\d{2}:\d{2}:\d{2}/

sentfile: <skip:''> 'd:\\data\\' /[-A-Za-z0-9_]+/ '.dat'
    { $item[3] }

rate : /\d+\.\d [A-Za-z]+\/sec/

bytecount : /\d+ bytes/

parse : logfile /\[\d+\]/ date time
        /- \(\d+\) Sent file /
        sentfile
        <skip:'[- \t()]*'>
        ( /successfully/ rate )
        bytecount
    { [ @item{qw(logfile date time sentfile bytecount)}] }
GRAMMAR

# Expect: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
my $parser = Parse::RecDescent->new($grammar);

use Data::Dumper;
while (my $line = <DATA>) {
    last unless $line =~ /\S/;
    my @fields = $parser->parse($line);
    if (@fields) {
        print Dumper \@fields;
    }
}
__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/se
Output:
$VAR1 = [
          [
            'beardstownbase.log',
            'Thu 22Jun06',
            '08:07:19',
            '58bn5904',
            '859216 bytes'
          ]
        ];
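The parser output still needs a little massaging to reach the expected line: the log name keeps its .log suffix, and the date and time arrive as separate fields. A minimal sketch of that last step follows. The helper name fields_to_csv is an assumption for illustration, and it expects the inner array reference from the dump above.

```perl
use strict;
use warnings;

# Hypothetical helper: $fields is the inner array reference from the dump
# (logfile, date, time, sentfile, bytecount).
sub fields_to_csv {
    my ($fields) = @_;
    my ($logfile, $date, $time, $sentfile, $bytecount) = @$fields;
    $logfile =~ s/\.log\z//;    # 'beardstownbase.log' -> 'beardstownbase'
    return join ',', $logfile, "$date $time", $sentfile, $bytecount;
}

my $csv = fields_to_csv(
    [ 'beardstownbase.log', 'Thu 22Jun06', '08:07:19', '58bn5904', '859216 bytes' ]
);
print "$csv\n";    # beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
```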

