Re: Regex Help - Large regex example, and larger Parse::RecDescent attempt

by imp (Priest)
on Dec 22, 2006 at 06:39 UTC (#591270=note)

in reply to Regex Help pulling Data from a string

The appropriate solution to this problem depends on how precise the pattern matching needs to be. How much post-extraction processing you are willing to do matters as well: for example, do you need just '58bn5904', or are you content with 'd:\data\58bn5904.dat'?
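As a rough illustration of the other end of that trade-off, here is a minimal sketch that uses a deliberately loose pattern and then does the post-extraction processing in Perl. It assumes every log line has the same overall shape as the sample line in the question; `parse_loose` is a hypothetical helper name, not something from the original thread.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: every line looks like the sample below.
# A deliberately loose pattern grabs whole tokens; cleanup happens afterwards.
sub parse_loose {
    my ($line) = @_;
    my ($log, $when, $path, $bytes) = $line =~ m{
        ^ (\S+) \s+ \[\d+\] \s+        # log file path, bracketed number
        (\w+ \s+ \w+ \s+ [\d:]+)       # day, date and time
        .*? Sent \s+ file \s+ (\S+)    # full path of the sent file
        .*? (\d+ \s+ bytes)            # byte count
    }x or return;

    # Post-extraction processing: keep only the base names.
    ($log)  = $log  =~ /([^\\]+)\.log\z/;
    ($path) = $path =~ /([^\\]+)\.dat\z/;
    return join ',', $log, $when, $path, $bytes;
}

my $sample = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) '
           . 'Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';
print parse_loose($sample), "\n";
# prints: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
```

The loose pattern is much easier to read, but it will also accept lines that a precise pattern would reject, so it only makes sense when the input is known to be uniform.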

To give you an idea of how ugly the regex could become:

use strict;
use warnings;

# Example line:
# e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
# Desired:
# beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes

my $re_date = qr<
    (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)
    \s
    \d{1,2}                                              # Day of month
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)  # Month
    \d{2}                                                # Two digit year
    \s
    \d{2}:\d{2}:\d{2}
>x;

my $pattern = qr<
    e:\\logfiles\\(.*?)     # Capture(1) filename
    \s
    \[\d+\]                 # Bracketed number
    \s
    ($re_date)              # Capture(2) date
    \s - \s
    \(\d+\)                 # number in parens
    \s Sent \s file \s
    d:\\data\\(.*?)\.dat    # Capture(3) file basename
    \s successfully \s
    \( [0-9.]+ \s [A-Z]b /sec [ ] - [ ]
    (\d+ \s bytes)          # Capture(4) bytes text
    \)
>x;

while (my $line = <DATA>) {
    if ($line =~ /$pattern/) {
        my ($logfile, $date, $file_basename, $bytes) = ($1, $2, $3, $4);
        printf "(%s) (%s) (%s) (%s)\n", $logfile, $date, $file_basename, $bytes;
    }
}

__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)

I have been meaning to learn Parse::RecDescent for ages, so tonight I took some time to try to solve your problem with it. It is likely the wrong tool for this job, and definitely a poor implementation; I would welcome any feedback from people with stronger parse-fu.
use strict;
use warnings;
use Parse::RecDescent;

$::RD_HINT = 5;

my $grammar = <<'GRAMMAR';
{ use strict; use warnings; }

logfile : 'e:\\logfiles\\' /[-A-Za-z0-9_.]+/ { $item[2] }

date : m{
    (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
    \s
    \d\d
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
    \d\d
}x

time : /\d{2}:\d{2}:\d{2}/

sentfile: <skip:''> 'd:\\data\\' /[-A-Za-z0-9_]+/ '.dat' { $item[3] }

rate : /\d+\.\d [A-Za-z]+\/sec/

bytecount : /\d+ bytes/

parse : logfile /\[\d+\]/ date time /- \(\d+\) Sent file / sentfile
        <skip:'[- \t()]*'> ( /successfully/ rate ) bytecount
        { [ @item{qw(logfile date time sentfile bytecount)} ] }
GRAMMAR

# Expect: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
my $parser = Parse::RecDescent->new($grammar);

use Data::Dumper;
while (my $line = <DATA>) {
    last unless $line =~ /\S/;
    my @fields = $parser->parse($line);
    if (@fields) {
        print Dumper \@fields;
    }
}

__DATA__
e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)

$VAR1 = [
          [
            'beardstownbase.log',
            'Thu 22Jun06',
            '08:07:19',
            '58bn5904',
            '859216 bytes'
          ]
        ];
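The grammar returns the log name with its extension and the date and time as separate fields, which differs from the "Expect" comment in the script. A small post-processing step can close that gap. The sketch below uses a hypothetical `fields_to_csv` helper and works directly on the structure shown in the Dumper output, so it does not need Parse::RecDescent installed to try it.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the array ref built by the grammar's `parse`
# rule ([logfile, date, time, sentfile, bytecount]) into the
# comma-separated line the question asked for.
sub fields_to_csv {
    my ($fields) = @_;
    my ($logfile, $date, $time, $sentfile, $bytes) = @$fields;
    (my $log_base = $logfile) =~ s/\.log\z//;    # drop the .log extension
    return join ',', $log_base, "$date $time", $sentfile, $bytes;
}

# The same structure as the inner array in the Dumper output above.
my $parsed = [ 'beardstownbase.log', 'Thu 22Jun06', '08:07:19', '58bn5904', '859216 bytes' ];
print fields_to_csv($parsed), "\n";
# prints: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
```

Inside the loop of the parser script it would be called as `print fields_to_csv($fields[0]), "\n";`, since `parse` returns a single array reference.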