PerlMonks  

(jeffa) Re: split delimiters II

by jeffa (Bishop)
on Mar 27, 2001 at 02:57 UTC ( [id://67370]=note )


in reply to split delimiters II

Unless you post a complete example of the line to be parsed, we cannot help you. Since I am a nice guy, though, I will explain why you need more than split for this problem.
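
To make that concrete, here is a rough sketch of what a plain whitespace split does to an Apache-style access log entry. The sample line is only an assumption for illustration (it is the same kind of entry parsed further down), and @fields is a name I made up:

```perl
use strict;
use warnings;

# Sample access-log entry, assumed for illustration.
my $line = '127.0.0.1 - - [26/Mar/2001:16:01:07 -0500] "GET /stuff/ HTTP/1.0" 200 11874';

# split ' ' breaks on runs of whitespace, so it cuts straight through
# the bracketed date stamp and the quoted request.
my @fields = split ' ', $line;

print scalar(@fields), " fields\n";
print "  <$_>\n" for @fields;
```

The date stamp comes back as two pieces with the brackets still attached, and the request comes back as three pieces with the quotes still attached, so you would have to glue fields back together and strip the delimiters by hand.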

The better solution would be to use the m// operator and the capture variables: $1, $2, etc. I'll explain by parsing an entry from an Apache web server access log:

my $line = '127.0.0.1 - - [26/Mar/2001:16:01:07 -0500] "GET /stuff/ HTTP/1.0" 200 11874';
The fields are separated by dashes, brackets, or quotes, but since we know the general layout, we can write a regular expression that is general enough to parse each line, yet specific enough to get the data we want: the IP of the requesting client, the date stamp, and the requested document (along with the request type).
use strict;

my ($ip,$date,$method,$file,$header,$status,$bytes) = $line =~
  /^([\d.]+)        # $ip     = ip quad
   \s*-\s*-\s*      # skip over these
   \[(.*?)\]\s"     # $date   = everything between the brackets
   (\w+)\s*         # $method = the method, usually GET or POST
   ([^\s]+)\s*      # $file   = everything UP TO the next white space
   (.*)"\s*         # $header = everything UP TO the closing double quote
   (\d+)\s*         # $status = digits between spaces
   (\d+)\s*$        # $bytes  = last set of digits (response size in bytes)
  /x;

print "$ip\n$date\n$method\n$file\n$status\n$bytes\n";

By no means am I a master of regular expressions, the ones I chose just happen to work. There are better ways than using .* (which is greedy and will happily run past the delimiter you meant it to stop at if the line contains another one), but a little badness won't kill ya' :)
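
For the curious, one of those "better ways" is to swap each .* for a negated character class, so a capture cannot run past its own delimiter. The following is only a sketch under that assumption, not the one true regex. The names $proto and $bytes are my own labels, and I am assuming the final field is either digits or a lone dash:

```perl
use strict;
use warnings;

my $line = '127.0.0.1 - - [26/Mar/2001:16:01:07 -0500] "GET /stuff/ HTTP/1.0" 200 11874';

my ($ip, $date, $method, $file, $proto, $status, $bytes) = $line =~ m{
    ^ ([\d.]+)            # client address
    \s+ \S+ \s+ \S+       # skip the two fields shown as dashes
    \s+ \[ ([^\]]+) \]    # date: anything except a closing bracket
    \s+ " (\S+)           # method
    \s+ (\S+)             # requested path
    \s+ ([^"]+) "         # protocol: anything except a double quote
    \s+ (\d{3})           # status code
    \s+ (\d+|-)           # last field: digits, or a dash (assumed)
    \s* $
}x or die "line did not match\n";

print "$ip\n$date\n$method\n$file\n$proto\n$status\n$bytes\n";
```

The "or die" matters: if the match fails, the list assignment yields nothing, and without the check you would be printing a pile of undefined values.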

Big Thanks to Albannach.

Jeff

R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
