
Re: Re: pulling by regex

by mkent (Acolyte)
on Dec 13, 2002 at 00:02 UTC ( [id://219469]=note )


in reply to Re: pulling by regex
in thread pulling by regex

Hey, guys, thanks!!! This is a wonderful resource, and I incorporated some suggestions into the revised script below. I still have some questions, though!

BrowserUk, I decided against using Date::Manip even though I really like that module. Its documentation warns that it is slower than other time modules, and this script will be used most often when the web server is overloaded with requests, so speed is essential.

Abigail-II, a database would be nice, but the server is producing regular logs, so that's what I have to use.

Here are my questions about the script below:

1) Enabling use strict produces errors that seem to say I don't have a global module loaded; what module is that?

2) The simulated $month switch statement doesn't work as expected; instead of values 0 through 11, it gives every month a value of 1. I need the month converted to a number so that timelocal is accurate.

3) At the end, I pack all the referrers into an array. What I need to do is count each referrer as a unique URL, so that www.you.com is counted x times and www.me.com is counted y times. Then I can tell the top referrer in the time period stipulated by the web page (which just has hours and minutes to enter). That will let me create output like
www.you.com 22
www.me.com 19
etc
How can I count an unknown value and produce this output? And is an array the best way to do it?
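
For question 3, one common approach is a hash keyed on the referrer rather than an array. The following is only a minimal sketch: the sample list stands in for the @refers array built by the script, and the names %count and @ranked are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the @refers array built by the script.
my @refers = qw(
    www.you.com www.me.com www.you.com
    www.you.com www.me.com www.them.com
);

# Count each distinct referrer; the hash key is the referrer string,
# so values that are not known in advance need no special handling.
my %count;
$count{$_}++ for @refers;

# Sort by descending count, breaking ties alphabetically.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

# One "referrer count" pair per line, top referrer first.
print "$_ $count{$_}\n" for @ranked;
```

In the real script the increment could happen inside the log loop (in place of the push), which would avoid holding every referrer in memory.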

Any and all ideas welcome, and thanks in advance. I really appreciate the help!

Here's the script, followed by some raw log data:

#!/usr/local/bin/perl
#use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser carpout);
use Time::Local;

# Grab information returned by web page
$hour = param ("hour");
$minute = param ("minute");

# Allow perl to write to browser window
print "Content-type: text/html\n\n";

# Current time in seconds
$now = time;

# Convert submitted time to seconds
$compare_time = ($hour * 3600) + ($minute * 60);

# Times extracted by logs must be >= to $target
$target = $now - $compare_time;

open LOGFILE, "datafile.html" || die "Can't open file";
@log_data =<LOGFILE>;

# Grab useful information from each line of the web log
foreach $log_line(@log_data) {
    # Grab date/time and referer
    ($date_string, $referrer) = ($log_line =~ /\[([^\]]+)\] "[^"]+"[^"]+"([^"]+)"/);

    # Replace / and : with spaces
    $date_string =~ s!/! !g;
    $date_string =~ s!:! !g;

    # Dump junk at end of line
    $date_string =~ s! -[0-9]+!!;

    # Split date/time into useful information
    ($day, $month, $year, $hhour, $min, $sec) = split(' ', $date_string);

    # Convert month from text to number
    if ($month == 'Jan') {$month = 0}
    elsif ($month == 'Feb') {$month = 1}
    elsif ($month == 'Mar') {$month = 2}
    elsif ($month == 'Apr') {$month = 3}
    elsif ($month == 'May') {$month = 4}
    elsif ($month == 'Jun') {$month = 5}
    elsif ($month == 'Jul') {$month = 6}
    elsif ($month == 'Aug') {$month = 7}
    elsif ($month == 'Sep') {$month = 8}
    elsif ($month == 'Oct') {$month = 9}
    elsif ($month == 'Nov') {$month = 10}
    else {$month = 11}

    # Calculate time on the log line in seconds
    $log_time = timelocal($sec,$min,$hhour,$day,$month,$year);

    if ($log_time >= $target) {
        push @refers, $referrer;
    }
}
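
For questions 1 and 2, here is a hedged sketch of how the date handling might look under strict. It assumes the strict errors are of the usual 'Global symbol "$hour" requires explicit package name' kind, which normally means a variable was not declared with my rather than that a module is missing. It also swaps the if/elsif chain for a lookup hash: == compares numerically, and non-numeric strings such as 'Dec' and 'Jan' both evaluate to 0, so the first branch always matches, whereas eq (or a hash lookup) compares strings. The names %month_num and log_time are hypothetical, not from the script.

```perl
use strict;
use warnings;
use Time::Local;

# Map three-letter month names to the 0..11 values timelocal expects.
my %month_num = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: convert "12/Dec/2002:18:39:15 -0500" to epoch seconds.
# Like the script above, it ignores the UTC offset and treats the clock
# time as local time. Returns nothing if the stamp cannot be parsed.
sub log_time {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hour, $min, $sec) =
        $stamp =~ m{^(\d+)/(\w{3})/(\d{4}):(\d+):(\d+):(\d+)}
        or return;
    return unless exists $month_num{$mon};
    return timelocal($sec, $min, $hour, $day, $month_num{$mon}, $year);
}

my $t = log_time('12/Dec/2002:18:39:15 -0500');
print scalar(localtime $t), "\n" if defined $t;
```

Every variable is declared with my at first use, which is all strict asks for here; no extra module should be needed.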

Some data:

216.45.43.42 - - [12/Dec/2002:18:39:15 -0500] "GET /news/opinions/varvel.gif HTTP/1.1" 302 313 "http://www.freerepublic.com/forum/a3a95ca3c24a0.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /images/header_aod2_15.gif HTTP/1.1" 200 4162 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /images/storysearch2.gif HTTP/1.1" 200 142 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /users/ads/misc/remax_searchad3.gif HTTP/1.1" 200 2335 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/sports_03_aod.gif HTTP/1.1" 200 3195 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/email.gif HTTP/1.1" 200 138 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/print.gif HTTP/1.1" 200 139 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/sidelinksend2.gif HTTP/1.1" 200 1009 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/pics2/image-007735-7410.jpg HTTP/1.1" 200 18319 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/advertisement_250strip.gif HTTP/1.1" 200 238 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:17 -0500] "GET /users/ads/story/macselect/macselect_250_Oct.gif HTTP/1.1" 200 10436 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
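
To show what the script's regex pulls out of this data, here is a small sketch that runs it against the first sample entry, written out as a single unwrapped line. The final step, reducing the referrer URL to its host name so that all pages from one site count together, is an assumption about what "www.you.com" style output implies; $host is a made-up name.

```perl
use strict;
use warnings;

# First sample entry from the data, as one unwrapped line.
my $log_line = '216.45.43.42 - - [12/Dec/2002:18:39:15 -0500] '
    . '"GET /news/opinions/varvel.gif HTTP/1.1" 302 313 '
    . '"http://www.freerepublic.com/forum/a3a95ca3c24a0.htm" '
    . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"';

# Same pattern as the script: capture the bracketed timestamp and the
# second quoted field (the referrer), skipping the request and status codes.
my ($date_string, $referrer) =
    $log_line =~ /\[([^\]]+)\] "[^"]+"[^"]+"([^"]+)"/;

# Assumption: count by host name rather than by full URL.
my ($host) = $referrer =~ m{^\w+://([^/:]+)};

print "$date_string\n$referrer\n$host\n";
```

If per-page counts are wanted instead, $referrer can be used as the key unchanged.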

update (broquaint): changed <pre> tags to <code> tags
