PerlMonks  

Re: Re: Re: Pulling by regex II

by PhiRatE (Monk)
on Dec 14, 2002 at 23:40 UTC (#219914)


in reply to Re: Re: Pulling by regex II
in thread Pulling by regex II

As a hunch, you may need to try switching off taint (the -t switch). Date::Manip may not be handling taint correctly; other than that, I can't think of a reason.
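If Date::Manip is the suspect, one way to test that hunch is to take it out of the date comparison altogether. The sketch below is a swapped-in technique, not what this thread uses: it converts the Apache timestamp to epoch seconds with the core Time::Local module and compares it against a cutoff built from time(). The helper name clf_to_epoch and the fixed 1 hour 30 minute window are hypothetical, and it assumes the usual "15/Dec/2002:14:52:13 -0500" timestamp layout.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in Apache log timestamps
my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: convert "15/Dec/2002:14:52:13 -0500" to epoch
# seconds. Returns undef if the string is not in that (assumed) format.
sub clf_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hour, $min, $sec, $sign, $oh, $om) =
        $stamp =~ m{^(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([-+])(\d\d)(\d\d)$}
        or return;
    return unless exists $MON{$mon};

    # timegm treats the fields as UTC; the zone offset is removed afterwards
    my $epoch  = timegm($sec, $min, $hour, $day, $MON{$mon}, $year);
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return $epoch - $offset;
}

# Hypothetical window: "1 hour 30 minutes ago", relative to now
my ($hours, $minutes) = (1, 30);
my $cutoff = time - ($hours * 3600 + $minutes * 60);

my $t = clf_to_epoch('15/Dec/2002:14:52:13 -0500');
print "epoch: $t\n";
print $t > $cutoff ? "in range\n" : "too old\n";
```

Because both sides are plain integers, the range check is a numeric comparison, so any remaining misbehaviour would point away from the date library.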


Re: Re: Re: Re: Pulling by regex II
by mkent (Acolyte) on Dec 15, 2002 at 20:54 UTC
    Thank you. It turns out that on my system the taint switch is a capital T. I also added the extra newlines after the header, as DapperDan suggested. Here's the resulting code plus some recent data, but all I get as output is "-: 1", and I'm not sure why.

    #!/usr/local/bin/perl -T
    
    use strict;
    use warnings;
    use Date::Manip;
    use CGI qw/:standard/;
    
    # Make sure neither we, nor any of our submodules, compromise security
    # by calling unpathed programs.
    $ENV{PATH} = "/bin:/usr/bin";
    $ENV{IFS}  = "";
    
    # Use CGI to print our headers
    print header, "\n\n";
    
    my %referers = ();
    
    # Retrieve and security-check parameters
    my $hour   = param('hour');
    my $minute = param('minute');
    
    if (!defined $hour   || $hour   !~ /^\d\d?$/) { die('Invalid hour'); }
    if (!defined $minute || $minute !~ /^\d\d?$/) { die('Invalid minute'); }
    
    # Get date object for our check point
    my $check_date = ParseDate("$hour hours $minute minutes ago");
    
    # File handling, one line at a time
    open(my $fh, '<', 'datafile.html') or die "Could not open log file: $!";
    while (my $line = <$fh>) {
    
        # Capture the timestamp (minus the zone offset) and the referer.
        # The byte count is "-" on 304 responses, so accept that as well.
        my ($stamp, $referer) =
            $line =~ /^\S+ \S+ \S+ \[(\S+) \S+\] "[^"]+" \d+ (?:\d+|-) "([^"]+)"/
            or next;
    
        my $line_date = ParseDate($stamp);
    
        # Check to see if the line date is in the range we're after
        next unless Date_Cmp($line_date, $check_date) > 0;
    
        # A missing hash entry counts as 0 when incremented, so new
        # referers need no special case
        $referers{$referer}++;
    }
    close($fh);
    
    my $row = 0;
    
    # Sort our referers by the number of hits
    for (sort {$referers{$b} <=> $referers{$a}} keys %referers) {
        # break out after the tenth one
        last if $row++ == 10;
        print "$_: $referers{$_}\n";
    }
    

    Recent data:

    68.22.179.211 - - [15/Dec/2002:14:52:13 -0500] "GET /images/69.gif HTTP/1.1" 200 1348 "http://www.indystar.com/print/articles/6/008596-6466-040.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/header_aod2_01.gif HTTP/1.0" 200 2011 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/header_aod2_15.gif HTTP/1.0" 200 4162 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/header_aod2_10.gif HTTP/1.0" 200 3034 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/go_blue.gif HTTP/1.0" 200 133 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/aod_searchend2.gif HTTP/1.0" 200 186 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    24.79.125.220 - - [15/Dec/2002:14:52:13 -0500] "GET /images/coheader2_aod_08.gif HTTP/1.1" 304 - "http://www.indystar.com/forums/showthread.php?s=&postid=177044" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
    24.79.125.220 - - [15/Dec/2002:14:52:13 -0500] "GET /images/coheader2_aod_10.gif HTTP/1.1" 304 - "http://www.indystar.com/forums/showthread.php?s=&postid=177044" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
    141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/email.gif HTTP/1.0" 200 138 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
    66.149.178.96 - - [15/Dec/2002:14:52:14 -0500] "GET /forums/showthread.php?s=&postid=177042 HTTP/1.1" 200 7302 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003"
    24.79.125.220 - - [15/Dec/2002:14:52:14 -0500] "GET /images/coheader2_aod_11.gif HTTP/1.1" 200 954 "http://www.indystar.com/forums/showthread.php?s=&postid=177044" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
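To check a log-parsing pattern against this sample data on its own, the minimal sketch below leaves Date::Manip out and simply tallies referers from six of the lines above, then prints them most frequent first. It assumes the Apache combined log format with the timestamp in square brackets, and it also accepts "-" in the byte-count field, which the 304 responses use; parse_line is a hypothetical helper name.

```perl
use strict;
use warnings;

# Six lines copied from the sample data (combined log format,
# timestamps in square brackets)
my @log = split /\n/, <<'END';
68.22.179.211 - - [15/Dec/2002:14:52:13 -0500] "GET /images/69.gif HTTP/1.1" 200 1348 "http://www.indystar.com/print/articles/6/008596-6466-040.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/header_aod2_01.gif HTTP/1.0" 200 2011 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
141.154.123.193 - - [15/Dec/2002:14:52:13 -0500] "GET /images/header_aod2_15.gif HTTP/1.0" 200 4162 "http://www.indystar.com/print/articles/2/008227-9652-031.html" "Mozilla/4.79 [en] (Windows NT 5.0; U)"
24.79.125.220 - - [15/Dec/2002:14:52:13 -0500] "GET /images/coheader2_aod_08.gif HTTP/1.1" 304 - "http://www.indystar.com/forums/showthread.php?s=&postid=177044" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
66.149.178.96 - - [15/Dec/2002:14:52:14 -0500] "GET /forums/showthread.php?s=&postid=177042 HTTP/1.1" 200 7302 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003"
24.79.125.220 - - [15/Dec/2002:14:52:14 -0500] "GET /images/coheader2_aod_11.gif HTTP/1.1" 200 954 "http://www.indystar.com/forums/showthread.php?s=&postid=177044" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
END

# Hypothetical helper: returns (timestamp, referer) for a combined-log
# line, or an empty list if the line does not match. The byte count may
# be "-" (as on 304 responses), so that is accepted too.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\S+ \S+ \S+ \[(\S+) \S+\] "[^"]+" \d+ (?:\d+|-) "([^"]+)"/;
}

my %referers;
for my $line (@log) {
    my ($stamp, $referer) = parse_line($line) or next;
    $referers{$referer}++;    # a missing entry starts at 0
}

# Most hits first (ties broken alphabetically), at most ten entries
my @sorted = sort { $referers{$b} <=> $referers{$a} || $a cmp $b } keys %referers;
splice(@sorted, 10) if @sorted > 10;
print "$_: $referers{$_}\n" for @sorted;
```

With these six lines, the two referers with two hits each come first, followed by "-" and the remaining article page with one hit each.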

      I have no idea what the problem is; it looks fine as far as I can see. Perhaps try putting some debug print statements in the while() loop to make sure everything is being set where it should be. It sounds like the regex is going wonky somewhere, but I don't see how with that test data.
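A minimal sketch of that kind of debugging, under a few assumptions: it reads from an in-memory filehandle instead of the real log (the three input lines are made up for illustration), it uses a combined-log pattern that you may need to adjust to your own format, and it reports each line to STDERR with warn, using $. for the current line number.

```perl
use strict;
use warnings;

# Made-up stand-in for the log file: two well-formed lines, one bad one
my $data = join '', map { "$_\n" }
    '1.2.3.4 - - [15/Dec/2002:14:52:13 -0500] "GET / HTTP/1.1" 200 10 "http://a.example/" "UA"',
    'this line is not in combined log format',
    '1.2.3.4 - - [15/Dec/2002:14:52:14 -0500] "GET / HTTP/1.1" 304 - "-" "UA"';

# Reading from a scalar reference behaves like reading a real file
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my ($seen, $matched) = (0, 0);
while (my $line = <$fh>) {
    $seen++;
    my ($stamp, $referer) =
        $line =~ /^\S+ \S+ \S+ \[(\S+) \S+\] "[^"]+" \d+ (?:\d+|-) "([^"]+)"/;

    unless (defined $referer) {
        # $. holds the current input line number; $line keeps its newline
        warn "DEBUG line $.: no match: $line";
        next;
    }
    $matched++;
    warn "DEBUG line $.: stamp=[$stamp] referer=[$referer]\n";
}
close($fh);

# Only one line seen for a multi-line log hints at a line-ending problem
warn "DEBUG read $seen line(s), $matched matched\n";
```

The final summary line is often the most telling: if it reports far fewer lines than the file visibly contains, the problem is in how the file is being read rather than in the regex.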
        It turned out that I was saving the data with a ".html" extension, and apparently that caused the script to read multiple lines as a single line, hence the problem. Dropping the ".html" extension cleared it up. Your code is working great, as shown in my latest "Count sort and output II" posting. Thanks!!

        Sorry it took me a while to reply; I'm still feeling my way around this site!
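The thread doesn't say what saving the file as ".html" actually changed. One plausible explanation, and it is only an assumption here, is that the saved copy used different line terminators (for example bare carriage returns), so Perl's default input record separator, "\n", saw the whole file as one line. If that is the cause, a sketch like this one, which splits on CRLF, CR or LF, would read the file correctly whatever it is named; split_lines and read_lines are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw contents into lines whatever the
# terminator is: CRLF (DOS), bare CR (classic Mac) or LF (Unix)
sub split_lines {
    my ($raw) = @_;
    return split /\r\n|\r|\n/, $raw;
}

# Hypothetical input: three records separated only by carriage returns
my $raw = "first\rsecond\rthird\r";

# Splitting on "\n" alone (like the default $/) yields one long "line"
my @naive = split /\n/, $raw;
my @lines = split_lines($raw);

printf "naive: %d line(s), normalized: %d line(s)\n", scalar @naive, scalar @lines;

# Hypothetical helper for a real file: slurp it unmodified, then split
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                 # slurp mode: read the whole file at once
    my $contents = <$fh>;
    close($fh);
    return split_lines(defined $contents ? $contents : '');
}
```

Reading in :raw mode keeps any carriage returns intact so that split_lines, rather than the I/O layer, decides where each line ends.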
