
Re: Re: Re: Pulling by regex II

by PhiRatE (Monk)
on Dec 14, 2002 at 23:40 UTC (#219914)

in reply to Re: Re: Pulling by regex II
in thread Pulling by regex II

As a hunch, you may need to try switching off taint (the -t switch). Date::Manip may not be handling taint correctly; other than that, I can think of no reason.

Re: Re: Re: Re: Pulling by regex II
by mkent (Acolyte) on Dec 15, 2002 at 20:54 UTC
    Thank you. Turns out on my system, taint is a capital T. I also added the extra lines after header as suggested by DapperDan. Here's the resulting code plus some recent data, but all I get as output is "-: 1". Not sure why.

    #!/usr/local/bin/perl -slwT
    use strict;
    use warnings;
    use Date::Manip;
    use CGI qw/:standard/;
    # Make sure neither we, nor any of our submodules compromise security
    # by calling unpathed programs.
    $ENV{PATH} = "/bin:/usr/bin";
    # Use CGI to print our headers
    print header, "\n\n";
    my %referers = ();
    # Retrieve and security-check parameters
    my $hour = param('hour');
    my $minute = param('minute');
    if ($hour !~ /^\d\d?$/) { die('Invalid hour'); }
    if ($minute !~ /^\d\d?$/) { die('Invalid minute'); }
    # Get date object for our check point
    my $check_date = ParseDate("${hour}hours ${minute}minutes ago");
    # File handling, one line at a time
    open(FH,"datafile.html") || die('Could not open log file');
    while (my $line = <FH>) {
        next if ($line !~ /^\S+ \S \S \[(\S+) \S+\] "[^"]+" \d+ \d+ "([^"]+)"/);
        my $line_date = ParseDate($1);
        # Check to see if the line date is in the range we're after
        next unless Date_Cmp($line_date, $check_date)>0;
        # If the referer is new, we set to 1 entry, otherwise increment (incrementing undef doesn't work well)
        if (!$referers{$2}) {
            $referers{$2} = 1;
        } else {
            $referers{$2}++;
        }
    }
    close(FH);
    my $row = 0;
    # Sort our referers by the number of hits
    for (sort {$referers{$b} <=> $referers{$a}} keys %referers) {
        # break out after the tenth one
        last if $row++==10;
        print "$_: ".$referers{$_}."\n";
    }

    Recent data:

    - - 15/Dec/2002:14:52:13 -0500 "GET /images/69.gif HTTP/1.1" 200 1348 "" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/header_aod2_01.gif HTTP/1.0" 200 2011 " ml" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/header_aod2_15.gif HTTP/1.0" 200 4162 " ml" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/header_aod2_10.gif HTTP/1.0" 200 3034 " ml" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/go_blue.gif HTTP/1.0" 200 133 "" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/aod_searchend2.gif HTTP/1.0" 200 186 " l" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/coheader2_aod_08.gif HTTP/1.1" 304 - " " "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/coheader2_aod_10.gif HTTP/1.1" 304 - " " "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"
    - - 15/Dec/2002:14:52:13 -0500 "GET /images/email.gif HTTP/1.0" 200 138 "" "Mozilla/4.79 en (Windows NT 5.0; U)"
    - - 15/Dec/2002:14:52:14 -0500 "GET /forums/showthread.php?s=&postid=177042 HTTP/1.1" 200 7302 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003"
    - - 15/Dec/2002:14:52:14 -0500 "GET /images/coheader2_aod_11.gif HTTP/1.1" 200 954 " 44" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Q312461)"

      No ideas on the problem; it looks fine as far as I can see. Perhaps try putting some debug print statements in the while() loop to make sure everything is getting set where it should. It sounds like the regex is going wonky somewhere, but I don't see how with that test data.
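
One way to follow the debug-print suggestion is sketched below. It is only an illustration: `parse_line` is a hypothetical helper, the sample lines are made up, and the pattern assumes the bracketed timestamp of a standard Apache combined log, which may not match the real file exactly. Lines that fail the match are reported on STDERR, so they stay out of the CGI output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed pattern: $1 is the timestamp, $2 is the referer.
my $log_re = qr/^\S+ \S \S \[(\S+) \S+\] "[^"]+" \d+ \d+ "([^"]+)"/;

# Hypothetical helper: returns (date, referer) on a match, or an empty
# list after warning about the line that failed.
sub parse_line {
    my ($line, $lineno) = @_;
    if ($line =~ $log_re) {
        return ($1, $2);
    }
    chomp $line;
    warn "line $lineno did not match: [$line]\n";
    return;
}

# Made-up sample input, not taken from the real log.
my @sample = (
    qq{host - - [15/Dec/2002:14:52:14 -0500] "GET /a HTTP/1.1" 200 7302 "-" "UA"\n},
    qq{not a log line\n},
);

my $n = 0;
for my $line (@sample) {
    # Skip lines that did not match (parse_line already warned about them).
    my ($date, $ref) = parse_line($line, ++$n) or next;
    print STDERR "line $n: date=[$date] referer=[$ref]\n";
}
```

If every line of the real file lands in the "did not match" branch, the pattern or the file is the place to look; if only one very long line shows up, the input is probably not being split where expected.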
        It turned out that I was saving the data as ".html", and apparently that caused the script to read multiple lines as a single line, hence the problem. Dropping the ".html" extension cleared it up. Your code is working great, as in my latest "Count sort and output II" posting. Thanks!!

        Sorry it took me a little while to reply, still feeling my way around this site!
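
A quick check for the symptom described in this reply, several log entries arriving as one "line", might look like the sketch below. `timestamps_in` is a hypothetical helper and the sample data is made up; it assumes Apache-style timestamps of the form dd/Mon/yyyy:hh:mm:ss.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts Apache-style timestamps in one string.
sub timestamps_in {
    my ($line) = @_;
    # Assigning the match list to () in scalar context yields the match count.
    my $count = () = $line =~ m!\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}!g;
    return $count;
}

# Made-up data: two entries run together on a single line.
my $data = qq{- - 15/Dec/2002:14:52:13 -0500 "GET /a" - - 15/Dec/2002:14:52:14 -0500 "GET /b"\n};

# Read from an in-memory filehandle so the sketch needs no real file.
open(my $fh, '<', \$data) or die "Could not open in-memory file: $!";
while (my $line = <$fh>) {
    my $n = timestamps_in($line);
    warn "line " . $. . " holds $n entries; check how the file was saved\n" if $n > 1;
}
close($fh);
```

A normal log line should contain exactly one timestamp, so any count above one suggests the reader is not seeing the line breaks it expects.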
