PerlMonks  

RE: Benevolent Ad Filter

by merlyn (Sage)
on Jun 08, 2000 at 01:03 UTC


in reply to Benevolent Ad Filter

There's a mod_perl version of this in the ModPerl book.

-- Randal L. Schwartz, Perl hacker

Replies are listed 'Best First'.
RE: RE: Benevolent Ad Filter
by httptech (Chaplain) on Jun 08, 2000 at 02:18 UTC
    Interesting. But it looks like there are some important differences:

    The mod_perl version proxies everything, not just ad servers. However, it only blocks images; sometimes ads come in the form of JavaScript or even Java. These are usually sent from the same server for tracking purposes, so my script blocks all ad content from a given server. (You could probably alter the mod_perl version to do this, though.)

    The mod_perl version actually retrieves the entire file it blocks, which I think is a waste of bandwidth, but you're forced into that if you use LWP (as far as I know). That's why I use the Socket module, and close the connection as soon as I have the headers. The trade-off for this is my version will not work through another proxy server.
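
    As a rough sketch of the approach described above (a plain GET whose connection is dropped once the headers are in), something like the following might work. It is not the original script: it assumes IO::Socket::INET rather than the lower-level Socket module, the names read_headers and fetch_headers are made up for illustration, and folded (multi-line) headers are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Read the status line and headers from an open handle, stopping at the
# blank line that ends the header block. The body is never read.
sub read_headers {
    my ($fh) = @_;
    my $status = <$fh>;
    return unless defined $status;
    $status =~ s/\r?\n\z//;

    my %headers;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                      # blank line: end of headers
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value if defined $value;
    }
    return ($status, \%headers);
}

# Send an ordinary GET (so the server logs a GET, not a HEAD), read only
# the headers, then close the socket without reading the body.
# Hypothetical helper; direct connection only, no proxy support.
sub fetch_headers {
    my ($host, $path, $port) = @_;
    $port ||= 80;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or return;

    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my ($status, $headers) = read_headers($sock);
    close $sock;                                  # hang up; body is discarded
    return ($status, $headers);
}
```

    A call such as fetch_headers('www.foo.com', '/') would return the status line and a hash reference of lower-cased header names. Because the request goes straight to the origin host, this sketch shares the trade-off mentioned above: it will not work through another proxy server.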

      and close the connection as soon as I have the headers

      #!/usr/bin/perl
      use LWP::Simple;
      if (head('http://www.foo.com/')) {
          print "Page exists and would download fine!\n";
      }
      In list context, head returns several useful values: content type, document length, last-modified time, expiry time, and server.
      The reason for not using HEAD is simple: it shows up in the logs as a HEAD request and not a GET. If I were an advertiser, I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.
        If I were an advertiser, I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.

        Good point, I stand corrected!

      You can, in fact, use LWP to load just the first part of the GET request, by using a content-callback handler that throws an exception, cutting off any further action. Quoting from perldoc LWP::UserAgent:
      The request can be aborted by calling die() in the callback routine. The die message will be available as the "X-Died" special response header field.

      -- Randal L. Schwartz, Perl hacker
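
      A minimal sketch of that technique, assuming the three-argument form of LWP::UserAgent's request method (request object, content callback, read-size hint). The helper name get_headers_only and the die message are invented for illustration, and the sketch does not settle how redirects interact with the callback.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Issue a normal GET, but abort as soon as the first chunk of content
# is handed to the callback. Headers are already parsed by then.
sub get_headers_only {
    my ($ua, $url) = @_;
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $ua->request(
        $request,
        sub { die "headers only\n" },   # content callback: bail out at once
        1024,                           # read-size hint for each chunk
    );
    return $response;
}

# Example use (www.foo.com is the placeholder host from this thread):
#   my $res = get_headers_only(LWP::UserAgent->new, 'http://www.foo.com/');
#   print $res->status_line, "\n";
#   print "aborted: ", $res->header('X-Died'), "\n"
#       if defined $res->header('X-Died');
```

      The status line and headers remain available on the returned response object, and the "X-Died" header carries the die message when the callback cut the transfer short.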

        That would be great. I really would like to use LWP if possible. I liked the fact that it would handle redirects for me. However, I don't know whether the built-in redirect handling would still work if I call die() during the callback.
