PerlMonks  

RE: Benevolent Ad Filter

by merlyn (Sage)
on Jun 08, 2000 at 05:03 UTC


in reply to Benevolent Ad Filter

There's a mod_perl version of this in the ModPerl book.

-- Randal L. Schwartz, Perl hacker

RE: RE: Benevolent Ad Filter
by httptech (Chaplain) on Jun 08, 2000 at 06:18 UTC
    Interesting. But it looks like there are some important differences:

    The mod_perl version proxies everything, not just ad servers. However, it only blocks images; sometimes ads come in the form of JavaScript or even Java. These are usually sent from the same server for tracking purposes, so my script blocks all ad content from a given server. (You could probably alter the mod_perl version to do this, though.)

    The mod_perl version actually retrieves the entire file it blocks, which I think is a waste of bandwidth, but you're forced into that if you use LWP (as far as I know). That's why I use the Socket module, and close the connection as soon as I have the headers. The trade-off for this is my version will not work through another proxy server.
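
A minimal sketch of the headers-only idea described above, not httptech's actual script. It assumes IO::Socket::INET in place of the raw Socket module, a plain HTTP/1.0 GET on port 80, and no intermediate proxy; the sub names read_headers and fetch_headers are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Read an HTTP response from any filehandle, stopping at the blank line
# that ends the header block. The body is never read.
# (read_headers is a hypothetical helper name.)
sub read_headers {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;      # strip CRLF or bare LF
        last if $line eq '';       # blank line: end of headers
        push @lines, $line;
    }
    return @lines;
}

# Send a real GET (so the server logs a GET, not a HEAD), read only the
# headers, then hang up. (fetch_headers is a hypothetical helper name.)
sub fetch_headers {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "connect to $host failed: $!\n";
    print $sock "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    my @headers = read_headers($sock);
    close $sock;                   # drop the connection without reading the body
    return @headers;
}

# Only touch the network when run as a script: perl script.pl HOST PATH
if (!caller && @ARGV == 2) {
    print "$_\n" for fetch_headers(@ARGV);
}
```

Closing the socket early does not guarantee that no body bytes were sent; it only means the client stops reading them. As noted above, talking to the origin server directly like this will not work through another proxy server.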

      You can, in fact, use LWP to load just the first part of the response to a GET request, by using a content-callback handler that throws an exception, cutting off any further transfer. Quoting from perldoc LWP::UserAgent:
      The request can be aborted by calling die() in the callback routine. The die message will be available as the "X-Died" special response header field.
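
A rough sketch of that technique, assuming the ':content_cb' option of LWP::UserAgent's get method; the sub name get_headers_only and the die message are hypothetical. The callback only fires once the first chunk of body data has arrived, so a little of the body may still be transferred, and an empty body never triggers it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Issue a normal GET, but abort from the content callback as soon as the
# first chunk of body data is handed to us.
# (get_headers_only is a hypothetical helper name.)
sub get_headers_only {
    my ($ua, $url) = @_;
    return $ua->get(
        $url,
        ':content_cb' => sub { die "headers only\n" },  # abort on first chunk
    );
}

# Only touch the network when run as a script: perl script.pl URL
if (!caller && @ARGV) {
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = get_headers_only($ua, $ARGV[0]);
    print $res->status_line, "\n";
    print "Content-Type: ", ($res->header('Content-Type') // 'unknown'), "\n";
    print "X-Died: ", ($res->header('X-Died') // '(callback never ran)'), "\n";
}
```

The response object still carries the status line and headers, so the caller can inspect them after the abort.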

      -- Randal L. Schwartz, Perl hacker

        That would be great. I really would like to use LWP if possible. I liked the fact that it would handle redirects for me. However, I don't know if the built-in redirect handling would still work if I call die() during the callback.
      and close the connection as soon as I have the headers

      #!/usr/bin/perl
      use LWP::Simple;

      if (head('http://www.foo.com/')) {
          print "Page exists and would download fine!\n";
      }

      In list context, head returns several interesting values: content type, document length, last-modified time, expiry time, and server.
      The reason for not using HEAD is simple: it shows up in the logs as a HEAD request and not a GET. If I were an advertiser, I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.
        If I were an advertiser, I would not count HEAD requests as legitimate page views, since it's clear that the ad was never actually viewed.

        Good point, I stand corrected!
