
anti leech CGI

by hiseldl (Priest)
on Oct 30, 2002 at 04:41 UTC ( #208968=snippet )

I needed a quick script to provide 'Anti-Leeching' for some of my files. (Leeching is a method of stealing someone else's bandwidth by linking to their files instead of downloading the files to your own site.)

After several iterations, I finally have something that works pretty well for my .zip files.

Just put the hostname and the IP in the @HOSTS array for every host that will use this script. As merlyn mentioned in one of his articles, everyone has a drawer where they keep miscellaneous items; this snippet belongs in your virtual drawer for that time when you need a quick anti-leech sub.

What time is it? It's Camel Time!

Update: changed grep pattern because of merlyn's comment about reverse DNS. Thanks merlyn!

#!/usr/bin/perl -w

use strict;
use CGI qw/:standard/;

my $DIR  = param('d') || './';   # default dir
my $FILE = param('f') || '';     # default file

my @HOSTS = (
    # hostname and IP of every host allowed to use this script,
    # e.g. 'www.example.com', '192.0.2.1',
);

print header('text/plain'), "No access to $FILE" 
  unless DownloadFile($DIR, $FILE, \@HOSTS);

exit 0;

sub DownloadFile {
    my ($dir, $filename, $hosts) = @_;

    my $remote = remote_host();

    #### the following is bad because it is not an exact
    #### match nor is it anchored
    #### return(0) unless grep /$remote/, @$hosts;

    # the following suggested by [merlyn]
    return 0 unless grep $remote eq $_, @$hosts;

    my $filesize = -s "$dir/$filename";

    # print full header
    print "Content-disposition: filename=$filename\n";
    print "Content-Length: $filesize\n";
    print "Content-Type: application/octet-stream\n\n";

    # open in binmode
    open(READ, "$dir/$filename") || die "Cannot open $dir/$filename: $!";
    binmode READ;

    # stream it out
    binmode STDOUT;
    while (<READ>) { print }
    close READ;

    # should always return true
    return 1;
}
Replies are listed 'Best First'.
•Re: anti leech CGI
by merlyn (Sage) on Oct 30, 2002 at 07:06 UTC
    my $remote = remote_host();
    return(0) unless grep /$remote/, @$hosts;
    So, I just set my reverse DNS to return "." for my host, and I'm in. Some security.

    Better yet, I set it to return "((.)*)*FOO", and it goes into a deep deep loop, burning CPU for some long-ish time.

    What problem exactly were you trying to solve again? If you want host-based authentication, use it. Don't try to reinvent it. Use your htaccess file to control the access to your URL.
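
    The danger of the unanchored match is easy to demonstrate in a few lines (hypothetical host list; a reverse-DNS name of "." is enough to get past the regex version):

    ```perl
    use strict;
    use warnings;

    my @hosts  = ('www.example.com', '192.0.2.1');  # hypothetical allow-list
    my $remote = '.';                               # attacker-controlled reverse DNS

    # unanchored regex match: '.' matches a character of every host name,
    # so this "check" passes for all entries (grep returns the match count)
    my $leak_ok  = grep /$remote/, @hosts;

    # exact string comparison: only a literal '.' entry would match
    my $exact_ok = grep $remote eq $_, @hosts;

    print "regex matches: $leak_ok, exact matches: $exact_ok\n";
    ```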

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      While true that reinventing the wheel is a bad idea, wouldn't this work ok if it were just IP based?:
      my $remote = $ENV{REMOTE_ADDR};
      return(0) unless grep /$remote/, @$hosts;
      BTW, wouldn't taint have caught this? He's trusting user supplied data (DNS name) in an unsafe way.

      Update: See merlyn's reply on how to do this right.

      -- Dan

        my $remote = $ENV{REMOTE_ADDR};
        return(0) unless grep /$remote/, @$hosts;
        No, because the point is that you're using a regex where you want an exact match, and it's not anchored either!

        This is better:

        my $remote = $ENV{REMOTE_ADDR};
        return 0 unless grep $remote eq $_, @$hosts;
        wouldn't taint have caught this? He's trusting user supplied data (DNS name) in an unsafe way.
        No, because simply doing a regex match isn't considered "external" enough for tainted data to abort it.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

      Nice! I would like to see what your comments are about 90% of the other 'anti-leech' scripts out there. None of the scripts that I have seen have any taint checking nor 'use strict.'

        What problem exactly were you trying to solve again?

      The problem I want to solve is to stop other sites from using my bandwidth; they can link to my images or zip files from their site, using my bandwidth to serve their downloads. This is not an 'authorization' issue. I also cannot use an Apache module, since my ISP is rather strict about who gets root access.

      These two lines form the 'gatekeeper' aspect of the sub:

      my $remote = remote_host();
      return(0) unless grep /$remote/, @$hosts;
      ...reverse DNS lookup, I hadn't thought of that. I think what you're getting at is that I have to 'untaint' $remote.
      my $remote = remote_host();

      # a domain or an IP can be letters or numbers separated
      # with '.' and there must be at least one char followed
      # by a '.' with at least one char following
      $remote =~ /([A-Za-z0-9\.]*[A-Za-z0-9]+\.[A-Za-z0-9]+)/;

      return (0) unless length($remote) > 3;
      return (0) unless grep /$remote/, @$hosts;
      ...I haven't tested this yet, but am I on the right track?

      What time is it? It's Camel Time!

        A few points:
        • You didn't say you were in an environment where you don't really have a webservice. I'd consider any provider that doesn't let you control access lists to be a very crippled one. Find another. There are hundreds, at all price ranges.
        • I haven't tested this yet, but am I on the right track?
          I've already answered a better solution for the remote match in another node in this thread. The real point is that you don't need a regex, so stop using it!
        • $remote =~ /([A-Za-z0-9\.]*[A-Za-z0-9]+\.[A-Za-z0-9]+)/;
          This is not a good regex to match a hostname. You left out the hyphens, for example. At least you didn't use \w like so many do, incorrectly adding underscore to the mix. You get points for that. {grin}
        • You really don't want to send the file yourself. What you need is to just do an internal redirect to a URL that you keep secret, as in:
          my @GOODLIST = qw(
              ...
          );
          use CGI qw(:all);
          use strict;

          for my $remote_addr (remote_addr()) {
              if (grep $remote_addr, @GOODLIST) {
                  print redirect("/secret/URL/here/foo.thingy");
              } else {
                  print header( status => 404 ),
                        start_html('error'),
                        "The resource you tried to access is not found",
                        end_html;
              }
          }
          There. That's your whole program. Short and sweet.
        • Of course, you can bypass all this nonsense, and simply give out the secret URL to your friends, and change it from time to time. The URL acts as a password. Be sure you don't link to it anywhere, and the directory that it's in must have indexing turned off, or an index.html to keep people from guessing. I've got a directory like that at that I use for semi-private publishing, such as when I'm publishing one of my columns for review here.
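
        On the hostname-regex point above, a sketch of a validator that also allows hyphens (legal inside a label, but not at its start or end) might look like the following; the names are hypothetical, and for this script an exact eq comparison against the allow-list is still the better fix:

        ```perl
        use strict;
        use warnings;

        # hypothetical validator: alphanumeric labels with interior hyphens,
        # at least two dot-separated labels, anchored at both ends
        sub looks_like_hostname {
            my ($h) = @_;
            return $h =~ /^(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+
                           [A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?$/x ? 1 : 0;
        }

        print looks_like_hostname('www.perl-monks.org'), "\n";  # 1
        print looks_like_hostname('-bad.example.com'),   "\n";  # 0
        ```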

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.

Re: (Funny) anti leech CGI
by shotgunefx (Parson) on Oct 30, 2002 at 10:10 UTC
    I remember some time back that a site was linking to a Yahoo! Store's images of Rolexes. The site owner contacted Y! to complain. (Was showing up in the referers)

    The images are named dynamically after each "publish" so an engineer manually installed images of Mickey Mouse watches and obscene things. The leech didn't notice for a couple of days. :)


    "To be civilized is to deny one's nature."
Re: anti leech CGI
by erasei (Pilgrim) on Oct 30, 2002 at 14:21 UTC
    If you are just trying to keep people from linking directly to your images, then there is a great module for Apache called AccessRefer. This uses the referrer to validate where the image is being loaded from. This has the added bonus of being able to be configured for your virtual hosts, thus giving each of your sites their own "access list".

    As with anything coming from the user, the Referer header can be forged, but for the most part this is a pretty good way of doing it.
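
    For those without access to Apache modules, the same referer idea can be sketched in the CGI itself (the domain pattern here is a hypothetical example, and the forgeability caveat above still applies):

    ```perl
    use strict;
    use warnings;

    # hypothetical allow pattern: only serve when the Referer points at our
    # own site; this stops casual hotlinking but not a forged header
    sub referer_allowed {
        my ($referer) = @_;
        return ($referer || '') =~ m{^https?://(?:www\.)?example\.com/}i ? 1 : 0;
    }

    print referer_allowed('http://www.example.com/page.html'), "\n";  # 1
    print referer_allowed('http://leech.example.net/'),        "\n";  # 0
    ```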

      I would love to use an Apache module, but my ISP won't let me have root access. :(

      What time is it? It's Camel Time!

Re: anti leech CGI
by hacker (Priest) on Oct 30, 2002 at 14:54 UTC
    What color would you like that wheel you're about to reinvent? chrome? wood? rubber?

    There's no need to push this down into the Perl side of things when Apache has facilities built in that can handle this at the request level. Look into mod_rewrite, and more specifically the URL Rewriting Guide's "Blocked Inline-Images" section, about 80% of the way down. I use this method on several of my sites, and it works wonderfully (it's not just for images).

    To summarize:

    <Directory /path/to/>
        AllowOverride None
        RewriteEngine On
        RewriteCond %{HTTP_REFERER} !^$
        RewriteCond %{HTTP_REFERER} !^*$ [NC]
        RewriteRule .*\.(png|gz|zip|)$ [R]
    </Directory>

      What if your wheel doesn't fit? See my reply to erasei.

      Thanks for the summary of the Apache config though.

      What time is it? It's Camel Time!
