Samn has asked for the wisdom of the Perl Monks concerning the following question:

I need to develop a "feeds" program for a porn website.

I host a number of video files on my servers. I allow other webmasters to link to my videos from their websites, and I charge them by total bandwidth consumed.

Say you run a website called "XXX Free Perl" and have signed up on my site to link to videos on my server, including one file called "two_monks_one_girl_(HOT!).mpg". At the end of the month, I need to know how many of your viewers watched that video and how much bandwidth they used. I also need to know how much bandwidth any other webmaster consumed by linking to the same video.

And finally, I need to know how much total bandwidth was used transferring each specific video of mine, so I can kindly pay the video's producer his royalties. This part is probably easier.

Before it's pointed out to me, I'll say that the other half of this program would be properly controlling who has access to which files and when. I know.

The problem is that I just have no idea where to start. We're running UNIX boxen, so I could use PHP for this if that would work better. The server is probably Apache, but this product is critical and we would switch servers if necessary.

Do any Perl modules accomplish anything similar to this? Can Apache gather this info for me? Thanks ahead for any direction you can offer.

Re: Tracking bandwidth
by cees (Curate) on Feb 05, 2004 at 06:04 UTC

    This is probably best done in Apache itself, using mod_log_config together with mod_logio, which is available in Apache 2.0.x.

    You haven't mentioned how you will identify your clients, but it will probably be with the Referer header (easily spoofed by end users), a cookie, or a query parameter. Whichever you choose, you can use the above modules to write a custom log that tracks the actual bandwidth used.
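    As a minimal sketch of the query-parameter approach, assuming each partner's links carry a hypothetical `wm` parameter (e.g. `/videos/clip.mpg?wm=partner42`), a log parser could take the partner ID from the query string and fall back to the Referer host only when the parameter is missing. `client_id` and `wm` are illustrative names, not part of any Apache module. Both values are supplied by the browser, so neither is tamper-proof.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def client_id(query: str, referer: str) -> Optional[str]:
    """Pick the partner (webmaster) ID for one logged request.

    Assumes partners' links carry a hypothetical 'wm' query parameter;
    otherwise falls back to the host name in the Referer header.
    Both values come from the browser, so neither is tamper-proof.
    """
    # Apache's %q logs "?a=b&c=d" (or nothing), so drop the leading "?".
    params = parse_qs(query.lstrip("?"))
    if params.get("wm"):
        return params["wm"][0]
    # Apache logs a missing header as "-".
    if referer and referer != "-":
        return urlsplit(referer).hostname  # None if the value isn't a URL
    return None


print(client_id("?wm=partner42", "-"))                     # partner42
print(client_id("", "http://www.example.com/index.html"))  # www.example.com
print(client_id("", "-"))                                  # None
```

    A query parameter survives browsers and proxies that strip the Referer header, which is one reason to prefer it as the primary key.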

    Something like the following:

    # %I and %O are provided by mod_logio, which must be loaded
    LogFormat "%V %a %s %I %O %{Referer}i %q" trackbandwidth
    CustomLog /var/log/bandwidth trackbandwidth

    You will have to choose what to save in the log file, whether that is the cookie header or the query string (this depends on how you plan to identify your clients). The important entries above are %I and %O, which contain the actual number of bytes received and sent for the request (headers included, as well as the content). I am pretty sure this also accounts for a user stopping the download partway through, but I suggest you test that before taking my word for it.

    Then just write a Perl script to parse this log file and generate a full report of the bandwidth used by each client.
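    Such a parser is short in any scripting language. Here is a minimal sketch (in Python) for a log written with the LogFormat shown above. It assumes clients are keyed by the Referer host and that the Referer value contains no spaces; swap in a cookie or query-string lookup if that is how you identify partners. The function name `bandwidth_by_client` is hypothetical.

```python
import sys
from collections import defaultdict
from urllib.parse import urlsplit


def bandwidth_by_client(lines):
    """Total bytes per client from a log written with
    LogFormat "%V %a %s %I %O %{Referer}i %q".

    Clients are keyed by Referer host here (an assumption).
    Returns {client: [bytes_in, bytes_out]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        # %q is empty when there is no query string, so expect 6 or 7 fields.
        if len(fields) not in (6, 7):
            continue  # skip malformed lines
        _vhost, _ip, _status, bytes_in, bytes_out, referer = fields[:6]
        if not (bytes_in.isdigit() and bytes_out.isdigit()):
            continue
        host = urlsplit(referer).hostname if referer != "-" else None
        client = host or "(unknown)"
        totals[client][0] += int(bytes_in)   # %I: bytes received, headers included
        totals[client][1] += int(bytes_out)  # %O: bytes sent, headers included
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        report = bandwidth_by_client(log)
    # Biggest consumers of outgoing bandwidth first.
    for client, (b_in, b_out) in sorted(report.items(), key=lambda kv: -kv[1][1]):
        print(f"{client:40} {b_in:>12} bytes in {b_out:>12} bytes out")
```

    Run it with the log path as its only argument, e.g. `python bandwidth_report.py /var/log/bandwidth` (the script name is arbitrary). For billing, the outgoing %O column is usually the one that matters.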

      Stopped requests are a real concern because of the way download accelerators like Go!Zilla, FlashGet, and GetRight often work. The more uncouth ones request a file several times simultaneously at different Range offsets and then drop connections once the pieces overlap, so make sure the %O in mod_logio really works. What a great Apache module! I wish it had been available back in 1999-2000, when I was doing this sort of thing. Actually, I was doing the reverse: trying to regulate a free GeoCities-type site (and prevent residents from using more than their fair share of bandwidth). I ended up counting each partial request at the full size of the file, which mainly punished the warez and pr0n crowd (who tend to use download accelerators the most); that was fine with me.
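      To compare the two approaches, here is a minimal sketch that bills one file's requests either by the actual bytes sent (%O) or, following the policy described above, by charging every partial request at the full file size. It treats HTTP 206 (Partial Content) responses as "partial requests" and assumes the file size is known from somewhere other than the log; both are assumptions, and `billed_bytes` is a hypothetical helper.

```python
def billed_bytes(requests, file_size, full_size_for_partials=False):
    """Bytes to charge for one file.

    requests: iterable of (status, bytes_out) pairs taken from the
    %s and %O log fields. file_size must come from elsewhere (it is not
    in the LogFormat above). With full_size_for_partials=True, every
    206 Partial Content response is charged as a whole download.
    """
    total = 0
    for status, bytes_out in requests:
        if full_size_for_partials and status == 206:
            # Charge the whole file, however little was actually sent.
            total += max(file_size, bytes_out)
        else:
            total += bytes_out  # what mod_logio says went over the wire
    return total


# A download accelerator fetching a 1,000,000-byte file as four Range requests.
ranges = [(206, 300_400), (206, 300_400), (206, 300_400), (206, 150_200)]
print(billed_bytes(ranges, 1_000_000))                               # 1051400
print(billed_bytes(ranges, 1_000_000, full_size_for_partials=True))  # 4000000
```

      The first figure reflects the bandwidth actually consumed; the second is the deliberately punitive count, which only makes sense if you want to discourage accelerators rather than bill accurately.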
      Thank you very much; this solves everything.
Re: Tracking bandwidth
by scottj (Monk) on Feb 05, 2004 at 04:30 UTC
    I use AWStats for tracking all kinds of usage, including bandwidth, and I highly recommend it. And to keep this on topic, it's written in Perl. :)
Re: Tracking bandwidth
by exussum0 (Vicar) on Feb 05, 2004 at 04:14 UTC
    You may want to look at Analog. It's an Apache log analyzer that gives you nice statistics on what happened.

Re: Tracking bandwidth
by b10m (Vicar) on Feb 05, 2004 at 04:24 UTC

    To keep track of bandwidth, you might want to look at something like IPTraf. It was used on a campus I know of to monitor the students' bandwidth usage, and it worked reasonably well, IIRC.



