PerlMonks  

Online Comics

by Boots111 (Hermit)
on Apr 01, 2002 at 00:56 UTC (#155680=CUFP)

Monks~

Well, I don't know about you all, but I am addicted to a couple of online comics (six at present, but more are often added). Back when it was only two or three, I did not mind going from website to website every day to read them. Sadly, that has now become too much effort...

Thus I have sent Perl to the rescue!

Right now, all my script does is use LWP::Simple to get each of the chosen webpages, parse through the HTML for known anchors, and then getstore() the appropriate images. It then generates a simple webpage that shows all of my comics one after another. It is written in a fairly generic format, so adding a new website only takes about 10 lines of code... Right now, it is ordered so that the comics which update more frequently are higher on the page.

I would of course be interested in comments/suggestions.
#!c:\perl\perl.exe -w

use strict;
use LWP::Simple;

my @results;
my @file;
my $i = 0;

my @sluggy = split("\n",get("http://www.sluggy.com"));
for (@sluggy) {
    if(/"(http:\/\/pics\.sluggy\.com\/comics\/.*?)"/) {
        push @results, $1;
        $1 =~ /.*\.(...)/;
        push @file, "image_$i.$1";
        $i++;
    }
}

my @userfriendly = split("\n",get("http://www.userfriendly.org"));
for (@userfriendly) {
    if(/"(http:\/\/www\.userfriendly\.org\/cartoons\/archives\/[\d|\l].*?)"/) {
        push @results, $1;
        $1 =~ /.*\.(...)/;
        push @file, "image_$i.$1";
        $i++;
    }
}

my @sinfest = split("\n",get("http://www.sinfest.net"));
for (@sinfest) {
    if(/"(\/comics\/.*?)"/) {
        push @results, "http://www.sinfest.net$1";
        $1 =~ /.*\.(...)/;
        push @file, "image_$i.$1";
        $i++;
    }
}

get("http://www.nuklearpower.com/comic/") =~ /<a href="(.*?)">Newest Comic<\/a>/;
my @nuklearpower = split("\n",get("http://www.nuklearpower.com/comic/$1"));
for (@nuklearpower) {
    if( /<p align=".*?"><img border=".*?" src="(.*?)" width="720" height="936"><\/p>/ ) {
        push @results, "http://www.nuklearpower.com/comic/$1";
        $1 =~ /.*\.(...)/;
        push @file, "image_$i.$1";
        $i++;
    }
}

my @megatokyo = split("\n",get("http://www.megatokyo.com"));
for (@megatokyo) {
    if(/"(\/strips\/.*?)"/) {
        push @results, "http://www.megatokyo.com$1";
        $1 =~ /.*\.(...)/;
        push @file, "image_$i.$1";
        $i++;
    }
}

my @machall = split("\n",get("http://www.machall.com"));
for (@machall) {
    if( /src='(\/index\.php\?do_command=show_strip.*?)'/ ) {
        push @results, "http://www.machall.com$1";
        push @file, "image_$i.jpg";
        $i++;
    }
}

open PAGE, ">comics.html" || die $!;
print PAGE "<html>\n<main>\n<title>Matt's Comic Page</title>";
for(my $i = 0; $i < @results; $i++) {
    print "$results[$i]\n";
    $results[$i] =~ /.*\.(...)/;
    getstore($results[$i], $file[$i]);
    print PAGE "<img src=\"$file[$i]\"><br>\n";
}
print PAGE "</main>\n</html>";
close PAGE || die $!;

Hope you all enjoy,
Matt
----
Computer science is merely the post-Turing decline of formal systems theory.
-???

Re: Online Comics
by belg4mit (Prior) on Apr 01, 2002 at 01:28 UTC
    You might want to look at netcomics.

    --
    perl -pe "s/\b;([st])/'\1/mg"

      If you are a bit of a gamer, or just plain like gaming humor, you might want to check out PvP and also Penny Arcade.
        Umm, you may have meant this for the head of the thread; I already read Penny Arcade. That said, it is rather OT and probably best sent as a private /msg.

        --
        perl -pe "s/\b;([st])/'\1/mg"

Re: Online Comics
by gav^ (Curate) on Apr 01, 2002 at 02:36 UTC
    Ignoring the ethics of downloading the images from comics that rely on advertising to stay in business (wouldn't it be better to mirror the whole page?), here are a few pointers on your code.
    • You might want to test whether get() actually returned anything. It returns undef on failure, and then your script will break. For this reason it's usually best not to use LWP::Simple
    • If you use m// with one of the alternative delimiters (@, #, |, etc.), you can save yourself from escaping every slash with a backslash and having code that looks like line noise
    • On the same note, look into q// and qq// for quoting
    • /.*\.(...)/ is silly. It would make more sense to use something like HTML::LinkExtor to go through the links and find one with the filename/image dimensions you want. Parsing HTML with regexes is generally not the right way to go
    • You want either open PAGE, ">comics.html" or die $! or open(PAGE, ">comics.html") || die $!
    • Perl has foreach loops, which save you from using error-prone C-style for loops
    Hope this helps..

    gav^
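A minimal sketch combining three of gav^'s points: check the fetched page before parsing it, write patterns with m{} or qr{} so slashes need no escaping, and loop with foreach instead of an index. It swaps in HTTP::Tiny (a core module since Perl 5.14) for LWP::Simple; fetch_page, matching_urls and extension_of are hypothetical helper names. It keeps the line-by-line regex matching only to stay close to the original script; a real parser such as HTML::LinkExtor is still the sturdier choice.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch a page and return its body, or warn and return undef on failure,
# so callers can check the result instead of parsing nothing.
sub fetch_page {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    unless ($res->{success}) {
        warn "Could not fetch $url: $res->{status} $res->{reason}\n";
        # Explicit undef (not a bare return) so the call still yields exactly
        # one value when used inside an argument list.
        return undef;
    }
    return $res->{content};
}

# Return the first capture of $pattern from each line of $html, with $prefix
# prepended (for site-relative paths). Returns an empty list if $html is undef.
sub matching_urls {
    my ($html, $pattern, $prefix) = @_;
    return () unless defined $html;
    $prefix = '' unless defined $prefix;
    my @urls;
    foreach my $line (split /\n/, $html) {
        push @urls, $prefix . $1 if $line =~ $pattern;
    }
    return @urls;
}

# Return the characters after the last "." at the very end of $url
# (e.g. "gif" or "jpeg"), or undef if the URL does not end that way.
sub extension_of {
    my ($url) = @_;
    return $url =~ m{\.(\w+)$} ? $1 : undef;
}
```

For example, matching_urls(fetch_page('http://www.sinfest.net'), qr{"(/comics/.*?)"}, 'http://www.sinfest.net') returns an empty list, with a warning naming the URL, when the fetch fails. Unlike /.*\.(...)/, extension_of is not limited to three-character extensions.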

      gav^~

      When you initially posted this (I know it was a long time ago), I felt a twinge of guilt, because the ethical issue had occurred to me earlier. However, at the time, I wanted a fast way to check all of the comics I read without typing each URL or going to each bookmark one at a time.

      A little while later I discovered Mozilla and tabbed browsing. Since you can bookmark multiple tabs in a single spot, I now use that instead of the script.

      It was not the advertisements I wanted to avoid but the hassle of going through so many clicks to read all of the comics. Now I can have both, and I feel much better about it.

      Thanks for both your comments about the code and otherwise,
      Boots
      ---
      Computer science is merely the post-Turing decline of formal systems theory.
      --???
      hey! what about raytoons? www.raytoons.cjb.net
Re: Online Comics (boo)
by boo_radley (Parson) on Apr 01, 2002 at 16:33 UTC
    Ah, the Online Comic Grabbing Script...
    Truly one of the strange but timeless rites of passage for webby perl programmers.
    If there's ever a Standard Perl Competency Test, this should be one of the tasks...

Re: Online Comics
by talexb (Canon) on Apr 01, 2002 at 20:15 UTC
    I seem to remember merlyn showing off a script that did the same thing for Dilbert at YAPC 19100 at CMU.

    Once you start to copy and paste code multiple times, you have to ask yourself, "Gee, I wonder if I could make a loop out of this."

    I'd also suggest using a READMORE tag for the code portions of future posts.
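Taking talexb's loop suggestion literally, here is a minimal sketch assuming a hypothetical @SITES table and collect_comics function. Each comic becomes one table entry, so adding a site means adding data rather than another copied block. The fetch function is passed in as a code reference, which also lets the loop be exercised against canned HTML without touching the network.

```perl
use strict;
use warnings;

# One entry per comic: the page to fetch, a pattern whose first capture is the
# image URL, and a prefix for site-relative matches. Patterns are copied from
# the original script and probably no longer match the live sites.
my @SITES = (
    { name    => 'sluggy',
      url     => 'http://www.sluggy.com',
      pattern => qr{"(http://pics\.sluggy\.com/comics/.*?)"},
      prefix  => '' },
    { name    => 'sinfest',
      url     => 'http://www.sinfest.net',
      pattern => qr{"(/comics/.*?)"},
      prefix  => 'http://www.sinfest.net' },
    { name    => 'megatokyo',
      url     => 'http://www.megatokyo.com',
      pattern => qr{"(/strips/.*?)"},
      prefix  => 'http://www.megatokyo.com' },
);

# Fetch each site with $fetch (a code ref that returns the page text, or undef
# on failure) and return [name, image URL] pairs in table order.
sub collect_comics {
    my ($sites, $fetch) = @_;
    my @found;
    foreach my $site (@$sites) {
        my $html = $fetch->($site->{url});
        unless (defined $html) {
            warn "Skipping $site->{name}: could not fetch $site->{url}\n";
            next;
        }
        my $re = $site->{pattern};
        while ($html =~ /$re/g) {
            push @found, [ $site->{name}, $site->{prefix} . $1 ];
        }
    }
    return @found;
}
```

In the real script the fetcher could be \&LWP::Simple::get (after use LWP::Simple), which returns undef on failure. A site such as nuklearpower, which needs a second fetch to find the newest strip, would need an extra field (say, a code ref) that this sketch leaves out.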

    --t. alex

    "Here's the chocolates, and here's the flowers. Now how 'bout it, widder hen, will ya marry me?" --Foghorn Leghorn

      Sorry about that, I forgot to capitalize my READMORE tag...

      Matt
      ----
      Computer science is merely the post-Turing decline of formal systems theory.
      --???
Re: Online Comics
by PrakashK (Pilgrim) on Apr 02, 2002 at 16:03 UTC
    For a highly configurable comics download tool (of course, written in Perl), go to dailystrips.
    It currently supports over 300 comics and offers a 'local' mode in which strips are downloaded and saved locally to speed access time.
    If you are a Debian user, it is just an apt-get away.

    /prakash

Re: Online Comics
by carthag (Scribe) on Apr 04, 2002 at 15:14 UTC
    Not to toot my own horn, but here's another way. The full source is available via the link.

    Oh, and good work Matt! =)
Re: Online Comics code review
by petdance (Parson) on Apr 06, 2002 at 22:01 UTC
    • Please get yourself a copy of The Pragmatic Programmer, and read and reread the section on DRY:
      Don't Repeat Yourself.
      The fetching part of the script should be rolled into a function that handles a given strip, and then called for each of the strips. What you're doing there is a maintenance nightmare.
    • How is one to know what $i means? It's not a loop variable, so I have no clue unless I trace it through the whole script.
    • Your file opening test is meaningless because the die can never run: || is not the same as or, because || binds more tightly than the list operator open. What you wrote as
      open PAGE, ">comics.html" || die $!;
      is effectively
      open PAGE, (">comics.html" || die $!);
      The part in parens will ALWAYS evaluate to ">comics.html". What you want is:
      open PAGE, ">comics.html" or die $!;
      which is effectively
      (open PAGE, ">comics.html") or die $!;
    • If you're using Perl 5.6.0+, avoid the bareword style filehandles, and instead use
      open my $fh, ">", "comics.html" or die $!;
    The rest of the comments posted are good ones, too.
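A minimal sketch of the page-writing step with petdance's fixes applied: a lexical filehandle, three-argument open, and or die on both open and close, so a failure actually stops the script. write_page is a hypothetical name, and the page uses a plain head/body skeleton rather than the original's markup.

```perl
use strict;
use warnings;

# Write an HTML page showing each image in @images, one per line.
# Uses a lexical filehandle and three-argument open (Perl 5.6.0+), and "or"
# rather than "||", so die really runs when open or close fails.
sub write_page {
    my ($path, @images) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print $fh "<html>\n<head><title>Comics</title></head>\n<body>\n";
    foreach my $img (@images) {
        print $fh qq{<img src="$img"><br>\n};
    }
    print $fh "</body>\n</html>\n";
    close $fh or die "Cannot close $path: $!\n";
    return;
}
```

For example, write_page('comics.html', map { "image_$_.gif" } 0 .. 5) writes six img tags in order, and dies with the path and $! if comics.html cannot be created.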

    xoxo,
    Andy
    --
    <megaphone> Throw down the gun and tiara and come out of the float! </megaphone>
