web scraper testing

by alienhuman (Pilgrim)
on Apr 11, 2006 at 20:52 UTC (#542671)
alienhuman has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Monks,

I've got a web scraping script. It works fine, except it's a PITA to test the program logic, because the conditions under which it scrapes an external web site only happen for about an hour a day (the information being scraped is time sensitive).

In order for me to test the script's logic outside of that one hour a day, I have to fake:

  • successful login to the site
  • successful scrape (with usable data)
  • query to my DB (assembled programmatically based on the scrape)
  • successful POST to site

I currently accomplish this by setting a "TEST" flag in my code, checking it at certain junctures, and running different code if I'm testing. There are also some bits of code that I just comment/uncomment during tests. I'd like to rewrite the package that contains my object and its methods so that, when the object is created as a "test", the usual methods do not scrape the external site, query the DB, and so on. Instead they'll execute under testing conditions, so that I can test the program logic at other times of the day.

Any thoughts on how, generally, to think about writing code to handle this kind of thing?

Thanks in advance,

AH

----------
Using perl 5.8.6 unless otherwise noted. Apache/2.0.54 unless otherwise noted. Fedora Core 4 (2.6.11-1.1369_FC4) unless otherwise noted.

Re: web scraper testing
by roboticus (Canon) on Apr 12, 2006 at 00:10 UTC
    I'd suggest that you break down the task into a couple of different functions: One would do the login junk and grab the HTML blob. Another would parse the HTML blob and return a field list or SQL statement string or some such.

    Armed thusly, you can then write a simple test module that calls your HTML blob handler with different HTML blobs and verifies that the correct junk is returned. You can also write a simple test fixture using the first chunk to simply grab a set of screens and write their HTML out to a test file (suitable for use with your first test fixture!).
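
    A minimal sketch of the parsing half of that split, exercised with canned HTML blobs. The function name parse_quotes, the table markup, and the field names are all invented for illustration, since the thread never shows the real page. A regex is used only to keep the sketch dependency-free; a real scraper would more likely use a proper HTML parser.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical parser: pulls (name, price) pairs out of table rows.
# It takes an HTML string and never touches the network, so it can be
# tested at any time of day.
sub parse_quotes {
    my ($html) = @_;
    my @rows;
    while ($html =~ m{<tr>\s*<td>([^<]+)</td>\s*<td>([\d.]+)</td>\s*</tr>}g) {
        push @rows, { name => $1, price => $2 };
    }
    return \@rows;
}

# Canned HTML blobs stand in for the live site (made-up markup).
my $good_page  = '<table><tr><td>ACME</td><td>12.50</td></tr>'
               . '<tr><td>Initech</td><td>3.75</td></tr></table>';
my $empty_page = '<html><body>Market closed</body></html>';

my $rows = parse_quotes($good_page);
is(scalar @$rows, 2, 'two rows parsed from the good page');
is($rows->[0]{price}, '12.50', 'price of the first row');
is(scalar @{ parse_quotes($empty_page) }, 0, 'no rows on the closed page');
```

    Pages saved from the live site during the one-hour window could replace the inline strings here without changing the test itself.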

    Divide et impera!

    --roboticus

Re: web scraper testing
by eXile (Priest) on Apr 12, 2006 at 03:27 UTC
    You could create some web pages somewhere that mimic the various situations you want to scrape, and use these as your test cases.
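
    One way this might look, as a sketch under assumptions: the mimic pages live in a local directory, and the scraper object is handed a "fetch" code ref at construction, so a test object reads files where a production object would hit the site. The class name My::Scraper, the page_title method, and the page contents are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

package My::Scraper;    # hypothetical class name

sub new {
    my ($class, %args) = @_;
    # 'fetch' is a code ref that takes a page name and returns HTML.
    # Production code would pass one that talks to the real site.
    my $self = { fetch => $args{fetch} };
    return bless $self, $class;
}

sub page_title {
    my ($self, $page) = @_;
    my $html = $self->{fetch}->($page);
    return $html =~ m{<title>([^<]*)</title>}i ? $1 : undef;
}

package main;

# Build a throwaway directory of mimic pages (made-up content).
my $dir   = tempdir(CLEANUP => 1);
my %pages = (
    'open.html'   => '<html><title>Market open</title></html>',
    'closed.html' => '<html><title>Market closed</title></html>',
);
for my $name (keys %pages) {
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $pages{$name};
    close $fh or die "Cannot close $path: $!";
}

# Test-mode fetcher: reads a mimic page from disk instead of the network.
my $fixture_fetch = sub {
    my ($page) = @_;
    my $path = File::Spec->catfile($dir, $page);
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;           # slurp the whole file
    my $html = <$fh>;
    close $fh;
    return $html;
};

my $scraper = My::Scraper->new(fetch => $fixture_fetch);
print $scraper->page_title($_), "\n" for qw(open.html closed.html);
```

    With this shape the methods contain no "TEST" flag checks; the only difference between a test run and a live run is which code ref is passed to new().
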
Re: web scraper testing
by planetscape (Canon) on Apr 13, 2006 at 04:37 UTC

      Thanks, just what I was looking for.

      AH

