
Re: Harvesting and Parsing HTML from other sites

by marius (Hermit)
on Mar 28, 2001 at 09:31 UTC (#67753)

in reply to Harvesting and Parsing HTML from other sites

First, change your @pages array to a hash. Then you can step through it with:
foreach my $page (keys %pages) { }
rather than the cumbersome and obfuscated for(){} loop above.
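
To make that concrete, here is a minimal sketch of the hash-based loop. The contents of %pages (the names and example.com URLs) are made up for illustration, since I'm only guessing at what your @pages holds:

```perl
use strict;
use warnings;

# Hypothetical data: page names mapped to URLs (not taken from the original code).
my %pages = (
    home => 'http://example.com/',
    news => 'http://example.com/news.html',
);

# keys returns the hash keys in no guaranteed order, so sort them
# if you want the pages processed in a predictable sequence.
my @seen;
foreach my $page (sort keys %pages) {
    push @seen, "$page => $pages{$page}";
}

print "$_\n" for @seen;
```

Each pass gives you both the name ($page) and the URL ($pages{$page}) without tracking an index by hand.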

Second, a lot of your regexes don't need the /s modifier. It only changes whether . matches a newline, so it has no effect on a pattern that contains no dot. See perldoc perlre for details.
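
A quick way to see what /s does and doesn't do. The sample string and variable names are invented for this sketch:

```perl
use strict;
use warnings;

# Sample text with a newline between the tags (made up for illustration).
my $text = "<b>bold\ntext</b>";

# Without /s, . refuses to match the newline, so the match fails.
my $without_s = $text =~ /<b>(.*)<\/b>/  ? 1 : 0;

# With /s, . matches the newline as well, so the match succeeds.
my $with_s    = $text =~ /<b>(.*)<\/b>/s ? 1 : 0;

# A pattern with no dot behaves identically with or without /s.
my $no_dot_plain = $text =~ /<b>/  ? 1 : 0;
my $no_dot_s     = $text =~ /<b>/s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
print "no dot: $no_dot_plain vs $no_dot_s\n";
```

So keep /s where a . has to cross line boundaries, and drop it everywhere else.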

Third, use strict.
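
As a small illustration of what strict buys you, the sketch below compiles a snippet containing a misspelled variable inside a string eval so the failure can be caught and printed. The variable names ($length and the typo $lenght) are hypothetical, not from your code:

```perl
use strict;
use warnings;

# The snippet declares $length but then uses the misspelled $lenght.
# Single quotes keep the variables from being interpolated here.
my $snippet = 'use strict; my $length = 3; for my $i (0 .. $lenght) { } 1;';

# Under strict, the undeclared $lenght is a compile-time error,
# so eval returns undef and the message lands in $@.
my $ok    = eval $snippet;
my $error = $@;

print defined $ok ? "compiled fine\n" : "strict caught: $error";
```

Without strict, the typo would silently become a new, empty global and the loop would just run over the wrong range.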

Now for the actual code errors: I don't see where you set $keeperlength before using it in your nested for(){} loop. Incidentally, your conversion of <tag> to {{{tag}}} doesn't account for things like <br />. That's a minor nitpick, though. Other than that, I can't see why it would "revert" to the original $html variable. Fix the things I've pointed out (or point out the flaws in my thinking, as the case may be =]) and try again. If it still doesn't work, point us to some pages that work and some that don't, and we'll continue hammering.
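
On the <br /> nitpick, here is a rough sketch of the difference. The "naive" pattern is only my assumption about what a simple <tag> substitution might look like, not a copy of yours, and the sample $html is invented:

```perl
use strict;
use warnings;

# Invented sample input containing both <br /> and <br>.
my $html = 'line one<br />line two<b>bold</b><br>';

# Assumed naive version: only matches <word>, so <br /> and </b> slip through.
(my $naive = $html) =~ s/<(\w+)>/{{{$1}}}/g;

# Looser version: allows a leading slash, optional whitespace,
# and an optional trailing slash before the closing >.
(my $converted = $html) =~ s!<(/?\w+)\s*/?>!{{{$1}}}!g;

print "naive:     $naive\n";
print "converted: $converted\n";
```

This still ignores tags with attributes (<a href="...">), so for anything beyond simple markup a real parser such as HTML::Parser is the safer route.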

Good luck!

