Re: Comparison of HTML documents with Perl

by talexb (Canon)
on Feb 24, 2005 at 21:29 UTC (#434266)

in reply to Comparison of HTML documents with Perl

I've had good luck with HTML::Parser, but I think what you're asking for could end up being infinitely complicated.

You want a high-level comparison, not a line-by-line or word-by-word one. If you have to handle arbitrary HTML from anyone, that could be impossible. If you're trying to version your own HTML, it will probably be easier, because you can focus on just a few tags.
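To make "focus on just a few tags" concrete, here is a minimal sketch that uses HTML::Parser to reduce a page to a flat list of block-level elements, each paired with its whitespace-normalised text. The helper name `html_skeleton` and the tag list `h1 h2 h3 p li tr` are assumptions for illustration, not anything from the original question. Nested reported tags (say, a `p` inside an `li`) are handled naively: text after the inner element closes is dropped.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Assumed set of "structural" tags; tune it to the HTML you version.
my @BLOCK_TAGS = qw(h1 h2 h3 p li tr);

# Hypothetical helper: returns a list of [tag, text] pairs in document order.
sub html_skeleton {
    my ($html) = @_;
    my @skeleton;
    my $current;    # the [tag, text] node currently collecting text

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            $current = [ $tag, '' ];
            push @skeleton, $current;
        }, 'tagname' ],
        end_h  => [ sub { undef $current }, 'tagname' ],
        text_h => [ sub {
            my ($dtext) = @_;
            $current->[1] .= $dtext if $current;   # ignore text outside blocks
        }, 'dtext' ],
    );
    $p->report_tags(@BLOCK_TAGS);   # start/end events fire only for these tags
    $p->parse($html);
    $p->eof;

    # Collapse whitespace so reflowed source lines still compare equal.
    for my $node (@skeleton) {
        $node->[1] =~ s/\s+/ /g;
        $node->[1] =~ s/^ | $//g;
    }
    return @skeleton;
}
```

Inline markup such as `<b>` is not reported, so its text simply joins the enclosing block's text.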

I think I'd start by comparing the structure of the two documents to see where a paragraph has been added or a table row has been deleted. From there, I'd compare the text only within the parts whose structure has not changed. Boy, that's a really interesting project. Let us know how it turns out.
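The structure-first idea can be sketched as a longest-common-subsequence (LCS) alignment over tag names, followed by a text comparison only inside the elements that line up. This assumes each document has already been reduced to an array of `[tag, text]` pairs (for example with HTML::Parser); `diff_skeletons` and its `added`/`deleted`/`changed` records are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: $old and $new are array refs of [tag, text] pairs.
sub diff_skeletons {
    my ($old, $new) = @_;
    my ($n, $m) = (scalar @$old, scalar @$new);

    # $len[$i][$j] = LCS length (by tag name) of $old->[$i..] and $new->[$j..].
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            $len[$i][$j] = $old->[$i][0] eq $new->[$j][0]
                ? $len[$i + 1][$j + 1] + 1
                : ( $len[$i + 1][$j] >= $len[$i][$j + 1]
                    ? $len[$i + 1][$j] : $len[$i][$j + 1] );
        }
    }

    # Walk the table: unmatched nodes are structural adds/deletes;
    # matched nodes are compared on their text only.
    my @changes;
    my ($i, $j) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($old->[$i][0] eq $new->[$j][0]) {
            push @changes, [ 'changed', $old->[$i], $new->[$j] ]
                if $old->[$i][1] ne $new->[$j][1];
            $i++; $j++;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            push @changes, [ 'deleted', $old->[$i++] ];
        }
        else {
            push @changes, [ 'added', $new->[$j++] ];
        }
    }
    push @changes, [ 'deleted', $old->[$i++] ] while $i < $n;
    push @changes, [ 'added',   $new->[$j++] ] while $j < $m;
    return @changes;
}
```

This is the same LCS idea a line-based diff uses, just applied to tag names instead of lines. If you'd rather not hand-roll it, Algorithm::Diff on CPAN provides LCS-based `diff` and `sdiff` that you could feed the tag sequences to.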

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
