xjar has asked for the wisdom of the Perl Monks concerning the following question:
Hello all. I need to write a program to compare two HTML documents to determine whether they are similar enough to be considered "the same". What I was thinking of doing is this (keep in mind, I'm a neophyte, so if my ideas are pretty poor, be kind):
1. Read each document into an array, line by line.
2. Strip the newline off of each array element.
3. Concatenate the array elements into a string variable, so that in the end each variable holds an entire document.
4. Take a substr() of each variable, say starting 150 characters in and running for 100 characters. If the two substrings are the same, then the documents are the same.
Now, I'm not sure how efficient this will be, especially with the swapping from array to variable. Can anyone provide me with some ideas, or even (hehe) a module that can help with this?
Much thanks, xjar
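The steps described in the question can be sketched roughly as follows. This is a minimal illustration, not a recommended similarity test: the helper names `slurp_flat` and `same_window` are hypothetical, the 150/100 offsets are the ones suggested in the post, and falling back to a whole-string comparison for short documents is an assumption added here. Reading the file in one go (by localizing `$/`) avoids the array-to-string round trip the post worries about.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string, newlines removed.
sub slurp_flat {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                          # no record separator: read it all at once
    my $text = <$fh>;
    close $fh;
    $text = '' unless defined $text;
    $text =~ tr/\r\n//d;               # drop CR and LF characters
    return $text;
}

# Hypothetical helper: compare the same fixed window in two strings.
# Assumption: if either string is too short to hold the full window,
# compare the whole strings instead.
sub same_window {
    my ($left, $right, $offset, $length) = @_;
    my $needed = $offset + $length;
    if (length($left) < $needed || length($right) < $needed) {
        return $left eq $right ? 1 : 0;
    }
    return substr($left, $offset, $length) eq substr($right, $offset, $length)
        ? 1 : 0;
}

# Usage: perl compare.pl first.html second.html
if (@ARGV == 2) {
    my ($file_a, $file_b) = @ARGV;
    my $same = same_window(slurp_flat($file_a), slurp_flat($file_b), 150, 100);
    print $same ? "same window\n" : "different window\n";
}
```

Note that a matching window only shows that those 100 characters agree; two pages sharing the same header markup would pass this check even if the rest of their content differs.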
Replies are listed 'Best First'.
Re: HTML Document Comparison
by merlyn (Sage) on Sep 13, 2000 at 19:49 UTC
by mdillon (Priest) on Sep 13, 2000 at 20:25 UTC
by merlyn (Sage) on Sep 13, 2000 at 20:26 UTC
by mdillon (Priest) on Sep 13, 2000 at 20:49 UTC
by little (Curate) on Sep 13, 2000 at 20:36 UTC
Re: HTML Document Comparison
by moen (Hermit) on Sep 14, 2000 at 01:37 UTC
by merlyn (Sage) on Sep 14, 2000 at 01:39 UTC
Re: HTML Document Comparison
by xjar (Pilgrim) on Sep 13, 2000 at 22:00 UTC
Re: HTML Document Comparison
by cbraga (Pilgrim) on Sep 14, 2000 at 01:25 UTC
by extremely (Priest) on Sep 14, 2000 at 03:39 UTC
Re: HTML Document Comparison
by planetscape (Chancellor) on Mar 22, 2008 at 21:07 UTC
Re: HTML Document Comparison
by ww (Archbishop) on Mar 22, 2008 at 21:47 UTC