PerlMonks
Re: Extracting paragraphs from html
by fraktalisman (Hermit)
on Sep 11, 2005 at 12:55 UTC ( [id://491073]=note )
If you can't rely on certain tags (and I agree that you can't), the question is: what is the definition of a paragraph? Where does it stop? Certainly not at a newline, for we are dealing with HTML, and the source code may contain many newlines that are not visible in the page as it is actually displayed.
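To illustrate the point about newlines, here is a minimal sketch in Python (my choice for illustration, the thread itself is about Perl). It roughly mimics how a browser collapses runs of whitespace in ordinary text. The function name `collapse_whitespace` is made up for this example, and it deliberately ignores special cases such as `<pre>` blocks, where whitespace is preserved.

```python
def collapse_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs and newlines into a single space.

    Rough approximation of how HTML renders normal (non-<pre>) text:
    source newlines do not survive as visible line breaks.
    """
    # str.split() with no arguments splits on any whitespace run
    # and drops leading/trailing whitespace.
    return " ".join(text.split())


source = "First line of the source\n\n   second line,\tstill the same paragraph"
print(collapse_whitespace(source))
```

So a newline in the source tells you nothing about where a paragraph ends; it is just whitespace.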
And for a pragmatic approach, you might want to specify a maximum length at which the extracted text is truncated. There are people who don't use paragraphs at all; they just type or copy hundreds or thousands of words onto a page, as if they were writing a novel or had never understood the need for formatting.
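A minimal sketch of such a length limit, again in Python purely for illustration. The name `truncate_text`, the default of 500 characters and the `...` marker are all assumptions of mine, not something from the original question. It cuts at the last word boundary that fits, so a word is not split in half unless there is no space to cut at.

```python
def truncate_text(text: str, max_chars: int = 500, marker: str = "...") -> str:
    """Return text cut to at most max_chars characters (plus the marker).

    Whitespace is normalized first; the cut prefers the last space
    that fits within the limit.
    """
    text = " ".join(text.split())  # normalize whitespace runs
    if len(text) <= max_chars:
        return text  # short enough, nothing to do
    # Look one character past the limit, so a word that ends exactly
    # at the limit is kept whole.
    window = text[: max_chars + 1]
    head, sep, _ = window.rpartition(" ")
    if not sep:
        # No space at all: fall back to a hard cut.
        head = text[:max_chars]
    return head + marker


print(truncate_text("alpha beta gamma delta", max_chars=12))
```

Note that the marker is appended after the cut, so the result can be a few characters longer than `max_chars`.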
fraktalisman keeps rolling