in reply to Extracting paragraphs from html

Use XML::LibXML in HTML-parsing mode, then use an XPath expression that selects text() nodes whose string-length() is greater than some threshold N.
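
A minimal sketch of that idea follows. The helper name long_text_nodes, the threshold value, and the sample HTML are made up for illustration. The use of normalize-space() to ignore indentation is an extra assumption, not part of the original suggestion.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the content of every text node whose
# whitespace-normalized length is greater than $min_len characters.
sub long_text_nodes {
    my ($html, $min_len) = @_;

    # recover => 1 lets libxml2 cope with real-world tag soup;
    # suppress_errors => 1 keeps its complaints off STDERR.
    my $doc = XML::LibXML->load_html(
        string          => $html,
        recover         => 1,
        suppress_errors => 1,
    );

    # normalize-space() trims and collapses whitespace, so runs of
    # indentation do not count toward the length (an assumption here).
    my $xpath = sprintf
        '//text()[string-length(normalize-space(.)) > %d]', $min_len;

    return map { $_->nodeValue } $doc->findnodes($xpath);
}

# Usage: print each long text node from a small sample document.
my $sample = '<html><body><p>short</p>'
           . '<p>This paragraph is comfortably longer than forty characters in total.</p>'
           . '</body></html>';

print "$_\n" for long_text_nodes($sample, 40);
```

Note that a bare //text() also matches text inside script and style elements. If that matters, a predicate such as not(ancestor::script or ancestor::style) can be added to the XPath.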

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Update: See Locate large HTML paragraphs with XML::LibXML.