in reply to Extracting paragraphs from html

Use XML::LibXML in HTML-parsing mode, then run an XPath expression that selects text() nodes whose string-length() is greater than some threshold N.
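
A minimal sketch of that idea follows. The helper name long_text_nodes, the sample markup, and the threshold of 40 are made up for illustration. The normalize-space() call and the script/style exclusion are my own additions, on the assumption that you don't want whitespace-only nodes or inline JavaScript/CSS counted as paragraphs.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the text of every text() node whose
# whitespace-normalized length is greater than $min characters.
sub long_text_nodes {
    my ($html, $min) = @_;
    $min = int $min;    # keep the interpolated XPath number well-formed

    # HTML-parsing mode; recover => 2 tells libxml2 to repair
    # tag soup silently instead of dying or warning.
    my $doc = XML::LibXML->load_html(
        string  => $html,
        recover => 2,
    );

    # Assumption: text inside <script> and <style> is not wanted.
    my $xpath = '//text()'
              . '[not(ancestor::script) and not(ancestor::style)]'
              . "[string-length(normalize-space(.)) > $min]";

    return map { $_->textContent } $doc->findnodes($xpath);
}

# Made-up sample document, only to exercise the helper.
my $sample = <<'HTML';
<html><body>
<h1>Title</h1>
<p>This is a long paragraph of text that should comfortably exceed the threshold.</p>
<p>Short.</p>
</body></html>
HTML

print "$_\n" for long_text_nodes($sample, 40);
```

Note that this selects individual text nodes, so a paragraph broken up by inline markup (for example a link or some bold text) comes back as several nodes. If that matters, selecting the enclosing elements and testing string-length(normalize-space(.)) on them instead may work better.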

