in reply to Matching regular expression over multiple lines

Don't parse HTML with regexes. Seriously,

DO NOT PARSE HTML WITH REGEXES

Better men than us have tried that and failed. Use the right tool for the job:
use HTML::TagParser;

my $html = qq[
<blockquote>
<p><b>Joos van Cleve</b> - Lucretia (detail)</p>
</blockquote>
<p>beautiful</p>
</blockquote>
<p>indeed I am</p>
<footer>
];

my $parser = HTML::TagParser->new( $html );
print $parser->getElementsByTagName( "footer" )->previousSibling->innerText;


holli

You can lead your users to water, but alas, you cannot drown them.

Re^2: Matching regular expression over multiple lines
by LanX (Bishop) on Oct 15, 2017 at 11:52 UTC
    It depends. I agree when it comes to arbitrary HTML.

    But sometimes, for simple output that is generated automatically (by pdftohtml, for example), a regex is the right tool.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!
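
For the simple, machine-generated case described above, a regex that matches across line breaks might look like the sketch below. It is an illustration only, not code from the thread: the helper name text_before_footer is made up, and it assumes the paragraph directly before <footer> contains plain text with no nested tags. The negated class [^<] matches newlines, so the paragraph may span several lines, and /x allows the pattern to be spread out and commented.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text of a plain
# <p>...</p> that sits directly before <footer>, or undef if none is found.
# Assumes predictable generated markup with no tags nested inside the <p>.
sub text_before_footer {
    my ($html) = @_;
    if ( $html =~ m{
            <p> \s* ([^<]*?) \s* </p>   # paragraph body; [^<] also matches newlines
            \s* <footer>                # must be followed directly by the footer
        }x ) {
        return $1;
    }
    return;
}

my $sample = "<p>beautiful</p>\n<p>indeed\nI am</p>\n<footer>";
my $text   = text_before_footer($sample);
print defined $text ? "$text\n" : "no match\n";
```

Using [^<] instead of a lazy dot with /s keeps the match from starting at an earlier <p> and running across several paragraphs. That is also why this only holds up for markup you control or know to be regular.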

Re^2: Matching regular expression over multiple lines
by Maire (Scribe) on Oct 16, 2017 at 05:53 UTC
    Thank you for the code and the tip. I didn't know that there was an alternative to using a regex, so this was incredibly helpful, thanks!
Re^2: Matching regular expression over multiple lines
by Anonymous Monk on Oct 15, 2017 at 23:41 UTC

    Hi

    What you posted is the equivalent of using regex

    DO NOT PARSE HTML With low level parsers like HTML::TagParser

    Use a "DOM" like HTML::Tree / XML::Twig / XML::LibXML / Mojo::DOM...
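
As a rough illustration of the DOM-style approach suggested above, here is a minimal sketch using Mojo::DOM, which ships with Mojolicious and is not a core module. The helper name text_before_footer_dom is made up for this example, and the sample markup is a tidied version of the fragment from the parent post (the stray closing blockquote removed, the footer closed), so treat it as an assumption rather than the original data.

```perl
use strict;
use warnings;
use Mojo::DOM;    # part of Mojolicious, not a core module

# Hypothetical helper (not from the thread): text of the element that
# immediately precedes the first <footer>, or undef if there is none.
sub text_before_footer_dom {
    my ($html) = @_;
    my $dom    = Mojo::DOM->new($html);
    my $footer = $dom->at('footer') or return;   # first <footer>, if any
    my $prev   = $footer->previous  or return;   # previous sibling element
    return $prev->all_text;                      # text including descendants
}

# Tidied sample markup (an assumption; the original fragment was malformed).
my $html = <<'HTML';
<blockquote>
<p><b>Joos van Cleve</b> - Lucretia (detail)</p>
</blockquote>
<p>beautiful</p>
<p>indeed I am</p>
<footer></footer>
HTML

my $text = text_before_footer_dom($html);
print defined $text ? "$text\n" : "no footer found\n";
```

The CSS selector passed to at() and the sibling navigation via previous() replace the hand-written pattern, so nested tags inside the paragraph no longer matter.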

      > with low level parsers like ...

      Please explain.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!