PerlMonks  

Re^2: capturing between divs

by monarch (Priest)
on Apr 13, 2009 at 03:58 UTC


in reply to Re: capturing between divs
in thread capturing between divs

"If you're not using an HTML parser, you will have much more brittle code."

...of course, if you don't understand how the parser works, then you're no better off than if you had used a simple regexp in the first place.

Try this for size:

while ( $html =~ m{<div[^>]*>(.*?)</div>}sgi ) {
    my $inside_div = $1;
    # process contents of $inside_div ...
}

This regexp simply looks for content between div tags. It does not support nested divs, but if you want complex parsing you're better off using a complex parser.
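
To make the nested-div limitation concrete, here is a minimal sketch that runs the same regexp over a small, made-up input (the `$html` string and the `@captures` array are illustrative, not from the original question). It suggests that the non-greedy capture ends at the first closing tag it meets, so the capture spans part of both divs and the trailing text is never seen.

```perl
use strict;
use warnings;

# Hypothetical sample input: one div nested inside another.
my $html = '<div id="outer">before <div id="inner">nested</div> after</div>';

my @captures;
while ( $html =~ m{<div[^>]*>(.*?)</div>}sgi ) {
    push @captures, $1;
}

# The non-greedy (.*?) stops at the FIRST closing tag, so the single
# capture starts inside the outer div and ends where the inner div closes.
# The text " after" is never captured at all.
print "$_\n" for @captures;    # prints: before <div id="inner">nested
```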

The regexp has the s (multi-line), g (global), and i (case-insensitive) flags set.


Re^3: capturing between divs
by Your Mother (Canon) on Apr 13, 2009 at 06:18 UTC
    but if you want complex parsing you're better off using a complex parser.

    Well... no. If you want correct parsing, you should use a parser. A regex like that can indeed work but it has many edge cases where it will fail and I am personally sick of inheriting code that fails when there are numerous, well-known, deeply tested and vetted ways to solve the problem correctly.
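
    A few of those edge cases can be sketched as follows. The helper name `div_contents` and the three input strings are made up for illustration; the regex is the one from the parent post. An HTML parser would be expected to skip the commented-out div, respect the quoted attribute, and not confuse a differently named tag with a div.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything the parent post's regex captures.
sub div_contents {
    my ($html) = @_;
    my @found;
    while ( $html =~ m{<div[^>]*>(.*?)</div>}sgi ) {
        push @found, $1;
    }
    return @found;
}

# 1. A div inside an HTML comment is captured as if it were live markup.
my @commented = div_contents('<!-- <div>old</div> --><div>new</div>');

# 2. A ">" inside a quoted attribute value ends the opening tag too early.
my @attr = div_contents('<div title="a > b">text</div>');

# 3. Any tag whose name merely starts with "div" is treated as a div.
my @other = div_contents('<divider>x</divider><div>real</div>');

print join( ' | ', @commented ), "\n";    # prints: old | new
print join( ' | ', @attr ),      "\n";    # prints:  b">text
print join( ' | ', @other ),     "\n";    # prints: x</divider><div>real
```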

    For quick one-offs, or if you know your input intimately, a regex on HTML can be okay, but for production code it is just Wrong™. Also, I realize you might understand this and just chose poor terms, but s means . matches newlines (the single-line modifier); m is the multi-line modifier. See perlre.
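
    The difference between the two modifiers can be checked with a short sketch; the `$text` string and the variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, "." does not match a newline, so the match cannot span lines.
my $no_s   = $text =~ /first.*second/  ? 1 : 0;
# With /s, "." also matches "\n".
my $with_s = $text =~ /first.*second/s ? 1 : 0;

# Without /m, "^" anchors only at the very start of the string.
my $no_m   = $text =~ /^second/  ? 1 : 0;
# With /m, "^" also matches just after each embedded newline.
my $with_m = $text =~ /^second/m ? 1 : 0;

# prints: no /s: 0, /s: 1, no /m: 0, /m: 1
printf "no /s: %d, /s: %d, no /m: %d, /m: %d\n",
    $no_s, $with_s, $no_m, $with_m;
```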
