Re^3: Extracting HTML content between the h tags



by vagabonding electron (Curate)

on Aug 05, 2012 at 14:09 UTC ( [id://985529]=note )


in reply to Re^2: Extracting HTML content between the h tags
in thread Extracting HTML content between the h tags

Thank you very much!
I just tried both approaches, and they work even if the last h2 tag is missing. That happens in about 10 of the more than 400 pages, for which I had used the following workaround:

my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::*' );
unless ( @solution_2 )
{
   @solution_2 = $content->findvalues( '//hr/preceding-sibling::*' );
}

... with substr as before ...
Fortunately, each page has only one hr tag :-)
With your approach, this workaround is no longer necessary.
BTW, the content after the fourth <h2> is not important.
Thanks again!
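
For readers who want to try the fallback in isolation, here is a minimal sketch. It assumes the CPAN module HTML::TreeBuilder::XPath and a made-up page with a hypothetical div id="content" wrapper and only three h2 tags; none of that markup comes from the real pages. It tries the h2[4] expression first and falls back to the hr anchor only when the first one matches nothing.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;    # CPAN module, not part of core Perl

# Hypothetical sample page: only three <h2> tags, so './h2[4]' matches nothing.
my $html = <<'HTML';
<html><body><div id="content">
<h2>One</h2><p>first</p>
<h2>Two</h2><p>second</p>
<h2>Three</h2><p>third</p>
<hr>
<p>footer</p>
</div></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# The wrapper element and its id are assumptions made for this sketch.
my ($content) = $tree->findnodes('//div[@id="content"]');

# Try the preferred expression first; use the <hr> anchor only if it is empty.
my @values;
for my $xpath ( './h2[4]/preceding-sibling::*', '//hr/preceding-sibling::*' ) {
    @values = $content->findvalues($xpath);
    last if @values;
}

print "$_\n" for @values;

$tree->delete;    # free the parse tree
```

Looping over a list of candidate expressions keeps the fallback order in one place, which may be easier to extend than nested unless blocks. The hr fallback is only safe under the condition mentioned above, namely a single hr tag per page.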
