Thank you very much! I just tried both approaches, and they work even when the last h2 tag is missing (which happens on about 10 of the more than 400 pages). For those pages I had been using the following workaround:
my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::*' );
# Fall back to the hr tag when the fourth h2 is missing
unless ( @solution_2 )
{
    @solution_2 = $content->findvalues( '//hr/preceding-sibling::*' );
}
... with substr as before ...
Fortunately, those pages contain only one hr tag :-)
With your approach, the workaround is no longer necessary.
BTW, the content after the fourth <h2> is not important.
Thanks again!