PerlMonks  

Re: Extracting HTML content between the h tags

by Gangabass (Vicar)
on Aug 05, 2012 at 12:55 UTC (#985525=note)


in reply to Extracting HTML content between the h tags

Something like this (for the first group of p elements only):

# select the p elements between the first and second h2
my @nodes = $p->findnodes('//h2[2]/preceding-sibling::p[preceding-sibling::h2[1]]');
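A minimal runnable sketch of the expression above, assuming the tree is built with the CPAN module HTML::TreeBuilder::XPath (which provides `findnodes`/`findvalues`); the sample HTML is hypothetical, since the page from the question is not shown. Note that the predicate `[preceding-sibling::h2[1]]` is an existence test: it keeps a p only if at least one h2 comes before it among its siblings. It does not mean "stop when an h2 appears". Starting from the second h2, the path walks back over all preceding p siblings and drops those with no h2 before them.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page; the real HTML from the question is not shown here.
my $html = <<'HTML';
<html><body>
<p>intro</p>
<h2>First</h2>
<p>one</p>
<p>two</p>
<h2>Second</h2>
<p>three</p>
</body></html>
HTML

my $p = HTML::TreeBuilder::XPath->new_from_content($html);

# Walk back from the 2nd h2 over its p siblings and keep only those that
# have at least one h2 before them: "intro" has none, "three" is after h2[2].
my @texts = map { $_->as_text }
    $p->findnodes('//h2[2]/preceding-sibling::p[preceding-sibling::h2[1]]');

print "$_\n" for sort @texts;

$p->delete;    # free the HTML::Element tree
```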


Re^2: Extracting HTML content between the h tags
by vagabonding electron (Hermit) on Aug 05, 2012 at 14:22 UTC
    Thanks a lot! I didn't know this syntax.
    One more question, if I may :-)
    On about 10 pages the last h2 tag is missing, so I used the following workaround:
    my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::*' );
    unless ( @solution_2 ) {
        @solution_2 = $content->findvalues( '//hr/preceding-sibling::*' );
    }
    I tried the same with your syntax:
    @solution_2 = $content->findvalues( '//hr/preceding-sibling::p[preceding-sibling::h2[3]]' );
    but I only get an uninitialized value.
    I understood the syntax as: "search the siblings, but stop if the tag in brackets appears". Is that correct? If so, what am I doing wrong in the attempt above?
    Thanks!
      According to your HTML, the preceding sibling of the hr is a div, not a p, so your expression matches nothing. This code finds all p elements after the last h2:
      $p->findnodes('//h2[4]/following-sibling::p');
      Or (more flexible):
      $p->findnodes('//h2[last()]/following-sibling::p');
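A small sketch of why `last()` is the more flexible choice, again assuming HTML::TreeBuilder::XPath and a hypothetical page that has only three h2 headings (one "missing"). A fixed index such as `h2[4]` selects nothing when that heading is absent, while `h2[last()]` picks the final h2 among its siblings however many there are, so no separate fallback is needed for that case.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical page with only three h2 headings and a closing hr.
my $html = <<'HTML';
<html><body>
<h2>A</h2><p>a1</p>
<h2>B</h2><p>b1</p>
<h2>C</h2><p>c1</p><p>c2</p>
<hr>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# A fixed index fails when the heading does not exist: there is no 4th h2 here.
my @fixed = $tree->findnodes('//h2[4]/following-sibling::p');

# last() adapts to however many h2 siblings the page has.
my @after_last = map { $_->as_text }
    $tree->findnodes('//h2[last()]/following-sibling::p');

printf "h2[4]: %d node(s)\n", scalar @fixed;
print "h2[last()]: $_\n" for @after_last;

$tree->delete;    # free the HTML::Element tree
```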
        Thank you very much!
