PerlMonks  

Re^2: Extracting HTML content between the h tags

by Anonymous Monk
on Aug 05, 2012 at 12:53 UTC (node #985524)


in reply to Re: Extracting HTML content between the h tags
in thread Extracting HTML content between the h tags

The "Comment." key stuck out, so a better idea might be to use the @id attribute as the key:

my $key = shift(@nodes)->findvalue('*[@id]/@id');
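A minimal sketch of the idea, assuming HTML::TreeBuilder::XPath (the module whose findnodes/findvalue/findvalues calls appear in this thread) and a hypothetical page in which every <h2> wraps a child element carrying an id. The sample HTML, the "content" div, and the %section hash are made up for illustration; the loop groups the elements following each <h2> under the id found by '*[@id]/@id' instead of under the heading text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page: each <h2> wraps a <span> whose id is a clean key.
my $html = <<'HTML';
<html><body><div id="content">
<h2><span id="Summary">Summary.</span></h2>
<p>First paragraph.</p>
<h2><span id="Comment">Comment.</span></h2>
<p>Second paragraph.</p>
<p>Third paragraph.</p>
</div></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

my %section;   # id of the heading's child element => texts of the following elements
my $key;       # key of the section currently being filled

# The trailing '/*' selects element children only, so bare text segments are skipped.
for my $node ($tree->findnodes('//div[@id="content"]/*')) {
    if ($node->tag eq 'h2') {
        # Key on the child's @id rather than on the heading text ("Comment.").
        $key = $node->findvalue('*[@id]/@id');
        next;
    }
    # Elements that appear before the first <h2> belong to no section.
    push @{ $section{$key} }, $node->as_text if defined $key;
}

$tree->delete;   # free the HTML::Element tree once the texts are copied out

for my $k (sort keys %section) {
    print "$k: ", join(' | ', @{ $section{$k} }), "\n";
}
```

Because the key comes from markup rather than from the visible heading text, cosmetic details such as a trailing period no longer end up in the hash keys.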


Re^3: Extracting HTML content between the h tags
by vagabonding electron (Hermit) on Aug 05, 2012 at 14:09 UTC
    Thank you very much!
    I just tried both approaches, and each works even if the last h2 tag is missing (which happens in about 10 of more than 400 pages). For those pages I had used the following workaround:
    my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::*' );
    # no 4th h2 on this page: fall back to everything before the single hr
    unless ( @solution_2 ) {
        @solution_2 = $content->findvalues( '//hr/preceding-sibling::*' );
    }
    ... with substr as before ...
    Fortunately, those pages have only one hr tag :-)
    With your approach, the workaround is no longer necessary.
    BTW, the content after the fourth <h2> is not important.
    Thanks again!
