PerlMonks  

Re^3: Extracting HTML content between the h tags

by vagabonding electron (Chaplain)
on Aug 05, 2012 at 14:09 UTC ( #985529=note )


in reply to Re^2: Extracting HTML content between the h tags
in thread Extracting HTML content between the h tags

Thank you very much!
I just tried both approaches, and they work even when the last h2 tag is missing. That happens in about 10 of more than 400 pages, for which I had been using the following workaround:
my @solution_2 = $content->findvalues( './h2[4]/preceding-sibling::*' );
unless ( @solution_2 ) {
    @solution_2 = $content->findvalues( '//hr/preceding-sibling::*' );
}
... with substr as before ...
Fortunately the pages have only one hr tag each :-)
With your approach the workaround is no longer necessary.
BTW, the content after h2[4] (the fourth h2 tag) is not important.
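
For anyone who wants to try the fallback described above, here is a minimal self-contained sketch. It assumes HTML::TreeBuilder::XPath is installed. The sample HTML, the div with id "content", and the variable names are made up for illustration and are not taken from the real pages. The sample has only three h2 tags, so the first query finds nothing and the hr query takes over.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample page: only three h2 tags, so h2[4] does not exist
# and the single hr tag has to serve as the end marker instead.
my $html = <<'HTML';
<html><body><div id="content">
<h2>One</h2><p>first</p>
<h2>Two</h2><p>second</p>
<h2>Three</h2><p>third</p>
<hr>
<p>footer</p>
</div></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Assumed container element; the real pages may use a different one.
my ($content) = $tree->findnodes('//div[@id="content"]');

# First try: everything before the fourth h2 inside the container.
my @values = $content->findvalues('./h2[4]/preceding-sibling::*');

# Fallback: everything before the (single) hr tag anywhere in the page.
unless (@values) {
    @values = $content->findvalues('//hr/preceding-sibling::*');
}

print "$_\n" for @values;
$tree->delete;
```

Note that the fallback path starts with //, so it searches the whole document rather than just the container. That is only safe because each page has a single hr tag.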
Thanks again!
