Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^3: Parsing HTML

by mirod (Canon)
on Jun 12, 2012 at 11:56 UTC ( #975759=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Parsing HTML
in thread Parsing HTML

It's a bit of a pain to figure out where to look, but the as_text method comes from HTML::Element. If you look at the docs, you'll see that in addition to as_text there is also a as_trimmed_text method. I looks like you could use it.

The secon foreach loop comes from looking at the HTML source for the page. The data you want is in the p with a class of itinerari-info, in consecutive span. Some of the span's can be discarded, the ones with classes of note and strike. That's what the XPath experssion returns. Each span includes a b element with the title, which I get in $info_title, display then detach to get it out of the way. The rest of the span is the information itself.

Does this help?


Comment on Re^3: Parsing HTML
Re^4: Parsing HTML
by marcoss (Novice) on Jun 13, 2012 at 08:22 UTC

    Ok, this clarifies a lot. The as_trimmed_text worked just fine. I tried commenting the detach line, and like you said, it'll print the title twice. But then, it seems like you have seen something I completely overlooked. The strike attribute is only for dates that have been removed, that's why I didn't see it before... but still when I execute the script, the date shows up. Is it a matter of using an if statement?... Because it looks to me that the foreach my $info ( $trip->findnodes( './/p[@class="itinerari-info"]//span[@class != "note" and @class != "strike"]')) should take care of it. mmmm I'm thinking of unless but those are only assumptions... I'll let you know if I fix this, even though probably...eventually, I'll be crying out for help xD. Anyway, thank very much for your time and your patience.

    cheers!

    marcos

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://975759]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2015-07-06 00:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (68 votes), past polls