PerlMonks  

Re^2: Screen scraping complex tables and divs (updated)

by parser (Acolyte)
on Oct 13, 2017 at 21:42 UTC ( [id://1201352]=note )


in reply to Re: Screen scraping complex tables and divs (updated)
in thread Screen scraping complex tables and divs

Rolf,

I am confused now too. Are you saying WWW::Mechanize supports CSS selectors and XPath, or that WWW::Mechanize::Firefox does? If the latter, I have also read that it is very difficult to build.

> Query syntax is not a Perl question, but there are plenty of good tutorials online.

I agree. However, determining how best to query HTML source via Perl is.

The option of mirroring the DOM into a Perl/XML data structure and using the query API sounds quite good. I'll give that a go and see how it works. Anything is better than parsing table tags with HTML::TokeParser.
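As a rough sketch of what "mirror the page into a tree, then query it" could look like: the example below assumes HTML::TreeBuilder::XPath from CPAN, which is only one of several possible backends and is not named anywhere in this thread. The sample markup, the table id `prices`, and the helper `extract_rows` are all hypothetical.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;    # CPAN module, not in core (assumed backend)

# Hypothetical sample markup. A real page would be fetched first,
# for example with LWP::UserAgent or WWW::Mechanize.
my $html = <<'HTML';
<html><body>
<div class="report">
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>24.50</td></tr>
  </table>
</div>
</body></html>
HTML

# Return one array ref of cell texts per data row of the table.
sub extract_rows {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
    my @rows;

    # Only rows that contain <td> cells, which skips the header row.
    for my $tr ( $tree->findnodes('//table[@id="prices"]//tr[td]') ) {
        push @rows, [ map { $_->as_trimmed_text } $tr->findnodes('./td') ];
    }

    $tree->delete;    # release the parse tree
    return @rows;
}

printf "%-8s %s\n", @$_ for extract_rows($html);
```

The point of the design is that the table structure is described once in the XPath expression, instead of being tracked token by token in a hand-written parser loop.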
Re^3: Screen scraping complex tables and divs (updated)
by Corion (Patriarch) on Oct 14, 2017 at 06:44 UTC
Re^3: Screen scraping complex tables and divs
by LanX (Saint) on Oct 13, 2017 at 21:53 UTC
    WWW::Mechanize::Firefox does. I picked it as one example out of many because I have worked with it in the past.

    But it really depends on whether you need JS or not, so I don't want to go into details.

    Querying HTML was your question; something like XPath or CSS selectors is usually the solution.

    Regarding the Perl backend: it depends.

    Sorry, there is no generic answer: TIMTOWTDI.

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re^3: Screen scraping complex tables and divs (Firepath)
by LanX (Saint) on Oct 14, 2017 at 22:43 UTC
    PS:

    > > Look out for browser features/addons allowing to play around with queries.

    I have had very good experiences using Firepath to find the right CSS selectors / XPath expressions inside Firefox.

    You can copy an auto-generated explicit expression by right-clicking on a DOM element, and then change it interactively.

    Then simply copy the final path and/or selector into your Perl code.

    HTH! :)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

      Good catch! Firepath is saving me a lot of time!
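To illustrate the "copy the selector into your Perl code" step, here is a minimal sketch that assumes Mojo::DOM (shipped with Mojolicious on CPAN) as the CSS-selector backend. That choice is an assumption and only one of several ways to do it. The markup, the selector string, and the helper `select_text` are hypothetical.

```perl
use strict;
use warnings;
use Mojo::DOM;    # part of Mojolicious on CPAN, not in core (assumed backend)

# Hypothetical sample markup standing in for the fetched page.
my $html = <<'HTML';
<div class="report">
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>9.99</td></tr>
    <tr><td>Gadget</td><td>24.50</td></tr>
  </table>
</div>
HTML

# Hypothetical selector, as it might be pasted from the browser tool
# after trimming it down by hand.
my $selector = 'table#prices td:nth-child(2)';

# Return the text of every element matched by a CSS selector.
sub select_text {
    my ($content, $css) = @_;
    return Mojo::DOM->new($content)->find($css)->map('text')->each;
}

print "$_\n" for select_text( $html, $selector );
```

One thing to watch for: a browser's DOM usually contains `tbody` elements even when the raw HTML source does not, so an auto-generated path that includes `tbody` may match nothing when run against the unmodified source. Descendant selectors like the one above avoid depending on that.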
