PerlMonks  

I'm trying to work out a strategy, but I'm floundering because I don't know which modules I should be looking at. I want to compare information on the same subject from a number of web sites. Some of these rely on JavaScript, so I intended to automate a browser to get to the right pages. In the initial stages at least, I expected to open the browser with one tab per site, navigate to each page manually (I'd like to automate that in due course) and then extract the data that interests me. I will want to refresh the pages at various times as the data changes. I'm paranoid about JS, so I was planning to use Firefox on Linux, where the risk of damage from a malicious page is reduced. However, I'm not committed to that if there's a better solution available.

I am facing several problems that I don't know how to approach. First, it's not clear to me how to go about automating Firefox. The documentation for the CPAN module MozRepl describes it only as "This module is perl interface of MozRepl", which leaves me unsure what MozRepl itself is or whether I need it. Also, the module is only at version 0.06, which makes me worry that it may not be production-ready. Second, I'm not clear how to deal with pages that are only reachable via JS. If I bookmark them, opening the bookmark takes me to the site's home page rather than the page I had reached.

Shopping comparison sites seem to be able to get prices from multiple stores even when those stores' sites use JS, so I believe that what I want can be done. However, I haven't found any useful documentation. If there are docs out there that cover what I want, I should be most grateful for any pointers, as well as any suggestions for a better approach.

Regards,

John Davies

In reply to Reading multiple web sites by davies
