PerlMonks  

How do I Extract Data From Amazon.com's Website?

by Perl15 (Initiate)
on Aug 15, 2007 at 15:24 UTC (#632768=perlquestion)
Perl15 has asked for the wisdom of the Perl Monks concerning the following question:

Can a Perl script be written to pull specific information off a website? For example, I need to run an Advanced Search on Amazon.com's website: search by Publisher, then pull the ISBN and Description from every listing that Publisher has. Is this possible in Perl? If so, how would you go about doing it? If not, does anyone have any other suggestions? I need some sort of program because this won't be a one-time thing. Thanks!!

Re: How do I Extract Data From Amazon.com's Website?
by FunkyMonk (Canon) on Aug 15, 2007 at 15:28 UTC

    Net::Amazon seems to be an up-to-date and comprehensive package for working with Amazon. Do any of the examples in the documentation help?

Re: How do I Extract Data From Amazon.com's Website?
by barbie (Deacon) on Aug 15, 2007 at 16:19 UTC
Re: How do I Extract Data From Amazon.com's Website?
by renodino (Curate) on Aug 15, 2007 at 17:34 UTC
    DBD::Amazon might be a solution if you're comfortable with SQL *and* your resultset isn't as massive as such a request can generate. The query engine attempts to optimize by extracting predicates that can be filtered by Amazon, and IIRC Publisher is one of those. You'll need to get an ECS account, however.

    BTW: You might want to be a bit more specific in your post's title.


    Perl Contrarian & SQL fanboy
Re: How do I Extract Data From Amazon.com's Website?
by dwm042 (Priest) on Aug 15, 2007 at 17:52 UTC
    Scraping information off web sites: books have been written on the topic. Other authors have mentioned specifics. You can Google "screen scraping" for more information, and I've found this O'Reilly book helpful in many ways. Most of the book's examples are Perl code.

    Spidering Hacks
Re: How do I Extract Data From Amazon.com's Website?
by matsi (Novice) on Aug 15, 2007 at 18:43 UTC
    The respected Monks have suggested several modules aimed at handling the Amazon site.
    But if you want to implement something similar yourself, or you need to solve a similar problem for another website, then you should look at the LWP modules.
      If you're dealing with a web interface (i.e. HTML with forms, links, etc. instead of "pure" HTTP), you're probably better off using WWW::Mechanize. It's a subclass of LWP::UserAgent with lots of specialized methods to search and "click" through web pages and forms.
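
      A minimal sketch of that approach, under loud assumptions: the search URL, the 'field-publisher' form field, and the "ISBN" label pattern are all hypothetical placeholders rather than Amazon's real markup, and the sample ISBNs are made up. The extraction is kept in a plain function so it can be tried on a saved page without touching the network; WWW::Mechanize (from CPAN) is only loaded when scrape_publisher() is actually called.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return unique ISBN-10/ISBN-13 candidates found after an "ISBN" label.
# The label pattern is an assumption about the markup, not Amazon's real HTML.
sub extract_isbns {
    my ($html) = @_;
    my (%seen, @isbns);
    while ($html =~ /ISBN(?:-1[03])?\s*:?\s*(?:<[^>]+>\s*)*([0-9][0-9Xx-]{8,16})/g) {
        (my $isbn = uc $1) =~ tr/-//d;    # normalise: upper-case X, drop hyphens
        next unless length($isbn) == 10 || length($isbn) == 13;
        push @isbns, $isbn unless $seen{$isbn}++;
    }
    return @isbns;
}

# Submit a search form and scan the resulting page for ISBNs.
# $search_url and the form field name are hypothetical; inspect the real
# form and adjust both before relying on this.
sub scrape_publisher {
    my ($search_url, $publisher) = @_;
    require WWW::Mechanize;               # CPAN module, loaded lazily
    my $mech = WWW::Mechanize->new(autocheck => 1);   # die on HTTP errors
    $mech->get($search_url);
    $mech->submit_form(
        form_number => 1,
        fields      => { 'field-publisher' => $publisher },
    );
    return extract_isbns($mech->content);
}

# Offline demo on an inline snippet (placeholder ISBNs).
my $sample = '<li><b>ISBN-10:</b> 0123456789</li>'
           . '<li><b>ISBN-13:</b> 978-0123456786</li>';
print "$_\n" for extract_isbns($sample);
```

      To go live you would call something like scrape_publisher($url, 'Some Publisher') with a real search-form URL, then loop over the result pages. A regex is the weakest part of this sketch; for anything beyond a quick job, an HTML parser is the sturdier choice.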

      There are even a few "clones" of WWW::Mechanize that use popular browsers as the back end, so you can deal with JavaScript and other client-side objects not normally supported by WWW::Mechanize.

        Recently, some folks figured out a way to use Mozilla in a mechanize driver without the need for a (visible) X-server.

        Sounds promising.

        -David

Re: How do I Extract Data From Amazon.com's Website?
by Philem (Acolyte) on Aug 15, 2007 at 19:17 UTC
    Additionally, you may want to have a look at http://samie.sourceforge.net. I've used it in the past and it's very good, once you get used to it.
Re: How do I Extract Data From Amazon.com's Website?
by eriam (Beadle) on Aug 16, 2007 at 06:48 UTC
    Hello,

    A lot of good tools have been mentioned already, but I'd also like to mention Template-Extract!

    And of course POE is the way to go for applications that are exposed to network latency.

    Thank you

    Eriam
