PerlMonks  

How to write CSS selector to extract more than one value from html source using scrappy module?

by shivanisai (Initiate)
on May 16, 2011 at 11:55 UTC (#905050=perlquestion)
shivanisai has asked for the wisdom of the Perl Monks concerning the following question:

Consider the following HTML source:
<div><p><a href="http://www.somesite.com.br/site/lojavirtual/produtos.asp?id=2507"><img alt="ESPELHO RETROVISOR - S00224 - SAFETY" src="http://www.somesite.com.br/site/lojavirtual/produtos/2507/peq.jpg" /></a></div>
If I write the CSS selector for this HTML source as
$scraper2->select('div p a')->data;

This extracts the href value of the <a> tag. But I need a single CSS selector that extracts both the href value and the <img> src value. How can I write such a selector? Or could you point me to any sites that explain how to write CSS selectors efficiently?

Re: How to write CSS selector to extract more than one value from html source using scrappy module?
by Corion (Pope) on May 16, 2011 at 12:03 UTC

    CSS selectors cannot extract attributes; they only select nodes.

    You can try to extract the node and the child node in two passes. It seems that Scrappy uses Web::Scraper, so maybe learning about how to do things using Web::Scraper will help you.

    I would guess that the ->focus method will allow you to select a node and its child nodes, and then you can select the link together with the img tag.
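A minimal sketch of the two-pass idea using plain Web::Scraper rather than Scrappy: the outer rule selects each enclosing node, and the inner rules pick the link and the image beneath it. The selector only chooses nodes; the '@href' and '@src' value specs are what pull out the attributes. The names $product_scraper and products are made up for illustration, and the markup is the sample from the question, wrapped for readability.

```perl
use strict;
use warnings;
use Web::Scraper;

# Sample markup from the question (line-wrapped here for readability).
my $html = <<'HTML';
<div><p><a href="http://www.somesite.com.br/site/lojavirtual/produtos.asp?id=2507">
<img alt="ESPELHO RETROVISOR - S00224 - SAFETY"
     src="http://www.somesite.com.br/site/lojavirtual/produtos/2507/peq.jpg" />
</a></div>
HTML

# Outer rule: select each <p> inside a <div>.
# Inner rules: within that <p>, read href from the <a> and src from the <img>.
my $product_scraper = scraper {
    process 'div p', 'products[]' => scraper {
        process 'a',   href => '@href';
        process 'img', src  => '@src';
    };
};

my $result = $product_scraper->scrape($html);

# Each entry keeps the link and its image together in one hash.
for my $product (@{ $result->{products} || [] }) {
    print "href: $product->{href}\n";
    print "src:  $product->{src}\n";
}
```

Nesting keeps each href paired with the src found under the same node, which matters once a page lists more than one product.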

Re: How to write CSS selector to extract more than one value from html source using scrappy module?
by Anonymous Monk on May 16, 2011 at 12:05 UTC
    But I need a single CSS selector to extract both href

    No, you absolutely do not need a single CSS selector.

      Based on the Scrappy synopsis you might use:

      $scraper->crawl(
          'http://www.example.com/page',
          '/page' => {
              'div p a'   => sub { print $_[1]->{href}, "\n"; },
              'div p img' => sub { print $_[1]->{src},  "\n"; },
          },
      );

      The selectors are applied in turn, which is not that useful.

      Scrappy::Scraper::Parser further convinces me Scrappy has too much Pee.

      Pure Web::Scraper looks simpler to manage.
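For comparison, here is a rough Web::Scraper counterpart of the two-selector approach above. It is only a sketch: the example.com URLs, the $page_scraper name, and the hrefs/srcs keys are placeholders and do not come from the thread.

```perl
use strict;
use warnings;
use Web::Scraper;

# Placeholder markup with the same shape as the question's HTML.
my $html = '<div><p><a href="http://www.example.com/item?id=1">'
         . '<img alt="thumb" src="http://www.example.com/item/1/small.jpg" />'
         . '</a></p></div>';

# One rule per selector; the trailing [] collects every match into an array.
my $page_scraper = scraper {
    process 'div p a',   'hrefs[]' => '@href';
    process 'div p img', 'srcs[]'  => '@src';
};

my $found = $page_scraper->scrape($html);

print "href: $_\n" for @{ $found->{hrefs} || [] };
print "src:  $_\n" for @{ $found->{srcs}  || [] };
```

Both rules run against the same parsed document and return two flat lists. That is enough when only the values are needed, but the lists do not record which image belongs to which link.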
