Re^2: Scrappy Module


by sankarg (Initiate)

on May 12, 2011 at 11:35 UTC

in reply to Re: Scrappy Module
in thread Scrappy Module

Thanks for your quick reply, marto. I have already worked with the Scrappy module, and I am able to get content when scraping a website. My question is about this example from the latest version of Scrappy:

    use Scrappy;

    my $scraper = Scrappy->new;

    $scraper->crawl('http://search.cpan.org/recent',
        '/recent' => {
            '#cpansearch li a' => sub {
                print $_[1]->{href}, "\n";
            }
        }
    );

As you can see, for the URL 'http://search.cpan.org/recent' we need to give the '/recent' tag. This works only for the CPAN site; it does not work for other sites. That is my question: how can we use these tags to scrape a different website? I hope the question is clear.


Re^3: Scrappy Module
by marto (Cardinal) on May 12, 2011 at 11:48 UTC

I've never used this module, but if you look at the source of the URL in your example alongside the documentation (see the crawl method), you'll see that unless other sites contain the same '/recent' link and an element with an id of 'cpansearch' etc., it's not going to work. In other words, you need to write your own code to work with your own sites. Other parsing modules are available; see WWW::Mechanize::Firefox among others.
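To make the point about page source concrete, here is a minimal sketch that uses only core Perl. It checks whether a page contains an element with a given id, which is what the '#cpansearch li a' selector depends on. The helper name has_element_id and both HTML snippets are made up for illustration, and the regex is only a rough source check, not a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the HTML source has an element whose id
# attribute equals $id. A rough regex check for eyeballing page source,
# not a substitute for a proper HTML parser.
sub has_element_id {
    my ($html, $id) = @_;
    return $html =~ /(?<![\w-])(?i:id)\s*=\s*(["'])\Q$id\E\1/ ? 1 : 0;
}

# Made-up snippets: one shaped like the CPAN page, one like some other site.
my $cpan_like  = '<div id="cpansearch"><ul><li><a href="/dist/Foo">Foo</a></li></ul></div>';
my $other_site = '<div id="content"><ul><li><a href="/news/1">News</a></li></ul></div>';

for my $page ($cpan_like, $other_site) {
    print has_element_id($page, 'cpansearch')
        ? "'#cpansearch li a' has something to match\n"
        : "no element with id 'cpansearch', so the callback never fires\n";
}
```

Once you know which id or class wraps the links on your own site, the path and selector in the crawl call would need to be replaced with ones that match that site's URLs and markup.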
