PerlMonks  

Web::Magic 0.005

by tobyink (Canon)
on Jan 12, 2012 at 14:00 UTC ( [id://947540]=CUFP )

Web::Magic awesomeness...

#!/usr/bin/perl
use URI;
use Web::Magic 0.005 -quotelike => 'web';

# Newest questions on PerlMonks.org
printf(
    "%s\n<%s>\n\n",
    $_->textContent,
    URI->new_abs($_->getAttribute('href'), 'http://www.perlmonks.org/'),
) foreach web <http://www.perlmonks.org/?node=Newest%20Nodes>
    -> assert_success
    -> querySelector('h3 a[name="toc-Questions"]')
    -> parentNode
    -> nextSibling
    -> querySelectorAll('tr td a[title]');
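
The script calls URI->new_abs because the href attributes on the page may be relative. Here is a minimal offline sketch of that one step. The helper name absolutize and the sample hrefs are made up for illustration; only URI->new_abs comes from the post.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve an href against the page it was found on.
sub absolutize {
    my ($href, $base) = @_;
    return URI->new_abs($href, $base)->as_string;
}

my $base = 'http://www.perlmonks.org/';

# Sample hrefs (assumed, not scraped): query-only, absolute path, full URL.
print absolutize($_, $base), "\n" for (
    '?node_id=947540',
    '/index.pl?node=Newest%20Nodes',
    'http://search.cpan.org/',
);
```

An href that is already absolute passes through unchanged, so the same call is safe for every link on the page.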

Replies are listed 'Best First'.
Re: Web::Magic 0.005
by Anonymous Monk on Jan 12, 2012 at 14:47 UTC

    yuck, "web quotes", and here I thought Jack Bauer invented misery. :p

    IMHO, if I have to specify the URL twice, I'd rather specify it as the base than muck with URI; see App::scrape#SYNOPSIS for a dang near identical example

      App::scrape is more limited - it just uses CSS selectors to build up a Perl data structure from an HTML page. Handy yes, but Web::Magic does much more than that.

      Can App::scrape handle YAML seamlessly?

      use 5.010;
      use Web::Magic -sub => 'web';
      say web('http://www.cpantesters.org/distro/W/Web-Magic.yaml')->[0]{guid};

      Or feeds?

      use 5.010;
      use Web::Magic -sub => 'web';
      say $_->title foreach web('http://www.w3.org/News/atom.xml')->entries;

      Or for that matter JSON, RDF, arbitrary XML, etc?

      And how about POST requests?

      use 5.010;
      use Web::Magic -sub => 'web';

      # Paste to paste2.org, and say the URL it was pasted to
      say web('http://paste2.org/new-paste')
          ->POST({
              code        => 'say "Hello world";',
              lang        => 'perl',
              description => 'Perl Hello World',
              parent      => 0,
              submit      => 'Submit',
          })
          ->Content_Type('application/x-www-form-urlencoded')
          ->header('Location');

        **thread bump**

        App::scrape is more limited - it just uses CSS selectors to build up a Perl data structure from an HTML page.

        It uses CSS and XPath, but yes, it is slightly simpler

        Can App::scrape handle YAML seamlessly?

        No, but I'm sure it could, in about five lines :) Tree::XPathEngine, it's on CPAN :)

        Or for that matter JSON, RDF, arbitrary XML, etc?

        It does support RDF.

        And how about POST requests?

        Sure, it's right there in the SYNOPSIS: use LWP::Simple qw(get); you can just as easily write use LWP::Simple qw( $ua ); and use $ua->post(...)
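
A rough sketch of that LWP route, for comparison with the Web::Magic POST example. It builds the request without sending it, so it runs offline. The endpoint and form fields are placeholders, not a real paste service API. Note that the LWP::UserAgent method is lowercase post.

```perl
use strict;
use warnings;
use LWP::Simple qw($ua);            # $ua is the LWP::UserAgent behind get()
use HTTP::Request::Common qw(POST);

$ua->timeout(10);                   # ordinary LWP::UserAgent configuration

# Hypothetical endpoint and form fields, for illustration only.
my $request = POST 'http://example.com/new-paste', [
    code => 'say "Hello world";',
    lang => 'perl',
];

print $request->method, ' ', $request->uri, "\n";
print $request->content_type, "\n";
print $request->content, "\n";

# To actually send it (needs network access):
#   my $response = $ua->request($request);
#   print $response->header('Location'), "\n";
# $ua->post($url, \@form_fields) builds and sends in one call.
```

Passing the form fields as an array reference is what makes HTTP::Request::Common encode them as application/x-www-form-urlencoded.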

        I recognize that it does a lot more, and a large number of the prereqs are your modules -- that is a lot of work -- but why?

        Web::Magic won't help me "fake" a proper ua_string like WWW::Mechanize, and it has all those exceptions, but no cookie jar?

        Magic? Dwimmery? Awesomeness? -- yes, I like Kung Fu Panda too :)

        HTML::Query, Web::Query, Web::Scraper, Web::Magic ... a lot of the same kind of work, which horse to choose?

        Sell me a horse?

        I'm sure you have philosophy, reasons for doing things your way, a big and little picture.... I'd love to know what it is :) I just don't have a grasp of the thing.

        Maybe it's because I'm not a "24" fan? What can I say, Kiefer Sutherland grates on me worse than David Caruso :)

        Can you enlighten me?

Re: Web::Magic 0.005
by Anonymous Monk on Jan 14, 2012 at 00:29 UTC
    Not too impressed. You really should have a disclaimer that you're the author of the module you're pimping. And a link to it on CPAN.

      A link would have been nice, but a disclaimer?

      It is not hard to notice that tobyink is the author: http://search.cpan.org/~tobyink/Web-Magic

Re: Web::Magic 0.005
by Anonymous Monk on Jan 14, 2012 at 07:25 UTC

    Here is what I would like to see added :)

     use Web::Magic qw/ -mech /;

    This gets me $ua or $wm, a best impersonation of a WWW::Mechanize-type LWP::UserAgent subclass :) with an autocheck option, agent_alias (and WWW::UserAgent::Random), automatic redirection handling, nice page history...

    $wm->add_handler ...;
    $wm->timeout( 1 );
    $wm->back;
    $wm->reload;
    ...

    $wm->get is get, but $wm->GET is your kind of get

    What do you think?
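
For reference, here is how the requested features look in WWW::Mechanize itself. This is a sketch of existing WWW::Mechanize calls, not of any Web::Magic option; the -mech import above is the poster's wish. The network calls are left as comments so the block runs offline.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1 makes failed requests die instead of failing silently.
my $mech = WWW::Mechanize->new( autocheck => 1 );

$mech->agent_alias('Linux Mozilla');   # present a browser-like User-Agent
$mech->timeout(1);                     # inherited from LWP::UserAgent

print 'User-Agent: ', $mech->agent, "\n";
print 'Cookie jar: ', ref( $mech->cookie_jar ), "\n";

# With network access (the URL is only a placeholder):
#   $mech->get('http://www.perlmonks.org/');
#   $mech->follow_link( text_regex => qr/Newest Nodes/ );
#   $mech->back;      # return to the previous page in the history
#   $mech->reload;    # fetch the current page again
```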

Re: Web::Magic 0.005
by jdrago999 (Pilgrim) on Jan 29, 2012 at 22:34 UTC

    This is awesome.

    The selectors are where it's at, and the sane method names used provide for easy access to the elements you want to find.

    No need to worry about dependencies - we've got this great "CPAN" thing that Just Works.

    Now if we could have a better testing DSL like Capybara has...

      Nice, thanks for this module Toby. Seamless JSON support is very welcome.

      jdrago999, Brownie is a Capybara-like test framework in Perl. I'm going to give it a try soon. There is a presentation here, where the author says s/he welcomes contributors.

      Not that I disagree with you, but I wouldn't want to write tests like that, even with a DSL -- I'd rather fire up HTTP::Proxy or Selenium or WWW::Mechanize::Firefox, and record a session in my Firefox

        It has been a while since I did that kind of testing...is HTTP::Recorder still the state-of-the-art or have things moved forward in the last 6 years?

        UPDATE: I just noticed there is a newer release as of July 2011.

Re: Web::Magic 0.005
by Anonymous Monk on Jan 15, 2012 at 05:12 UTC
    This module might be great, but it has way too many dependencies. I recommend checking out Mojo::UserAgent. No dependencies.

      It also has no features -- at least none of the features of Web::Magic -- great argument, very persuasive

        The dependencies kill this module... According to cpantesters, this module has a 42% chance of all tests passing. As for Mojo::UserAgent, it supports DOM, CSS selectors, and JSON. http://mojocasts.com/e5.
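
A small offline sketch of the DOM, CSS selector, and JSON support mentioned here, using Mojo::DOM and Mojo::JSON from the Mojolicious distribution. The HTML and JSON strings are invented for illustration. With Mojo::UserAgent, the same objects come from a response via $tx->res->dom and $tx->res->json.

```perl
use strict;
use warnings;
use Mojo::DOM;
use Mojo::JSON qw(decode_json);

# Made-up markup standing in for a fetched page.
my $html = <<'HTML';
<table>
  <tr><td><a title="First" href="/?node_id=1">First question</a></td></tr>
  <tr><td><a title="Second" href="/?node_id=2">Second question</a></td></tr>
</table>
HTML

# Same selector as the original post: links with a title inside table cells.
my @links = Mojo::DOM->new($html)
    ->find('tr td a[title]')
    ->map( sub { [ $_->text, $_->attr('href') ] } )
    ->each;

printf "%s <%s>\n", @$_ for @links;

# Made-up JSON standing in for an API response.
my $data = decode_json('[{"guid":"abc-123"}]');
print $data->[0]{guid}, "\n";
```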

Node Type: CUFP [id://947540]
Approved by ww