PerlMonks  

Re^4: fetching and storing data from web.

by tospo (Hermit)
on Jan 27, 2012 at 09:15 UTC (#950292)


in reply to Re^3: fetching and storing data from web.
in thread fetching and storing data from web.

In cases like this where you essentially need to simulate a browser visiting the website, you can also use WWW::Mechanize which does exactly that. You can even control a real web browser through Perl modules to get exactly the interaction you would have with a website when you use it through your browser (WWW::Mechanize::Firefox).
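As a rough illustration of the WWW::Mechanize approach, here is a minimal sketch that fetches a page and stores it in a file. The sub name fetch_and_store and the command-line arguments are made up for this example; only new, get, save_content and title are WWW::Mechanize methods.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a page and store its content in a file.
# autocheck => 1 makes a failed request die instead of failing silently.
sub fetch_and_store {
    my ($page_url, $out_file) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($page_url);            # behaves like a browser visiting the page
    $mech->save_content($out_file);   # write the fetched content to disk
    return $mech->title;              # page title, undef if there is none
}

# Only do any network work when a URL and a file name are given.
my ($url, $file) = @ARGV;
if ( defined $url and defined $file ) {
    my $title = fetch_and_store( $url, $file );
    print "Saved '", ( defined $title ? $title : '(no title)' ), "' to $file\n";
}
```

It could be run as, for example, perl fetch.pl http://www.example.com/ page.html (the script name and URL are placeholders).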


Re^5: fetching and storing data from web.
by chessgui (Scribe) on Jan 27, 2012 at 09:46 UTC
    This looks very interesting. I would be happy if I could use such high level modules for web browsing.

    However, on my Win32 machine this is the sad result of the build:

    CPAN: CPAN::SQLite loaded ok (v0.199)
    Running install for module 'WWW::Mechanize'
    Running make for J/JE/JESSE/WWW-Mechanize-1.71.tar.gz
    CPAN: Digest::SHA loaded ok (v5.61)
    CPAN: Compress::Zlib loaded ok (v2.034)
    Checksum for C:\strawberry\cpan\sources\authors\id\J\JE\JESSE\WWW-Mechanize-1.71.tar.gz ok
    CPAN: Archive::Tar loaded ok (v1.76)
    CPAN: File::Temp loaded ok (v0.22)
    CPAN: Parse::CPAN::Meta loaded ok (v1.4401)
    CPAN: CPAN::Meta loaded ok (v2.110930)
    CPAN: YAML loaded ok (v0.73)

    CPAN.pm: Going to build J/JE/JESSE/WWW-Mechanize-1.71.tar.gz

    WWW::Mechanize likes to have a lot of test modules for some of its tests.
    The following are modules that would be nice to have, but not required.

    Test::Memory::Cycle
    Test::Taint

    Checking if your kit is complete...
    Looks good
    Writing Makefile for WWW::Mechanize
    Could not read metadata file. Falling back to other methods to determine prerequisites
    CPAN: Module::CoreList loaded ok (v2.46)
    cp lib/WWW/Mechanize/Examples.pod blib\lib\WWW\Mechanize\Examples.pod
    cp lib/WWW/Mechanize/Link.pm blib\lib\WWW\Mechanize\Link.pm
    cp lib/WWW/Mechanize/Image.pm blib\lib\WWW\Mechanize\Image.pm
    cp lib/WWW/Mechanize/Cookbook.pod blib\lib\WWW\Mechanize\Cookbook.pod
    cp lib/WWW/Mechanize/FAQ.pod blib\lib\WWW\Mechanize\FAQ.pod
    cp lib/WWW/Mechanize.pm blib\lib\WWW\Mechanize.pm
    C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/mech-dump blib\script\mech-dump
    pl2bat.bat blib\script\mech-dump
    JESSE/WWW-Mechanize-1.71.tar.gz
    C:\strawberry\c\bin\dmake.EXE -- OK
    Running make test
    C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib', 'blib\arch')" t\00-load.t t\add_header.t t\aliases.t t\area_link.t t\autocheck.t t\clone.t t\content.t t\cookies.t t\credentials-api.t t\credentials.t t\die.t t\field.t t\find_frame.t t\find_image.t t\find_inputs.t t\find_link-warnings.t t\find_link.t t\find_link_id.t t\form-parsing.t t\form_with_fields.t t\frames.t t\image-new.t t\image-parse.t t\link-base.t t\link-relative.t t\link.t t\new.t t\pod-coverage.t t\pod.t t\regex-error.t t\save_content.t t\select.t t\taint.t t\tick.t t\untaint.t t\upload.t t\warn.t t\warnings.t t\local\back.t t\local\click.t t\local\click_button.t t\local\content.t t\local\encoding.t t\local\failure.t t\local\follow.t t\local\form.t t\local\get.t t\local\nonascii.t t\local\overload.t t\local\page_stack.t t\local\referer.t t\local\reload.t t\local\submit.t t\mech-dump\mech-dump.t
    t\00-load.t .............. ok
    t\add_header.t ........... ok
    t\aliases.t .............. ok
    t\area_link.t ............ ok
    t\autocheck.t ............ ok
    t\clone.t ................ ok
    t\content.t .............. ok
    t\cookies.t .............. skipped: HTTP::Server::Simple does not support Windows yet.
    t\credentials-api.t ...... ok
    t\credentials.t .......... ok
    t\die.t .................. ok
    t\field.t ................ ok
    t\find_frame.t ........... ok
    t\find_image.t ........... ok
    t\find_inputs.t .......... ok
    t\find_link-warnings.t ... ok
    t\find_link.t ............ ok
    t\find_link_id.t ......... ok
    t\form-parsing.t ......... ok
    t\form_with_fields.t ..... Dubious, test returned 1 (wstat 256, 0x100)
    All 8 subtests passed
    JESSE/WWW-Mechanize-1.71.tar.gz
    C:\strawberry\c\bin\dmake.EXE test -- NOT OK
    //hint// to see the cpan-testers results for installing this module, try:
    reports JESSE/WWW-Mechanize-1.71.tar.gz


    This is why I want to be able to achieve my goals with the simplest possible means without relying on high level modules.

      It must be doable, because both WWW::Mechanize and WWW::Mechanize::Firefox are available using ActiveState's ppm package installer. The build process for getting modules into shape for packaging seems a little fragile, so as a rule, if a module is available using ppm, it's not hard to install using cpan.

      Have you tried installing with force?

      True laziness is hard work
        I don't know how to install with force. The readme that comes with Strawberry says that at the command prompt I should type 'cpan ModuleName' to install a CPAN module.
        Look, chances for WWW::Mechanize on Win32 at ActiveState are not the best. Only an older version is available for an older version of Perl itself. So I'm forced to switch to an older version of Perl if I want to use this module.

        I've also tried cpan help: it suggests using the -f and -i switches together, or putting force / fforce before the command. But the result is the same: the build not only fails but freezes.

        Actually I've switched from ActiveState to Strawberry. With Win32 I've had to get used to the fact that not all modules are available to me. It seems there is no fix for that.
Re^5: fetching and storing data from web.
by chessgui (Scribe) on Jan 27, 2012 at 13:01 UTC
    Finally, using the manual methods suggested, I was able to build WWW::Mechanize::Firefox on Win32 to the extent that 'use WWW::Mechanize::Firefox;' in and of itself does not cause an error.

    However, the object itself cannot be created:

    use WWW::Mechanize::Firefox;
    open STDERR,'>>out.txt';
    my $mech = WWW::Mechanize::Firefox->new();
    ##########################
    output:
    Failed to connect to , problem connecting to "localhost", port 4242: Nem hozható létre kapcsolat, mert a célszámítógép már visszautasította a kapcsolatot. at C:/strawberry/perl/site/lib/MozRepl/Client.pm line 144

    The language of my operating system is not English, so the error message roughly means: 'connection failed because the destination computer refused to establish the connection'. This message popped up many times during the build, by the way. Note that I've installed the necessary Firefox plugin (mozrepl), and Firefox was running when I received this message (I switched off the firewall, but to no avail).

    Any thoughts on that?
      Is your browser running the MozRepl extension?
        I've just googled the English part of the error message, and it turned out that this is a typical problem for lamers: it is not enough to install the mozrepl extension, you also have to start it explicitly or set the option 'Activate on startup' :). Now I'm able to open a web page. But it is still not clear how to use this module to submit a form or log in to some site. Is there any tutorial on this that you recommend?
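For the form question, a minimal sketch with WWW::Mechanize::Firefox might look like the following. It assumes Firefox is running with MozRepl started, and that the login form has fields named 'user' and 'pass'; those field names, the sub name login and the command-line arguments are all hypothetical and need to be adapted to the real site.

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

# Hypothetical helper: open a login page in the running Firefox and
# submit the form that contains the given fields.
# 'user' and 'pass' are assumed field names, not taken from any real site.
sub login {
    my ($page, $name, $secret) = @_;
    my $mech = WWW::Mechanize::Firefox->new();   # connects to MozRepl
    $mech->get($page);
    $mech->submit_form(
        with_fields => { user => $name, pass => $secret },
    );
    return $mech;
}

# Only talk to Firefox when URL, user name and password are all given.
my ($login_url, $user, $pass) = @ARGV;
if ( defined $pass ) {
    my $mech = login( $login_url, $user, $pass );
    print "Now at: ", $mech->uri, "\n";
}
```

The same submit_form call with with_fields also exists in plain WWW::Mechanize, so the sketch should carry over if a real browser is not needed.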
