PerlMonks  

Re^3: fetching and storing data from web.

by chessgui (Scribe)
on Jan 27, 2012 at 08:18 UTC (#950280)


in reply to Re^2: fetching and storing data from web.
in thread fetching and storing data from web.

Wget is also available for Win32, so I have experience with it. I have to tell you that when the fetch is complicated (https + authentication + cookies + form posting + many redirections), it seems more stable and businesslike than the LWP module. For me it was much more difficult to get it right with LWP than with wget. In one particular case I could not get it right at all: the login succeeded (I know because the cookies generated were correct), but somewhere in the chain a redirection was lost and I never received the post-login welcome page. The same sequence worked with wget.

When it comes to a simple GET, the LWP module has proved perfect for me: it almost never freezes, and the timeout works fine.
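For reference, a minimal sketch of the kind of LWP setup discussed here: a user agent with a timeout, a cookie jar, and redirects enabled for POST. By default LWP::UserAgent follows redirects only for GET and HEAD, which is one possible reason a redirect after a form post gets lost; the thread does not confirm that this was the cause in the case above. The helper names build_agent and fetch and the URL are hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent with a timeout, an in-memory cookie jar,
# and redirects allowed for POST in addition to GET and HEAD.
# (build_agent is a hypothetical helper name.)
sub build_agent {
    my (%opt) = @_;
    my $ua = LWP::UserAgent->new(
        timeout    => $opt{timeout} // 15,   # seconds
        cookie_jar => HTTP::Cookies->new,    # keeps cookies between requests
    );
    # LWP's default list of redirectable methods is GET and HEAD only.
    push @{ $ua->requests_redirectable }, 'POST';
    return $ua;
}

# Fetch a URL and return the decoded body, or die with the status line.
sub fetch {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Usage (hypothetical URL, not from the thread):
#   my $ua   = build_agent(timeout => 10);
#   my $html = fetch($ua, 'http://www.example.com/');
```

The same agent can be reused for a login POST followed by further GETs, so the session cookies carry over between requests.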


Re^4: fetching and storing data from web.
by tospo (Hermit) on Jan 27, 2012 at 09:15 UTC
    In cases like this where you essentially need to simulate a browser visiting the website, you can also use WWW::Mechanize which does exactly that. You can even control a real web browser through Perl modules to get exactly the interaction you would have with a website when you use it through your browser (WWW::Mechanize::Firefox).
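As a rough illustration of the browser-simulation approach, here is a minimal WWW::Mechanize sketch that loads a login page, fills in a form, and returns the page that follows. The URL and the form field names (username, password) are assumptions made up for the example, as are the helper names new_mech and login; a real site will use its own.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Create a Mechanize object. autocheck makes failed requests die
# instead of failing silently; timeout is passed on to LWP::UserAgent.
sub new_mech {
    return WWW::Mechanize->new( autocheck => 1, timeout => 15 );
}

# Log in through an HTML form and return the content of the page
# that follows. Field names here are hypothetical.
sub login {
    my ($mech, %arg) = @_;
    $mech->get( $arg{url} );
    $mech->submit_form(
        with_fields => {
            username => $arg{user},
            password => $arg{pass},
        },
    );
    return $mech->content;
}

# Usage (hypothetical values):
#   my $mech = new_mech();
#   my $page = login( $mech,
#       url  => 'https://www.example.com/login',
#       user => 'me',
#       pass => 'secret',
#   );
```

Mechanize keeps cookies between requests on the same object, so later calls to $mech->get stay within the logged-in session.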
      This looks very interesting. I would be happy if I could use such high-level modules for web browsing.

      However, on my Win32 machine this is the sad result of the build:
      CPAN: CPAN::SQLite loaded ok (v0.199)
      Running install for module 'WWW::Mechanize'
      Running make for J/JE/JESSE/WWW-Mechanize-1.71.tar.gz
      CPAN: Digest::SHA loaded ok (v5.61)
      CPAN: Compress::Zlib loaded ok (v2.034)
      Checksum for C:\strawberry\cpan\sources\authors\id\J\JE\JESSE\WWW-Mechanize-1.71.tar.gz ok
      CPAN: Archive::Tar loaded ok (v1.76)
      CPAN: File::Temp loaded ok (v0.22)
      CPAN: Parse::CPAN::Meta loaded ok (v1.4401)
      CPAN: CPAN::Meta loaded ok (v2.110930)
      CPAN: YAML loaded ok (v0.73)
      CPAN.pm: Going to build J/JE/JESSE/WWW-Mechanize-1.71.tar.gz
      WWW::Mechanize likes to have a lot of test modules for some of its tests.
      The following are modules that would be nice to have, but not required.
      Test::Memory::Cycle
      Test::Taint
      Checking if your kit is complete...
      Looks good
      Writing Makefile for WWW::Mechanize
      Could not read metadata file. Falling back to other methods to determine prerequisites
      CPAN: Module::CoreList loaded ok (v2.46)
      cp lib/WWW/Mechanize/Examples.pod blib\lib\WWW\Mechanize\Examples.pod
      cp lib/WWW/Mechanize/Link.pm blib\lib\WWW\Mechanize\Link.pm
      cp lib/WWW/Mechanize/Image.pm blib\lib\WWW\Mechanize\Image.pm
      cp lib/WWW/Mechanize/Cookbook.pod blib\lib\WWW\Mechanize\Cookbook.pod
      cp lib/WWW/Mechanize/FAQ.pod blib\lib\WWW\Mechanize\FAQ.pod
      cp lib/WWW/Mechanize.pm blib\lib\WWW\Mechanize.pm
      C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/mech-dump blib\script\mech-dump
      pl2bat.bat blib\script\mech-dump
      JESSE/WWW-Mechanize-1.71.tar.gz
      C:\strawberry\c\bin\dmake.EXE -- OK
      Running make test
      C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib', 'blib\arch')" t\00-load.t t\add_header.t t\aliases.t t\area_link.t t\autocheck.t t\clone.t t\content.t t\cookies.t t\credentials-api.t t\credentials.t t\die.t t\field.t t\find_frame.t t\find_image.t t\find_inputs.t t\find_link-warnings.t t\find_link.t t\find_link_id.t t\form-parsing.t t\form_with_fields.t t\frames.t t\image-new.t t\image-parse.t t\link-base.t t\link-relative.t t\link.t t\new.t t\pod-coverage.t t\pod.t t\regex-error.t t\save_content.t t\select.t t\taint.t t\tick.t t\untaint.t t\upload.t t\warn.t t\warnings.t t\local\back.t t\local\click.t t\local\click_button.t t\local\content.t t\local\encoding.t t\local\failure.t t\local\follow.t t\local\form.t t\local\get.t t\local\nonascii.t t\local\overload.t t\local\page_stack.t t\local\referer.t t\local\reload.t t\local\submit.t t\mech-dump\mech-dump.t
      t\00-load.t .............. ok
      t\add_header.t ........... ok
      t\aliases.t .............. ok
      t\area_link.t ............ ok
      t\autocheck.t ............ ok
      t\clone.t ................ ok
      t\content.t .............. ok
      t\cookies.t .............. skipped: HTTP::Server::Simple does not support Windows yet.
      t\credentials-api.t ...... ok
      t\credentials.t .......... ok
      t\die.t .................. ok
      t\field.t ................ ok
      t\find_frame.t ........... ok
      t\find_image.t ........... ok
      t\find_inputs.t .......... ok
      t\find_link-warnings.t ... ok
      t\find_link.t ............ ok
      t\find_link_id.t ......... ok
      t\form-parsing.t ......... ok
      t\form_with_fields.t ..... Dubious, test returned 1 (wstat 256, 0x100)
      All 8 subtests passed
      JESSE/WWW-Mechanize-1.71.tar.gz
      C:\strawberry\c\bin\dmake.EXE test -- NOT OK
      //hint// to see the cpan-testers results for installing this module, try:
      reports JESSE/WWW-Mechanize-1.71.tar.gz


      This is why I want to be able to achieve my goals by the simplest possible means, without relying on high-level modules.

        It must be doable, because both WWW::Mechanize and WWW::Mechanize::Firefox are available through ActiveState's ppm package installer. The build process for getting modules into shape for packaging seems a little fragile, so as a rule, if a module is available through ppm, it is not hard to install using cpan.

        Have you tried installing with force?

        True laziness is hard work
      Finally, using the manual methods suggested, I was able to build WWW::Mechanize::Firefox on Win32 to the extent that 'use WWW::Mechanize::Firefox;' in and of itself does not cause an error.

      However, the object itself cannot be created:

      use WWW::Mechanize::Firefox;
      open STDERR,'>>out.txt';
      my $mech = WWW::Mechanize::Firefox->new();
      ##########################
      output:
      Failed to connect to , problem connecting to "localhost", port 4242: Nem hozható létre kapcsolat, mert a célszámítógép már visszautasította a kapcsolatot. at C:/strawberry/perl/site/lib/MozRepl/Client.pm line 144
      The language of my operating system is not English, so the error message roughly means: 'connection failed because the destination computer refused to establish the connection'. This message popped up many times during the build, by the way. Note that I had installed the necessary Firefox plugin (mozrepl) and Firefox was running when I received this message. I switched off the firewall, but to no avail.

      Any thoughts on that?
        Is your browser running the MozRepl extension?
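A "connection refused" error on localhost port 4242 usually means that nothing is accepting connections on that port at that moment. A small sketch like the following, using only the core IO::Socket::INET module, can confirm that before WWW::Mechanize::Firefox->new() is called. The helper name port_is_open is hypothetical; the host and port are taken from the error message quoted above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return 1 if something accepts TCP connections on host:port, 0 otherwise.
# (port_is_open is a hypothetical helper name.)
sub port_is_open {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout // 3,   # seconds
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}

# localhost:4242 is the address named in the error message.
print port_is_open('localhost', 4242)
    ? "something is listening on port 4242\n"
    : "nothing is listening on port 4242\n";
```

If the check reports that nothing is listening while Firefox is open, the likely place to look is whether the extension has actually been started inside the browser, as opposed to merely installed.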
