
Advantages of LWP?

by ChrisJ-UK (Novice)
on Jul 28, 2004 at 12:58 UTC ( [id://378018]=perlquestion )

ChrisJ-UK has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am still building my little crawler.

At the moment I retrieve each page using the Lynx browser:

    sub fetch_a_page {
        my($fetch_this)=@_;   # Holds URL of page to be fetched
        my($fetched_page);    # Holds contents of fetched page
        $fetched_page=`lynx -source $fetch_this`;
        return $fetched_page;
    }

I am aware of the LWP module being able to fetch web pages. Is there any actual advantage to be had from using LWP rather than Lynx?

If I moved the script from the current Unix server to Windows then I would have to include LWP, but I don't foresee that ever happening.

Thanks.

Replies are listed 'Best First'.
Re: Advantages of LWP?
by davorg (Chancellor) on Jul 28, 2004 at 13:06 UTC

    Some that spring immediately to mind:

    1. LWP is a standard part of Perl (since 5.8.0) therefore it will always be installed whereas lynx is something extra to install.
    2. Using internal code is always safer and faster than shelling out to an external program.
    3. LWP can be used to give you more information - like the HTTP response header - which might be useful in the future.
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      LWP is a standard part of Perl

      Hrm? I don't see it.

      The standard dist needs to get smaller, anyway. Although very useful, LWP is not such a critical piece of functionality that it needs to be in the core.

      ----
      send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

        Er.... no. It was libnet that was added, not libwww. Sorry about that.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

Re: Advantages of LWP?
by hardburn (Abbot) on Jul 28, 2004 at 13:13 UTC

    Yes:

    • Calling external programs is always a security risk, especially without setting $ENV{PATH}. (It's up to you to balance this with other concerns).
    • LWP will let you get at the HTTP headers
    • It's easier to handle errors in LWP

    If you're willing to take the security risk, and don't need to do anything with HTTP headers, then you're still stuck with error handling. What if the page isn't available? What if your connection is down?

    What happens if Lynx isn't available? You mention porting to Windows. Lynx is just as unlikely to be there as LWP.
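A minimal sketch of the error handling and header access described above, assuming LWP::UserAgent is installed. The two-value return convention and the 30-second timeout are illustrative choices, not something from the original script.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# One user agent for the whole crawl; the timeout value is an assumption.
my $ua = LWP::UserAgent->new( timeout => 30 );

# Returns (content, response) on success and (undef, response) on failure,
# so the caller can inspect the status line and headers either way.
sub fetch_a_page {
    my ($fetch_this) = @_;
    my $response = $ua->get($fetch_this);
    return ( undef, $response ) unless $response->is_success;
    return ( $response->decoded_content, $response );
}

# Fetch every URL given on the command line and report what happened.
for my $url (@ARGV) {
    my ( $page, $response ) = fetch_a_page($url);
    if ( defined $page ) {
        my $type = $response->header('Content-Type') // 'unknown type';
        printf "%s: %s, %d characters\n", $url, $type, length $page;
    }
    else {
        warn "$url: ", $response->status_line, "\n";
    }
}
```

With backticks and lynx, a missing page and a dead connection are hard to tell apart from the output alone; here both end up in `status_line`.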


Re: Advantages of LWP?
by nite_man (Deacon) on Jul 28, 2004 at 13:49 UTC

    Sure, in many cases you can call external programs directly instead of using modules. But to my mind, using modules is better because many things are already done for you: error catching, additional functionality, some validation, etc., and many people have tested that functionality.

    In your case, you would have to do the same things (testing, error catching, etc.) yourself in your application. So there is no reason to do the same work again.

    ---
    Schiller

    It's only my opinion and it doesn't have pretensions of absoluteness!

Re: Advantages of LWP?
by iburrell (Chaplain) on Jul 28, 2004 at 16:27 UTC

    It is more efficient to use Perl modules than external programs. LWP requires loading some Perl modules, but it saves forking and running another program for every fetch. For a single page, it probably doesn't matter. For multiple pages, the difference could be huge.
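As a rough sketch of the multi-page case, assuming LWP::UserAgent is available: one user agent object is created once and reused for every URL, so no process is forked per page. The sub name `fetch_pages` and the `keep_alive` and `timeout` settings are illustrative assumptions.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# keep_alive => 1 gives the agent a small connection cache, so consecutive
# requests to the same host may reuse a connection. Values are assumptions.
my $ua = LWP::UserAgent->new( keep_alive => 1, timeout => 30 );

# Fetch a list of URLs in-process and return a hash ref mapping each URL
# to its content, or to undef if that fetch failed.
sub fetch_pages {
    my (@urls) = @_;
    my %page_for;
    for my $url (@urls) {
        my $response = $ua->get($url);
        $page_for{$url} =
            $response->is_success ? $response->decoded_content : undef;
    }
    return \%page_for;
}

# Fetch every URL given on the command line.
my $pages = fetch_pages(@ARGV);
for my $url ( sort keys %$pages ) {
    print "$url: ", ( defined $pages->{$url} ? 'ok' : 'failed' ), "\n";
}
```

How much this saves in practice depends on the number of pages and the servers involved; it would need measuring rather than assuming.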

Re: Advantages of LWP?
by Aristotle (Chancellor) on Jul 28, 2004 at 16:27 UTC

    My reason would be that lynx is really slow. If any external tool, I'd use wget -qO - to pull pages (or maybe curl if you have that; it's less common than wget).

    That said, I can't think of any particular reason in favour of using an external utility. I'd just use LWP::Simple; and call get(), which is one extra line to pull in the module and otherwise (at least) as simple.
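For illustration, a sketch of how the original `fetch_a_page` might look with LWP::Simple, assuming the module is installed. The command-line loop is only there to make the example runnable.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Same interface as the original sub, without the shell.
# get() returns undef when the fetch fails for any reason.
sub fetch_a_page {
    my ($fetch_this) = @_;    # Holds URL of page to be fetched
    return get($fetch_this);
}

# Fetch every URL given on the command line.
for my $url (@ARGV) {
    my $page = fetch_a_page($url);
    if ( defined($page) ) {
        print length($page), " characters from $url\n";
    }
    else {
        print "Could not fetch $url\n";
    }
}
```

The trade-off is that `get()` only signals success or failure; finding out why a fetch failed needs the fuller LWP::UserAgent interface.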

    A reason against using backticks in general is security; in your case, if $fetch_this ever contains input from untrusted users.
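If lynx has to stay, one way to reduce that risk is the list form of a pipe `open`, which bypasses the shell entirely. This is a sketch under assumptions: the URL check is a deliberately crude illustration, and lynx is assumed to be on the PATH.

```perl
use strict;
use warnings;

# List-form pipe open: the command and each argument are handed to exec()
# directly, so shell metacharacters in $fetch_this are never interpreted.
sub fetch_a_page {
    my ($fetch_this) = @_;

    # Refuse anything that does not start like an http(s) URL; a leading
    # "-" would otherwise be read by lynx as an option. (Crude check,
    # for illustration only.)
    return unless $fetch_this =~ m{\Ahttps?://}i;

    open( my $lynx, '-|', 'lynx', '-source', $fetch_this ) or return;
    local $/;                       # slurp the whole output at once
    my $fetched_page = <$lynx>;
    close($lynx) or return;         # false if lynx exited with an error
    return $fetched_page;
}

# Fetch every URL given on the command line.
for my $url (@ARGV) {
    my $page = fetch_a_page($url);
    print defined($page) ? "fetched $url\n" : "failed or refused: $url\n";
}
```

With the backtick version, a value such as `http://x/; some-command` would be passed to the shell as two commands; here it reaches lynx as a single argument.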

    Makeshifts last the longest.
