
Re: Adding 'referer' info to spider script

by swiftone (Curate)
on Aug 08, 2003 at 18:27 UTC (#282285)

in reply to Adding 'referer' info to spider script

Here are a few comments on slimming down your code while keeping it readable (or even improving readability). Of course, this is all My Not So Humble Opinion, so take it with a grain of salt. Feel free to see this as a vast exercise in Hubris on my part.

#!/usr/bin/perl
require LWP::UserAgent;
require HTTP::Request;
require HTTP::Response;
use HTTP::Request::Common;

First, I'd recommend running perl with the -w (warnings) option and adding "use strict;". These can save you hours of debugging and encourage good programming habits. At first they may seem a pain, but with a little practice they add no noticeable effort, and you tend to do things the "Right Way" by default. I'd also "use" all those modules rather than "require"ing them. This imports what the module author intended, and if I disagree, I can override the author's defaults. See use for details.
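
To make that concrete, here is a minimal sketch of what strict buys you. It is my own illustration, not the original script: the misspelled $outdr is a deliberate typo, the path is made up, and I've used the core File::Basename module as a stand-in so the snippet runs without LWP installed. The "use warnings" pragma is the lexically scoped alternative to the -w switch.

```perl
#!/usr/bin/perl
use strict;
use warnings;          # lexically scoped alternative to the -w switch
use File::Basename;    # 'use' loads the module and imports basename() by default

my $outdir = 'pages';

# String-eval a snippet containing a misspelled variable, so the failure
# can be shown without aborting this script. The eval inherits 'use strict'
# from the enclosing scope, so the undeclared $outdr is a compile error.
my $ok    = eval q{ $outdr = 'elsewhere'; 1 };
my $error = $ok ? '' : $@;
print "strict caught the typo: $error" if $error;

# Because File::Basename was pulled in with 'use', basename() is available
# unqualified. With 'require' you would write File::Basename::basename(...).
my $name = basename('/tmp/pages/0001.html');
print "basename: $name\n";
```

Without strict, the typo would silently create a new global variable, and you would be left wondering why $outdir never changed.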

foreach (@ARGV) {
    if ( $_ eq $ARGV[0] ) {
        $inputfile = $_;
    } elsif ( $_ eq $ARGV[1] ) {
        $outdir = $ARGV[1];
    } else {
        die "Usage: $0 inputfile outdir\n";
    }
}

This is an unusual way of going about it. You copy the first two arguments and die if there are more. I prefer the more succinct:

die "Usage: $0 inputfile outdir\n" unless scalar @ARGV == 2;
# I prefer "scalar @LIST", some prefer $#LIST,
# but remember the difference
my ($inputfile, $outdir) = @ARGV;

This has the advantage of working as intended (well, dying as intended) if only one argument is given.
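
For anyone unsure about the scalar @LIST versus $#LIST difference, here is a small sketch. The parse_args helper and the sample file names are hypothetical, introduced only so the check can be exercised without touching the real @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same check: returns the two
# arguments, or dies with the usage message if the count is not 2.
sub parse_args {
    my @args = @_;
    die "Usage: $0 inputfile outdir\n" unless scalar @args == 2;
    return @args;
}

my @list = ('urls.txt', 'pages');
print "count: ", scalar(@list), "\n";   # number of elements
print "last index: $#list\n";           # index of the last element, i.e. count - 1

my ($inputfile, $outdir) = parse_args(@list);

# With only one argument the helper dies; eval traps it for the demo.
my $died_on_one = !eval { parse_args('urls.txt'); 1 };
print "one argument rejected\n" if $died_on_one;
```

So a check written with $#LIST has to compare against 1, not 2, which is exactly the off-by-one that bites people.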

Just one more:

if ($filenum =~ /\d\d\d\d/) {
    $filenum = $filenum;
} elsif ($filenum =~ /\d\d\d/) {
    $filenum = "0$filenum";
} elsif ($filenum =~ /\d\d/) {
    $filenum = "00$filenum";
} else {
    $filenum = "000$filenum";
}

How about:
$filenum = sprintf("%04d", $filenum);
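
A quick sketch of what the sprintf version produces for a few sample values. The pad4 wrapper and the sample numbers are mine, purely for illustration; "%04d" means a decimal integer, zero-padded to a minimum width of four.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner above.
sub pad4 {
    my ($filenum) = @_;
    return sprintf("%04d", $filenum);
}

for my $filenum (7, 42, 123, 1234) {
    print "$filenum -> ", pad4($filenum), "\n";
}
```

Numbers that already have four or more digits come through unchanged, since the width is only a minimum.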
