PerlMonks  

Re: Adding 'referer' info to spider script

by swiftone (Curate)
on Aug 08, 2003 at 18:27 UTC ( [id://282285] )


in reply to Adding 'referer' info to spider script

Here are a few comments on slimming down your code while still keeping it readable (or even improving its readability). Of course, this is all My Not So Humble Opinion, so take it with a grain of salt. Feel free to see this as a vast exercise in Hubris on my part.

#!/usr/bin/perl
require LWP::UserAgent;
require HTTP::Request;
require HTTP::Response;
use HTTP::Request::Common;
First, I'd recommend running perl with the -w (warn) option and adding "use strict;". These can save you hours of debugging, and they encourage good programming habits. At first it may seem a pain, but with a little practice they add no noticeable effort, and you tend to do things the "Right Way" by default. I'd also "use" all those modules rather than "require"ing them. This imports what the module author intended, and if I disagree, I can override the author's defaults. See use for details.
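To make the difference concrete, here is a minimal sketch of my own (not the original script). It uses two core modules, List::Util and File::Basename, purely as stand-ins so it runs without LWP installed, and it uses "use warnings;", the lexically scoped counterpart of -w. The path is a made-up example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "use" loads the module at compile time and calls its import method,
# so the requested names can be called unqualified.
use List::Util qw(max);

# "require" only loads the module at run time and imports nothing,
# so the function has to be called by its fully qualified name.
require File::Basename;

my $largest = max(3, 9, 4);
my $name    = File::Basename::basename('/tmp/spider/out.html');  # hypothetical path

print "$largest $name\n";
```

With "use strict;" in place, a misspelled variable name becomes a compile-time error instead of a silently empty value.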
foreach (@ARGV) {
    if ( $_ eq $ARGV[0] ) { $inputfile = $_; }
    elsif ( $_ eq $ARGV[1] ) { $outdir = $ARGV[1]; }
    else { die "Usage: $0 inputfile outdir\n"; }
}
This is an unusual way of going about it. You copy the first two arguments, and die if there are more. I prefer the more succinct:
die "Usage: $0 inputfile outdir\n" unless scalar @ARGV == 2;
#I prefer "scalar @LIST", some prefer $#LIST,
#but remember the difference
my ($inputfile, $outdir) = @ARGV;
This has the advantage of working as intended (well, dying as intended) if only one argument is given.
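In case "the difference" isn't obvious: scalar @LIST is the number of elements, while $#LIST is the index of the last element, so the two are always one apart. A small sketch, with @args as a hypothetical stand-in for @ARGV:

```perl
use strict;
use warnings;

my @args = ('input.txt', 'outdir');   # hypothetical stand-in for @ARGV

my $count      = scalar @args;   # number of elements
my $last_index = $#args;         # index of the last element (count - 1)

# For an empty list the count is 0 and the last index is -1.
my @none        = ();
my $empty_count = scalar @none;
my $empty_last  = $#none;

print "count=$count last=$last_index\n";
print "empty count=$empty_count last=$empty_last\n";
```

So a check written with $#ARGV would have to compare against 1, not 2.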

Just one more:

if ($filenum =~ /\d\d\d\d/) {$filenum = $filenum; }
elsif ($filenum =~ /\d\d\d/) {$filenum = "0$filenum"; }
elsif ($filenum =~ /\d\d/) {$filenum = "00$filenum"; }
else {$filenum = "000$filenum"; }
How about:
$filenum = sprintf("%04d", $filenum);
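For what it's worth, here is a quick sketch of what that format does to a few sample numbers (the values are made up for illustration). "%04d" pads with leading zeros to a minimum width of four and leaves longer numbers alone:

```perl
use strict;
use warnings;

# Sample file numbers, chosen only for illustration.
my @filenums = (7, 42, 123, 1234, 12345);

# Zero-pad each number to at least four digits.
my @padded = map { sprintf("%04d", $_) } @filenums;

print "$_\n" for @padded;
```

This prints 0007, 0042, 0123, 1234 and 12345, one per line.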
