Thanks for your input. The odd thing was that it started all of a sudden; maybe a change in the server configuration. As long as I get the files, I guess that's all that matters.
Best,
Joe
Sorry for reviving a 5-year-old thread, but I am getting a 400 Bad Request from the SEC site now.
Testing a download of this file using File::Fetch and LWP:
https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx
The SEC does not allow botnets or automated tools to crawl the site. Any request that has been identified as part of a botnet or an automated tool outside of the acceptable policy will be managed to ensure fair access for all users.
Please declare your user agent in request headers:
Sample Declared Bot Request Headers:
User-Agent: Sample Company Name AdminContact@<sample company domain>.com
Accept-Encoding: gzip, deflate
Host: www.sec.gov
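The headers above can be declared with core HTTP::Tiny, one of the backends File::Fetch can use. This is a minimal sketch; the company name and contact address are placeholders you must replace with your own details, as the SEC asks.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Placeholder identity -- substitute your real company name and admin contact.
my $ua = HTTP::Tiny->new(
    agent => 'Example Corp AdminContact@example.com',
);

my $url = 'https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx';
my $res = $ua->get($url);

if ( $res->{success} ) {
    print "fetched ", length( $res->{content} ), " bytes\n";
}
else {
    print "failed: $res->{status} $res->{reason}\n";
}
```

Note that HTTP::Tiny does not decompress responses itself, so the sample Accept-Encoding header is omitted here; if you declare it, be prepared to gunzip the body yourself.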
But first try setting the agent string as they suggest, modifying this to reflect your own details: User-Agent: Sample Company Name AdminContact@<sample company domain>.com . They say "does not allow" and "managed to ensure ...", so you may have a chance to do it by the book.
Either way, this is how you set the agent string with File::Fetch:
use File::Fetch;

# Declare your agent string via the package variable File::Fetch honours.
$File::Fetch::USER_AGENT = 'abc';

# This test URL echoes back the User-Agent it received.
my $ff = File::Fetch->new(uri => 'https://dnschecker.org/user-agent-info.php');
$ff->fetch(to => './abc');
bw, bliako