PerlMonks  

Re: HTTP response: 400 Bad Request

by Anonymous Monk
on Feb 01, 2016 at 03:29 UTC ( [id://1154157] )


in reply to HTTP response: 400 Bad Request

lots of stupid websites return 400 or 500 based on silly headers they don't like, like the user agent, etc ... examine the headers

Replies are listed 'Best First'.
Re^2: HTTP response: 400 Bad Request
by JoeJohnston (Novice) on Feb 01, 2016 at 05:04 UTC

    Thanks for your input. The odd thing was that it started all of a sudden. Maybe a change in the server configuration. As long as I get the files, I guess that's all that matters.


    Best,

    Joe

      Sorry for reviving a 5-year-old thread, but I am getting a 400 Bad Request from the SEC site now. I am testing downloading this file using File::Fetch and LWP: https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx

      The SEC does not allow botnets or automated tools to crawl the site. Any request that has been identified as part of a botnet or an automated tool outside of the acceptable policy will be managed to ensure fair access for all users. Please declare your user agent in request headers. Sample declared bot request headers:

      User-Agent: Sample Company Name AdminContact@<sample company domain>.com
      Accept-Encoding: gzip, deflate
      Host: www.sec.gov

        I believe you'll need to set a custom user agent header (with a contact email, as it shows) rather than using LWP's default. If you do that, I think it'll let you through (I want to say I had to do something similar once for the Treasury).
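        A minimal LWP::UserAgent sketch along those lines. The agent string and contact address below are placeholders modelled on the SEC's sample headers, not real values; substitute your own:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Declare who you are, as the SEC policy asks.
# This agent string is a placeholder -- replace with your own name/contact.
my $ua = LWP::UserAgent->new(
    agent => 'Sample Company Name AdminContact@example.com',
);
$ua->default_header( 'Accept-Encoding' => 'gzip, deflate' );

my $res = $ua->get(
    'https://www.sec.gov/Archives/edgar/daily-index/2023/QTR3/form.20230712.idx'
);
die "request failed: " . $res->status_line unless $res->is_success;
print $res->decoded_content;
```

        Setting the agent at construction time covers every request the object makes, so you don't have to attach the header to each individual GET.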

        The cake is a lie.

        but first try setting the agent string as they suggest; modify this to reflect yours: User-Agent: Sample Company Name AdminContact@<sample company domain>.com . They say "does not allow" and "managed to ensure ...", so you may have a chance to do it by the book.

        Either way, this is how you set the agent string with File::Fetch:

        use File::Fetch;

        # Declare the agent string (replace 'abc' with your declared bot string).
        $File::Fetch::USER_AGENT = 'abc';

        my $ff = File::Fetch->new(
            uri => 'https://dnschecker.org/user-agent-info.php',
        );
        $ff->fetch( to => './abc' ) or die $ff->error;

        bw, bliako
