http://www.perlmonks.org?node_id=1030177


in reply to get ?

I'm not entirely certain you are providing a valid URI. Without a particular example to go on, have you tried escaping the file name, e.g.

use URI::Escape;
use LWP::Simple 'get';

# Percent-encode the file name before interpolating it into the query string.
my $file    = uri_escape('xml/myfile.xml');
my $content = get("http://www.example.com/FileServe?file=$file");
Have you verified the link, as typed, can be accessed via other channels, like your browser?

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^2: get ?
by rebugger (Acolyte) on Apr 23, 2013 at 16:57 UTC

    Yes, the link works in my browser; I failed to mention that. I've cleared my cache to double-check that I'm not looking at a cached version. I have also checked that the string I am passing to get is correct. None of the file names have spaces or any other characters that require escaping.

    Okay, new discovery: I can retrieve files with question marks in the URL from other websites, for example this node. I'm starting to think it has nothing to do with the URL (URI? My terminology is a bit hazy) and everything to do with the particular website I am trying to access. Maybe it is blocking access from non-browsers. I should probably pursue this internally now; thank you for the help.
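    (A minimal sketch of that kind of spot check, using the LWP::Simple get already in play and this node's own URL; the only test is whether get returns anything at all.)

      use strict;
      use warnings;
      use LWP::Simple 'get';

      # Fetch a URL from another site that also carries a '?' and a query string.
      my $page = get('http://www.perlmonks.org?node_id=1030177');
      print defined $page ? "retrieved\n" : "failed\n";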

      I suspect your issue is that you have a slash in your query string, and that requires an escape. What happens when you try to grab http://www.example.com/FileServe?file=xml%2Fmyfile.xml, i.e. run the example code I gave?

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        The slash does not need to be escaped in the browser's address bar, but the browser could always be doing some kind of "smart" conversion to hide the inner workings from me. I'll try your example code with my query string...

        No, it still doesn't work. Both the escaped and unescaped versions work in the browser, but neither works from the get call. I can download other pages using get, and it has no problem with other URIs that have slashes in their query strings, so I am thinking it is something on the server end.

        After some further testing, get cannot retrieve anything from that domain. This is a company-owned website, so I will try contacting the admin to see if there is a workaround.
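        (Since LWP::Simple's get returns undef on any failure without saying why, one way to pin down a server-side refusal is to repeat the request with LWP::UserAgent and look at the HTTP status line. A minimal sketch, reusing the hypothetical example URL from earlier in the thread:)

          use strict;
          use warnings;
          use LWP::UserAgent;

          my $ua  = LWP::UserAgent->new;
          my $res = $ua->get('http://www.example.com/FileServe?file=xml%2Fmyfile.xml');

          if ($res->is_success) {
              print "Fetched ", length($res->decoded_content), " bytes\n";
          }
          else {
              # A 403 or similar here would support the theory that the
              # server is refusing requests from non-browser clients.
              print "Request failed: ", $res->status_line, "\n";
          }

        (If the status line does show something like 403 Forbidden, setting a browser-like User-Agent string via $ua->agent(...) would be a reasonable next experiment.)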