http://www.perlmonks.org?node_id=874315

fx has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

A strange problem has arisen today. Most strange. I have the following code:

my $res = $ua->get( $url, ':content_file' => $filename );

being used under LWP::UserAgent. The returned content, stored in the file $filename, seems to contain the HTTP headers as well as the content - I was expecting just the content. So, for example, if $url references a JPG on the Internet, I don't get a readable JPG back - I seem to get a full HTTP response.

The code works fine under Windows. Run on a Linux box, it is broken. Now that is odd!

Same issue if getstore(...) is used from LWP::Simple instead of the full UserAgent... Same issue if I store the Response in a variable and use/print it. No matter what I do, I get these unwanted HTTP header lines at the top of the file/response, which means the file returned by LWP is actually useless without manual modification...

Opening the resulting file in an editor I seem to see a ^M at the end of the lines - certainly in the unwanted header-looking stuff. Could this be something significant? If so, why is LWP putting it there? Don't worry about this bit anymore ;)

This is a brand new build of a Fedora 14 desktop. Windows was using an older build of ActiveState.

The docs say that I should ONLY get the content into $filename - so why are those HTTP headers there too!.... :(

**UPDATE** A quick review of my code shows I am calling the "get" or "getstore" from within a thread. Outside of threading, files are retrieved as expected. In a threaded environment, the data stored contains these unwanted HTTP headers. Question still remains though......why?

fx, Infinity is Colourless

Re: LWP UserAgent and Simple keeping headers in content
by ikegami (Patriarch) on Nov 29, 2010 at 22:43 UTC

    Works fine for me on Linux with libwww-perl 5.837:

    $ perl -MLWP::UserAgent -e'LWP::UserAgent->new->get("http://www.raptorrecoverynebr.org/Imm.%20Snowy%20Owl.jpg", ":content_file" => "image.jpg");'

    {17} eric@fmdev10 [~/tmp]$ od -c image.jpg | head
    0000000 377 330 377 340 \0 020 J F I F \0 001 001 001 002 X
    0000020 002 X \0 \0 377 333 \0 C \0 005 003 004 004 004 003 005
    0000040 004 004 004 005 005 005 006 \a \f \b \a \a \a \a 017 \v
    0000060 \v \t \f 021 017 022 022 021 017 021 021 023 026 034 027 023
    0000100 024 032 025 021 021 030 ! 030 032 035 035 037 037 037 023 027
    0000120 " $ " 036 $ 034 036 037 036 377 333 \0 C 001 005 005
    0000140 005 \a 006 \a 016 \b \b 016 036 024 021 024 036 036 036 036
    0000160 036 036 036 036 036 036 036 036 036 036 036 036 036 036 036 036
    *
    0000220 036 036 036 036 036 036 036 036 036 036 036 036 036 036 377 300

    $ perl -MLWP::Simple -e'getstore("http://www.raptorrecoverynebr.org/Imm.%20Snowy%20Owl.jpg", "image.jpg");'

    $ od -c image.jpg | head
    0000000 377 330 377 340 \0 020 J F I F \0 001 001 001 002 X
    0000020 002 X \0 \0 377 333 \0 C \0 005 003 004 004 004 003 005
    0000040 004 004 004 005 005 005 006 \a \f \b \a \a \a \a 017 \v
    0000060 \v \t \f 021 017 022 022 021 017 021 021 023 026 034 027 023
    0000100 024 032 025 021 021 030 ! 030 032 035 035 037 037 037 023 027
    0000120 " $ " 036 $ 034 036 037 036 377 333 \0 C 001 005 005
    0000140 005 \a 006 \a 016 \b \b 016 036 024 021 024 036 036 036 036
    0000160 036 036 036 036 036 036 036 036 036 036 036 036 036 036 036 036
    *
    0000220 036 036 036 036 036 036 036 036 036 036 036 036 036 036 377 300
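The od output above begins with octal 377 330 377, that is the bytes 0xFF 0xD8 0xFF that open a JPEG, followed by "JFIF". The same check can be scripted. This is only a sketch: `leading_bytes` and `classify_download` are names made up for illustration, and the "http-response" test assumes the bad files really do start with an HTTP status line, as the question describes.

```perl
use strict;
use warnings;

# Read the first $count bytes of a file in raw (binary) mode.
sub leading_bytes {
    my ($path, $count) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $buf, $count);
    die "Cannot read $path: $!" unless defined $read;
    close $fh;
    return $buf;
}

# Classify a downloaded file by looking at how it starts.
sub classify_download {
    my ($path) = @_;
    my $head = leading_bytes($path, 8);
    return 'jpeg'          if $head =~ /\A\xFF\xD8\xFF/;   # JPEG start marker
    return 'http-response' if $head =~ m{\AHTTP/\d};       # looks like a status line
    return 'unknown';
}

# Classify every file named on the command line.
print classify_download($_), "  $_\n" for @ARGV;
```

Run against the files written with and without threads, this would show at a glance which downloads start with a status line instead of image data.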

    Could you disclose the URL in question? And perhaps the actual code you used?

    Opening the resulting file in an editor I seem to see a ^M at the end of the lines

    HTTP header lines end with CR LF. On Windows, that's the standard text file line ending. On Unix, LF is the standard text file line ending, so the "^M" is your editor trying to represent the CR.
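Since every header line ends with CR LF, the header block as a whole ends at the first empty line, i.e. the first CR LF CR LF. If a saved file really does contain a complete raw response, the body could be recovered along these lines. This is a sketch of the "manual modification" mentioned in the question, not a fix for the underlying cause; `strip_http_headers` is a hypothetical helper and the sample response is invented.

```perl
use strict;
use warnings;

# Return the body of a raw HTTP response: everything after the first
# CR LF CR LF. Data that does not start with an HTTP status line, or that
# has no header terminator, is returned unchanged.
sub strip_http_headers {
    my ($raw) = @_;
    return $raw unless $raw =~ m{\AHTTP/\d+(?:\.\d+)? \d{3}};
    my $pos = index($raw, "\r\n\r\n");
    return $raw if $pos < 0;
    return substr($raw, $pos + 4);
}

# Invented sample: two header lines, then the first three bytes of a JPEG.
my $raw  = "HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n\xFF\xD8\xFF";
my $body = strip_http_headers($raw);
printf "body is %d bytes\n", length $body;
```

When applying this to a real file, read and write it with the `:raw` layer so that no line-ending translation touches the image data.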

      Ah....potentially more info from me now!

      Just tried a simple command line "get" and "getstore" and it all works ok. Just re-examined my code and I'm actually calling my "get" or "getstore" from within a thread.

      Now I know threads can cause issues, but is this connected somehow? The "get" is just about working...but not quite completely working. The program isn't completely crashing out or anything...it's just not doing exactly what I think it should.

      Could there be any specific reason why running a UserAgent "get" inside a thread causes it to store slightly different data compared to calling when not threading?....

      fx, Infinity is Colourless

        Don't make me guess; provide code that reproduces the problem.
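A reduced test case for the threaded situation might start from something like the following. It is a sketch only and makes no claim about the cause: `run_in_threads` and `lwp_fetch` are hypothetical names, the `download-N.bin` file naming is invented, and nothing here is taken from the original poster's undisclosed code. Each thread builds its own LWP::UserAgent and reports the status line and the file it wrote.

```perl
use strict;
use warnings;
use Config;

# Call $fetch->($job) once per job, each in its own thread, and return a
# hash of job => return value. Jobs are assumed to be distinct strings.
sub run_in_threads {
    my ($fetch, @jobs) = @_;
    die "This perl was built without thread support\n"
        unless $Config{useithreads};
    require threads;

    my %thread_for;
    $thread_for{$_} = threads->create({ context => 'scalar' }, $fetch, $_)
        for @jobs;

    my %result;
    $result{$_} = $thread_for{$_}->join for @jobs;
    return %result;
}

# Hypothetical fetcher: download one URL to a per-thread file with
# LWP::UserAgent and report what happened.
sub lwp_fetch {
    my ($url) = @_;
    require LWP::UserAgent;
    my $file = 'download-' . threads->tid . '.bin';
    my $res  = LWP::UserAgent->new->get($url, ':content_file' => $file);
    return $res->status_line . " -> $file";
}

# Example use (needs network access and LWP installed):
#   my %report = run_in_threads(\&lwp_fetch, @urls);
#   print "$_: $report{$_}\n" for sort keys %report;
```

Comparing the files this writes with the output of the same `get` call made outside any thread would give something concrete to post.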
Re: LWP UserAgent and Simple keeping headers in content
by Anonymous Monk on Nov 29, 2010 at 18:55 UTC

      I'm not at that workstation right now but http://cpansearch.perl.org/src/GAAS/libwww-perl-5.837/Changes says this was released 2010-09-20. I did a "install LWP" via CPAN only today. Surely I must have the most recent version....

      I also tried scanning http://cpansearch.perl.org/src/GAAS/libwww-perl-5.837/Changes for my particular bug and didn't see anything. Is this a specific bug I am encountering?...

      fx, Infinity is Colourless

        Surely I must have the most recent version....

        Anything could happen

        perl -MLWP -e" die LWP->VERSION"

        Is this a specific bug I am encountering?

        I don't think so. Maybe you're behind a broken proxy.
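On the version question above: besides the version number, it can help to see which copy of the module perl actually loads, in case an older install sits earlier in @INC. A small sketch, with `module_info` as a made-up helper name:

```perl
use strict;
use warnings;

# Load a module by name and return its version and the path it was loaded
# from. Returns an empty list if the module cannot be loaded.
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    eval { require $file; 1 } or return;
    return ($module->VERSION, $INC{$file});
}

my ($version, $path) = module_info('LWP');
if (defined $path) {
    print 'LWP ', ($version // 'unknown'), " loaded from $path\n";
}
else {
    print "LWP could not be loaded\n";
}
```

Running this on both the Windows and the Linux machine would show whether the two really use the same release.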

Re: LWP UserAgent and Simple keeping headers in content
by fx (Pilgrim) on Dec 01, 2010 at 17:34 UTC

    Many thanks to all those who tried to help with this. In the end I haven't been able to reproduce the error with some basic code and as I can't really disclose the full code I'm going to have to leave it there.

    I have worked around this by simply system()'ing out to wget to perform the download and that works fine. Probably more expensive on system resources but this isn't a performance critical app so that'll have to do!
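For anyone taking the same route, a hedged sketch of what shelling out to wget might look like. `run_command` and `fetch_with_wget` are hypothetical names, and this is not the poster's actual code. The list form of system() avoids the shell, so the URL and file name need no quoting; `-q` and `-O` are wget's quiet and output-file options.

```perl
use strict;
use warnings;

# Run a command without going through the shell and return its exit code.
sub run_command {
    my @cmd = @_;
    my $rc = system { $cmd[0] } @cmd;
    die "Failed to start $cmd[0]: $!\n" if $rc == -1;
    die sprintf("%s died from signal %d\n", $cmd[0], $? & 127) if $? & 127;
    return $? >> 8;
}

# Download $url to $file with wget; true on success (exit code 0).
sub fetch_with_wget {
    my ($url, $file) = @_;
    return run_command('wget', '-q', '-O', $file, $url) == 0;
}
```

A call such as `fetch_with_wget($url, $filename) or warn "download of $url failed\n";` would then take the place of the `get` line from the question.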

    fx, Infinity is Colourless