problem web xml file

by sunja (Initiate)
on Jul 01, 2012 at 20:26 UTC (#979319=perlquestion)
sunja has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm having a small problem getting an XML file properly fetched from the web. I believe it has to do with the fact that the charset isn't stated in the document, but I'm not sure. It looks like Perl is getting the file in non-ASCII mode.

perl -MLWP::Simple -e "getprint ' +hp?key=FR521366R0REA16F998&part=items'"
Or as a sub in my script
use LWP::Simple;

my ($html, @found, $line);
my $url = ' +F998&part=items';
$html = get("$url");
unless (length($html)) {
    warn "Unable to load page for '$url'\n";
}
my $headers = HTTP::Headers->new(
    "Content-type" => "text/xml",
    "charset"      => "UTF-8");
print $headers->as_string() . "\n";
#print '<?xml version="1.0" encoding="utf-8"?>';
print $html;

Re: problem web xml file
by dasgar (Curate) on Jul 01, 2012 at 20:56 UTC

    When I tried running your one-liner, I was definitely seeing some strange non-ASCII characters.

    I can't give you an explanation of what's happening or why there's a difference in outcome, but the code below seemed to download the file with no problems.

    use strict;
    use warnings;
    use LWP::Simple;

    my $fh;
    my $file = "test.xml";
    my $url = ' +F998&part=items';

    open($fh,">",$file) or die "Unable to open file '$file': $!";
    my $data = get($url);
    print $fh $data;
    close($fh);


      Learn to HTTP, people, it's not hard :)

      $ lwp-request -UuSsEed " +521366R0REA16F998&part=items
      GET +rt=items
      User-Agent: lwp-request/6.03 libwww-perl/6.04

      200 OK
      Connection: close
      Date: Sun, 01 Jul 2012 21:06:29 GMT
      Server: Apache/2.2.12 (Ubuntu)
      Content-Encoding: gzip
      Content-Length: 666
      Content-Type: application/xml; charset=UTF-8
      Client-Date: Sun, 01 Jul 2012 21:07:21 GMT
      Client-Response-Num: 1
      X-Compression: gzip
      X-Powered-By: PHP/5.2.10-2ubuntu6.4

        Ahh, it's compressed! So how would I uncompress it to print to STDOUT?

        I really just need to read and save the file, so it should work now, but I would like to know how to deal with it on the command line.
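
For the "uncompress it and print to STDOUT" question, one option is the functional `gunzip` interface of IO::Uncompress::Gunzip, which accepts scalar references for both input and output. The following is only a minimal sketch: `gunzip_body` is a hypothetical helper name, and because the real URL is truncated in this thread, the compressed body is simulated locally with IO::Compress::Gzip instead of being fetched with `get()`.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Stand-in for the raw bytes returned by get($url); the sample XML is
# made up for illustration and is not the real feed.
my $xml = "<items><item>demo</item></items>\n";
gzip(\$xml => \my $body) or die "gzip failed: $GzipError";

# Hypothetical helper: uncompress an in-memory gzip body, return the text.
sub gunzip_body {
    my ($compressed) = @_;
    gunzip(\$compressed => \my $plain)
        or die "gunzip failed: $GunzipError";
    return $plain;
}

print gunzip_body($body);
```

In a script, `$body` would simply be whatever `get($url)` returned. The same call should also fit in a one-liner by loading the module with `-MIO::Uncompress::Gunzip=gunzip`.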

Re: problem web xml file
by Anonymous Monk on Jul 01, 2012 at 21:03 UTC

    Hi all, I'm having a small problem ...

    What problem?

Re: problem web xml file
by sunja (Initiate) on Jul 01, 2012 at 22:15 UTC

    Hmm, well, I think I got it worked out. Thanks for all the help!

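    use IO::Uncompress::Gunzip qw($GunzipError);

    # Read a gzip-compressed file and return its uncompressed contents.
    sub g_zip {
        my ($file) = @_;    # take the file name from the argument list
        my $data = '';
        my $ptr = IO::Uncompress::Gunzip->new($file)
            or die "gunzip failed for '$file': $GunzipError";
        while (defined(my $line = $ptr->getline())) {
            $data .= $line;
        }
        $ptr->close();
        return $data;
    }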

      There's nothing wrong with unzipping it yourself, but another option is to use LWP::UserAgent and let it handle the decoding.

      #!C:/strawberry/perl/bin/perl.exe
      #
      use strict;
      use warnings;
      use LWP::UserAgent;

      my $url = ' +F998&part=items';
      my $ua = LWP::UserAgent->new();
      my $response = $ua->get($url);

      if($response->is_success) {
          print $response->decoded_content;
      }
      else {
          die $response->status_line;
      }
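
To see the difference between `content` and `decoded_content` without touching the network, a response can be built by hand. This is a rough sketch, assuming the HTTP::Response class from libwww-perl's HTTP::Message distribution is installed. The headers mirror the `lwp-request` output shown earlier in the thread, while the XML payload is invented for the example.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Made-up payload standing in for the real feed.
my $xml = "<items><item>demo</item></items>\n";
gzip(\$xml => \my $gz) or die "gzip failed: $GzipError";

# Fake response carrying the same headers the server sent.
my $response = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'application/xml; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gz,
);

my $raw     = $response->content;          # still gzip-compressed bytes
my $decoded = $response->decoded_content;  # Content-Encoding and charset undone

print "raw body is ", length($raw), " bytes of gzip data\n";
print $decoded;
```

With a real request, `$response` would come from `$ua->get($url)` as in the script above, and the rest stays the same.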

Node Type: perlquestion [id://979319]
Front-paged by Arunbear