problem web xml file

by sunja (Initiate)
on Jul 01, 2012 at 20:26 UTC (node #979319)
sunja has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm having a small problem fetching an XML file from the web. I believe it has to do with the fact that the charset isn't stated in the document, but I'm not sure. It looks like Perl is receiving the file as non-ASCII data.

perl -MLWP::Simple -e "getprint ' +hp?key=FR521366R0REA16F998&part=items'"
Or, as part of my script:
use strict;
use warnings;
use LWP::Simple;
use HTTP::Headers;

my $url  = ' +F998&part=items';
my $html = get($url);
die "Unable to load page for '$url'\n" unless defined $html;   # get() returns undef on failure

# The charset is a parameter of Content-Type, not a header of its own
my $headers = HTTP::Headers->new("Content-Type" => "text/xml; charset=UTF-8");
print $headers->as_string() . "\n";
#print '<?xml version="1.0" encoding="utf-8"?>';
print $html;

Re: problem web xml file
by sunja (Initiate) on Jul 01, 2012 at 22:15 UTC

    Hmm, well, I think I've got it worked out. Thanks for all the help!

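    use IO::Uncompress::Gunzip qw($GunzipError);

    # Decompress gzip data; $file may be a filename or a reference to a scalar of gzip bytes
    sub g_zip {
        my ($file) = @_;    # sub arguments arrive in @_, not $_
        my $data = '';
        my $ptr = IO::Uncompress::Gunzip->new($file)
            or die "gunzip failed: $GunzipError";    # errors are reported in $GunzipError, not $!
        while (defined(my $line = $ptr->getline())) {
            $data .= $line;
        }
        $ptr->close();
        return $data;
    }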
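    If the gzipped bytes are already in memory (for example, a raw response body), IO::Uncompress::Gunzip's one-shot gunzip function can decompress them without a read loop or a temporary file. A minimal sketch, assuming the data fits in memory; the helper name gunzip_bytes is hypothetical, and the round trip through IO::Compress::Gzip is only there to make the example self-contained:

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: decompress a string of gzip bytes held in memory.
sub gunzip_bytes {
    my ($compressed) = @_;
    my $plain;
    gunzip(\$compressed => \$plain)
        or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Self-contained round trip: compress a small XML snippet, then restore it.
my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n<items/>\n};
my $gz;
gzip(\$xml => \$gz) or die "gzip failed: $GzipError\n";

my $restored = gunzip_bytes($gz);
print $restored eq $xml ? "round trip ok\n" : "mismatch\n";
```

    Note that gunzip's Transparent option is on by default, so input that is not actually gzipped is copied through unchanged rather than rejected.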

      There's nothing wrong with unzipping it yourself, but another option is to use LWP::UserAgent and let it handle the decoding.

      #!C:/strawberry/perl/bin/perl.exe
      #
      use strict;
      use warnings;
      use LWP::UserAgent;

      my $url = ' +F998&part=items';
      my $ua = LWP::UserAgent->new();
      my $response = $ua->get($url);
      if ($response->is_success) {
          print $response->decoded_content;
      }
      else {
          die $response->status_line;
      }
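      Per the HTTP::Message documentation, decoded_content undoes any Content-Encoding (gzip here) and then, for textual and XML content, decodes the charset named in Content-Type. The sketch below performs those two steps by hand with core modules, assuming the headers seen in this thread (gzip, UTF-8). It is only an illustration of the idea, not LWP's actual implementation, and decode_body is a hypothetical name:

```perl
use strict;
use warnings;
use Encode                 qw(decode encode);
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: undo Content-Encoding, then decode the charset.
sub decode_body {
    my ($body, $content_encoding, $charset) = @_;
    if (defined $content_encoding && lc($content_encoding) eq 'gzip') {
        my $plain;
        gunzip(\$body => \$plain) or die "gunzip failed: $GunzipError\n";
        $body = $plain;
    }
    # Bytes -> Perl character string, using the charset from Content-Type.
    return decode($charset // 'UTF-8', $body);
}

# Simulate a body like the one in this thread: UTF-8 encoded XML, gzipped.
my $text  = qq{<items><item>caf\x{e9}</item></items>};
my $bytes = encode('UTF-8', $text);
my $gz;
gzip(\$bytes => \$gz) or die "gzip failed: $GzipError\n";

my $decoded = decode_body($gz, 'gzip', 'UTF-8');
binmode STDOUT, ':encoding(UTF-8)';
print $decoded, "\n";
```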
Re: problem web xml file
by dasgar (Priest) on Jul 01, 2012 at 20:56 UTC

    When I tried running your one-liner, I definitely saw some strange, garbled characters in the output.

    I can't give you an explanation of what's happening or why there's a difference in outcome, but the code below seemed to download the file with no problems.

    use strict;
    use warnings;
    use LWP::Simple;

    my $file = "test.xml";
    my $url  = ' +F998&part=items';

    open(my $fh, ">", $file) or die "Unable to open file '$file': $!";
    my $data = get($url);
    die "Unable to fetch '$url'\n" unless defined $data;   # get() returns undef on failure
    print $fh $data;
    close($fh);


      Learn to HTTP, people, it's not hard :)

      $ lwp-request -UuSsEed " +521366R0REA16F998&part=items
      GET +rt=items
      User-Agent: lwp-request/6.03 libwww-perl/6.04

      200 OK
      Connection: close
      Date: Sun, 01 Jul 2012 21:06:29 GMT
      Server: Apache/2.2.12 (Ubuntu)
      Content-Encoding: gzip
      Content-Length: 666
      Content-Type: application/xml; charset=UTF-8
      Client-Date: Sun, 01 Jul 2012 21:07:21 GMT
      Client-Response-Num: 1
      X-Compression: gzip
      X-Powered-By: PHP/5.2.10-2ubuntu6.4
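      The Content-Encoding: gzip header accounts for the strange characters: the body is a gzip stream, not XML text. Every gzip stream begins with the two bytes 0x1f 0x8b (RFC 1952), so a quick check like the sketch below can tell compressed data from plain text before printing it. The helper name looks_gzipped is hypothetical:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: true if the bytes begin with the gzip magic number
# (ID1 = 0x1f, ID2 = 0x8b, per RFC 1952).
sub looks_gzipped {
    my ($bytes) = @_;
    return defined $bytes
        && length($bytes) >= 2
        && substr($bytes, 0, 2) eq "\x1f\x8b";
}

my $plain = qq{<?xml version="1.0"?><items/>};
my $gz;
gzip(\$plain => \$gz) or die "gzip failed: $GzipError\n";

printf "plain: %s\n", looks_gzipped($plain) ? 'gzip' : 'not gzip';
printf "gz:    %s\n", looks_gzipped($gz)    ? 'gzip' : 'not gzip';
```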

        Ahh, it's compressed! So how would I uncompress it to print to STDOUT?

        I really just need to read and save the file, so it should work now, but I'd like to know how to deal with it on the command line.

Re: problem web xml file
by Anonymous Monk on Jul 01, 2012 at 21:03 UTC

    Hi all, I'm having a small problem ...

    What problem?

Node Type: perlquestion [id://979319]
Front-paged by Arunbear