
problem web xml file

by sunja (Initiate)
on Jul 01, 2012 at 20:26 UTC
sunja has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm having a small problem getting an XML file fetched properly from the web. I believe it has to do with the fact that the charset isn't stated in the document, but I'm not sure. It looks like Perl is getting the file in non-ASCII mode.

perl -MLWP::Simple -e "getprint ' +hp?key=FR521366R0REA16F998&part=items'"
Or as a sub in my script:

use LWP::Simple;

my ($html, @found, $line);
my $url = ' +F998&part=items';
$html = get("$url");
unless (length($html)) {
    warn "Unable to load page for '$url'\n";
}
my $headers = HTTP::Headers->new(
    "Content-type" => "text/xml",
    "charset"      => "UTF-8",
);
print $headers->as_string() . "\n";
#print '<?xml version="1.0" encoding="utf-8"?>';
print $html;

Replies are listed 'Best First'.
Re: problem web xml file
by sunja (Initiate) on Jul 01, 2012 at 22:15 UTC

    Hmm, well, I think I got it worked out. Thanks for all the help!

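    use IO::Uncompress::Gunzip qw($GunzipError);

    # Return the uncompressed contents of a gzip file (name, handle, or scalar ref)
    sub g_zip {
        my ($file) = @_;    # take the input from the argument list
        my $data = '';
        my $ptr = IO::Uncompress::Gunzip->new($file)
            or die "gunzip failed: $GunzipError";
        while (defined(my $line = $ptr->getline())) {
            $data .= $line;
        }
        $ptr->close();
        return $data;
    }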

      There's nothing wrong with unzipping it yourself, but another option is to use LWP::UserAgent and let it handle the decoding.

      #!C:/strawberry/perl/bin/perl.exe
      use strict;
      use warnings;
      use LWP::UserAgent;

      my $url = ' +F998&part=items';
      my $ua = LWP::UserAgent->new();
      my $response = $ua->get($url);

      if ($response->is_success) {
          print $response->decoded_content;
      }
      else {
          die $response->status_line;
      }
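
To see what decoded_content does without touching the network, here is a minimal sketch that builds a gzip-encoded response by hand. The fake_gzip_response helper and the sample XML are made up for illustration; only HTTP::Response and IO::Compress::Gzip are real modules, and the headers mirror the ones reported elsewhere in this thread.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: wrap some text in a response that claims (and has)
# a gzip-compressed body, the way a server might send it.
sub fake_gzip_response {
    my ($text) = @_;
    my $body;
    gzip(\$text => \$body) or die "gzip failed: $GzipError\n";

    my $res = HTTP::Response->new(200, 'OK');
    $res->header('Content-Type'     => 'application/xml; charset=UTF-8');
    $res->header('Content-Encoding' => 'gzip');
    $res->content($body);
    return $res;
}

my $res = fake_gzip_response("<items/>\n");

# content() is the raw (still compressed) body;
# decoded_content() undoes the Content-Encoding and the charset.
print "raw body is ", length($res->content), " bytes of gzip data\n";
print $res->decoded_content;
```

The same two calls on a real response from $ua->get($url) should behave the same way, assuming the server labels its encoding correctly.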
Re: problem web xml file
by dasgar (Curate) on Jul 01, 2012 at 20:56 UTC

    When I tried running your one-liner, I was definitely seeing some strange ASCII characters.

    I can't give you an explanation of what's happening or why there's a difference in outcome, but the code below seemed to download the file with no problems.

    use strict;
    use warnings;
    use LWP::Simple;

    my $fh;
    my $file = "test.xml";
    my $url = ' +F998&part=items';

    open($fh,">",$file) or die "Unable to open file '$file': $!";
    my $data = get($url);
    print $fh $data;
    close($fh);


      Learn to HTTP, people, it's not hard :)

      $ lwp-request -UuSsEed " +521366R0REA16F998&part=items
      GET +rt=items
      User-Agent: lwp-request/6.03 libwww-perl/6.04

      200 OK
      Connection: close
      Date: Sun, 01 Jul 2012 21:06:29 GMT
      Server: Apache/2.2.12 (Ubuntu)
      Content-Encoding: gzip
      Content-Length: 666
      Content-Type: application/xml; charset=UTF-8
      Client-Date: Sun, 01 Jul 2012 21:07:21 GMT
      Client-Response-Num: 1
      X-Compression: gzip
      X-Powered-By: PHP/5.2.10-2ubuntu6.4
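
The Content-Encoding: gzip header is the giveaway here. When the headers aren't at hand, the payload itself can be checked: gzip data starts with the two bytes 0x1f 0x8b. A minimal sketch (looks_gzipped is a hypothetical helper name, and the sample strings are made up):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the data starts with the gzip magic number.
sub looks_gzipped {
    my ($data) = @_;
    return defined $data
        && length($data) >= 2
        && substr($data, 0, 2) eq "\x1f\x8b";
}

# Two made-up samples: a gzip-style header and a plain XML declaration.
for my $sample ("\x1f\x8b\x08\x00rest", '<?xml version="1.0"?>') {
    printf "%-6s %d bytes\n",
        looks_gzipped($sample) ? 'gzip' : 'plain',
        length($sample);
}
```

This only looks at the first two bytes, so it is a quick sanity check and not a full validation of the stream.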

        Ahh, it's compressed! So how would I uncompress it to print to STDOUT?

        I really just need to read and save the file, so it should work now, but I would like to know how to deal with it on the command line.
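
One way to uncompress a gzip body that is already in memory and print it to STDOUT is the functional interface of IO::Uncompress::Gunzip, which accepts scalar references for input and output. A minimal sketch: gunzip_string is a hypothetical helper name, and the compressed body is produced locally as a stand-in for whatever the download returned.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: inflate a gzip-compressed string held in memory.
sub gunzip_string {
    my ($compressed) = @_;
    my $plain;
    gunzip(\$compressed => \$plain)
        or die "gunzip failed: $GunzipError\n";
    return $plain;
}

# Stand-in for the compressed bytes fetched from the server (made-up XML).
my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n<items/>\n};
my $body;
gzip(\$xml => \$body) or die "gzip failed: $GzipError\n";

# Uncompress and send the result to STDOUT.
print gunzip_string($body);
```

In a real script, $body would be the downloaded content instead of the locally compressed sample.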

Re: problem web xml file
by Anonymous Monk on Jul 01, 2012 at 21:03 UTC

    Hi all, I'm having a small problem ...

    What problem?
