PerlMonks  


Ask yourself what is different between the two requests: the one from your browser and the one from your Perl code. The differences commonly fall into two classes:

  1. Differences in the request.
  2. Differences in the processing of the response document.

For (1), remember that the request is much more than the URL: your browser also sends a number of headers. Headers that commonly change behaviour include Cookie, User-Agent and Referer, but any header is worth checking. You can inspect the headers by sniffing the network (Wireshark), with a browser plugin (e.g. Firebug for Firefox) or through a proxy (Fiddler, on Windows). LWP (if that is what you are using) lets you set the headers of your request.
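As a rough sketch of setting headers by hand, the snippet below builds a request with HTTP::Request (part of the LWP family) and prints it without sending anything, so you can compare it with what your browser sends. The URL and all header values are made-up placeholders; substitute whatever you captured from your own browser.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical URL and header values: copy the real ones from your browser.
my $url = 'http://www.example.com/data';

my $request = HTTP::Request->new( GET => $url );
$request->header( 'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) Firefox/34.0' );
$request->header( 'Referer'    => 'http://www.example.com/' );
$request->header( 'Cookie'     => 'session=abc123' );

# Show the request line and headers that would be sent.
print $request->as_string;

# To actually send it (needs network access):
# use LWP::UserAgent;
# my $response = LWP::UserAgent->new->request($request);
# print $response->decoded_content if $response->is_success;
```

If the response still differs from what the browser shows, add the remaining browser headers one at a time until it matches; that usually tells you which header the server cares about.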

For (2), the usual cause is JavaScript: the browser runs scripts that change the page after it loads, whereas your Perl code sees only the raw response. The commonly used Perl tools, LWP and its derivatives (e.g. WWW::Mechanize), do not support JavaScript. In most cases you can read the JavaScript yourself and manually mimic what it does with further requests or Perl code. There do seem to be some Perl modules floating around that claim JavaScript capabilities, usually by driving a conventional browser; have a look on CPAN. You could also look at Selenium.
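To illustrate mimicking the script by hand, here is a minimal sketch under invented assumptions: the page source and the loadResults() function are hypothetical, standing in for whatever the real page's script does. The idea is only that the script often reveals the URL the browser fetches the data from, and you can then request that URL yourself.

```perl
use strict;
use warnings;

# Hypothetical page source: the data is not in the HTML itself, but the
# inline script shows where the browser fetches it from.
my $html = <<'HTML';
<html><body>
<div id="results"></div>
<script>
  loadResults("/ajax/results?id=42&format=json");
</script>
</body></html>
HTML

# Capture the first argument of the (hypothetical) loadResults() call.
my ($data_path) = $html =~ /loadResults\(\s*"([^"]+)"/;

if ( defined $data_path ) {
    print "Request this next: $data_path\n";
}
else {
    print "No data URL found; read the script by hand.\n";
}
```

A regex like this is brittle and only suits a one-off scrape; the path it finds would then go into a second LWP request against the same host.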

Finally, think laterally: perhaps you can get your data another way. The website you mention seems to have various XML feeds.


In reply to Re: Parsing HTTP... by philipbailey
in thread Parsing HTTP... by insectopalo
