Ask yourself what is different between the two requests: the one from your browser and the one from your Perl code. The differences usually fall into two classes:

  1. Differences in the request.
  2. Differences in the processing of the response document.

For (1), remember that the request is much more than the URL: your browser may send a number of headers along with it. Headers that commonly change behaviour include Cookie, User-Agent and Referer, but any header is worth checking. You can inspect the headers by sniffing the network (Wireshark), with a browser plugin (e.g. Firebug for Firefox) or through a proxy (Fiddler, on Windows). LWP (if that is what you are using) allows you to change the headers of your request.
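As a rough sketch of setting headers on an LWP request: the code below builds an HTTP::Request with browser-like headers and prints it, so you can compare it line by line with what your sniffer or proxy captured. The URL, the header values and the helper name build_request are all placeholders of mine, not anything taken from the site in question.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a GET request carrying the given headers.
# build_request is a hypothetical helper name used only for this sketch.
sub build_request {
    my ($url, %headers) = @_;
    my $req = HTTP::Request->new(GET => $url);
    # Set each header field on the request object.
    $req->header($_ => $headers{$_}) for sort keys %headers;
    return $req;
}

my $req = build_request(
    'http://www.example.com/data',                        # placeholder URL
    'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64)',    # copy the real value from your browser
    'Referer'    => 'http://www.example.com/',            # placeholder
    'Cookie'     => 'session=PLACEHOLDER',                # placeholder
);

# Dump the request without sending it, for comparison with the browser's.
print $req->as_string;
```

To actually send it, pass the object to LWP::UserAgent, e.g. `my $res = LWP::UserAgent->new->request($req);`, and then compare the response with what the browser received.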

For (2), the cause is usually JavaScript. The commonly used Perl tools, LWP and its derivatives (e.g. WWW::Mechanize), do not support JavaScript. In most cases you can read the JavaScript yourself and manually mimic what it does with further requests or Perl code. There do seem to be some Perl modules floating around that claim JavaScript capabilities, usually by driving a conventional browser; have a look on CPAN. You could also look at Selenium.
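To illustrate what mimicking the JavaScript by hand can look like, here is a minimal sketch. It assumes the page contains a script that fetches its data through a call like loadData("..."); that function name, the sample HTML and the URLs are invented for the example, so you would have to adapt the pattern to whatever the real script does.

```perl
use strict;
use warnings;
use URI;

# Pull the URL out of a (hypothetical) loadData("...") call in the page
# source and resolve it against the page's own URL.
sub script_data_url {
    my ($html, $base) = @_;
    my ($rel) = $html =~ /loadData\(\s*["']([^"']+)["']\s*\)/
        or return;    # no such call found in the page
    return URI->new_abs($rel, $base)->as_string;
}

# Stand-in for the HTML you fetched with LWP.
my $html = <<'HTML';
<html><body>
<script>loadData("/feeds/results.xml?page=1");</script>
</body></html>
HTML

my $url = script_data_url($html, 'http://www.example.com/search');
print "$url\n";
```

The resulting absolute URL is what you would then request with LWP as a second step, in place of letting a browser run the script.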

Finally, think laterally--perhaps you can get your data another way. The website you mention seems to have various XML feeds.

In reply to Re: Parsing HTTP... by philipbailey
in thread Parsing HTTP... by insectopalo
