Scrape From Webpage and extract real time stock prices

by tbone654 (Beadle)
on Dec 04, 2012 at 20:30 UTC
tbone654 has asked for the wisdom of the Perl Monks concerning the following question:

I use LWP::Simple to fetch a webpage and, for the example's sake, write it to a file:

perl -Mwarnings -MLWP::Simple -e 'BEGIN {getprint ""}' >> zzz.out

To make this easier to read, I'll pull out just the lines with the tags I need, which are the "borderTd" tags:

perl -ne 'next unless /borderTd/; print; $lines++; END { print "\n$lines" }' zzz.out
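For what it's worth, the fetch and the filter can be folded into one small script, skipping the intermediate file. This is only a sketch: the sub names lines_matching and fetch_page and the file name borderTd.pl are made up for illustration, and the URL is taken from the command line because none is given in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return only the lines of $html that contain the literal string $marker.
sub lines_matching {
    my ($html, $marker) = @_;
    return grep { /\Q$marker\E/ } split /^/m, $html;
}

# Fetch a page and return its body, or die if nothing came back.
# LWP::Simple is loaded at call time so the filter above can be tried
# on saved HTML without touching the network.
sub fetch_page {
    my ($url) = @_;
    require LWP::Simple;
    my $html = LWP::Simple::get($url);
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Usage (hypothetical file name): perl borderTd.pl URL
if (@ARGV) {
    my @hits = lines_matching(fetch_page($ARGV[0]), 'borderTd');
    print @hits;
    print "\n", scalar(@hits), "\n";   # same trailing count as the one-liner
}
```

Keeping the filter in its own sub means it can be pointed at zzz.out (or any saved copy) while experimenting, so the remote server is not hit on every run.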

So you end up with 53 lines which contain "borderTd"... I would print the lines here, but it's html which the browser wants to print as a web page, so the preview gets really ugly. The values I'm trying to capture are on lines 6, 7, 9 and 10 and contain:
line 6 = last price
line 7 = date
line 9 = open and high for the day
line 10 = low and volume.

My challenge is to extract the values in those tags and manipulate them in some manner that is useful.
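One brute-force way to get at the values is to grab whatever text sits between the tags on each line. The sketch below assumes each interesting line holds label/value cells side by side; the sample markup and the numbers in it are invented, since the real page's HTML is not shown in this thread, and cell_values is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the non-blank text fragments found between tags on one line.
# This is a regex shortcut, not a real HTML parser.
sub cell_values {
    my ($line) = @_;
    my @values;
    while ($line =~ />([^<>]+)</g) {
        my $text = $1;
        $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        push @values, $text if length $text;
    }
    return @values;
}

# The post says the values sit on filtered lines 6, 7, 9 and 10 (1-based),
# so with the filtered lines in an array @hits the slice is @hits[5, 6, 8, 9].

# Hypothetical sample line with placeholder numbers.
my $sample = '<td class="borderTd">Open</td><td class="borderTd">100.00</td>'
           . '<td class="borderTd">High</td><td class="borderTd">101.50</td>';

# Alternating label/value fragments drop straight into a hash.
my %quote = cell_values($sample);
print "$_ = $quote{$_}\n" for sort keys %quote;
```

If the markup turns out to be messier than this (nested tags, cells split across lines), a CPAN parser such as HTML::TreeBuilder or HTML::TableExtract is likely to hold up better than a regex.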

I have this other page in development with historical prices, and would like to somehow add these real-time prices to the page and do the updated calculations.

Input SPY for consistency.

Any help with splitting this data out somehow is very much appreciated.

Re: Scrape From Webpage and extract real time stock prices
by ww (Archbishop) on Dec 04, 2012 at 21:12 UTC
    1. "... I would print the lines here, but it's html which the browser wants to print as ..."
      So, consider the "html" to be code (which it is) and wrap it in <c>...html here </c> tags.*
      * i.e., read the directions, in this case those surrounding the text entry box: Writeup Formatting Tips and/or Markup in the Monastery
    2. Caveat: I have NOT attempted to read the site's TOS </caveat>. Have you? Beware practices that may violate the provider's Terms of Service.
    3. Does the site offer an API? If so, use it in preference to a home-rolled scraper.

      I'm using it to learn how first... And I'm trying to keep it as generic as possible... Would probably use a module, but I don't own the server either, so I was thinking of how to use brute force before I roll it into the final product...

      I put it in <c> tags, but it doesn't format well...

        Maybe I misunderstood. Is your data source your site? If so, ignore my observations about TOS; if not, I repeat, "beware!"

        The balance of your first graf suggests that the site is the IP of some other entity, but if that entity approves of you using their data, they likely have a published API. Ask. And if they don't approve? Well, refer to para 1.

        How did using
        <c><table width="34%" border="1"> <tr> <td>aaa</td><td>bbb ccc ddd></td><td>2012-12-04</td><td>foo</td><tr> ...</table> -- ie, html here</c>
        fail? Note that the closing tag is -- as is customary (or, as here, required) </c> ... with a slash!

        Even raw html renders so long as it's of a reasonable width (and within the limits allowed by the fact that PM's html is restricted):

        a a aab bb2012-12-04foo
        12345 0987654321zyx20121204 16:49bar
Re: Scrape From Webpage and extract real time stock prices
by Anonymous Monk on Dec 05, 2012 at 09:05 UTC

