PerlMonks  

Scrape From Webpage and extract real time stock prices

by tbone654 (Beadle)
on Dec 04, 2012 at 20:30 UTC (#1007152=perlquestion)
tbone654 has asked for the wisdom of the Perl Monks concerning the following question:

I use LWP::Simple to fetch a web page and, for the example's sake, write it to a file:

perl -Mwarnings -MLWP::Simple -e 'getprint "http://www.stockta.com/cgi-bin/analysis.pl?symb=SPY&cobrand=&mode=stock/"' > zzz.out

To make this easier to read, I pull out just the lines with the tags I need, which are the "borderTd" tags:

perl -ne 'if (/borderTd/) { print; $lines++ } END { print "\n$lines\n" }' zzz.out
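The one-liner only counts the matches; to pick out particular ones afterwards, it is handier to keep them in an array. A minimal sketch, assuming the page has already been saved to zzz.out as above; border_td_lines is a hypothetical helper name, not part of any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read HTML from a filehandle and return every line
# that contains "borderTd", in page order, with trailing newlines removed.
sub border_td_lines {
    my ($fh) = @_;
    my @matches;
    while (my $line = <$fh>) {
        next unless $line =~ /borderTd/;
        chomp $line;
        push @matches, $line;
    }
    return @matches;
}

# Only runs when a file name is given, e.g.: perl borderTd.pl zzz.out
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my @lines = border_td_lines($fh);
    close $fh;
    # Number the matches from 1, the same way the lines are counted above.
    printf "%2d: %s\n", $_ + 1, $lines[$_] for 0 .. $#lines;
    print scalar(@lines), " lines\n";
}
```

The script name is arbitrary; the numbered listing makes it easy to confirm which entries hold the values you want before hard-coding their positions.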

So you end up with 53 lines that contain "borderTd". I would print the lines here, but they are HTML, which the browser wants to render as a web page, so the preview gets really ugly. The values I'm trying to capture are on lines 6, 7, 9 and 10 of that output and contain:
line 6 = last price
line 7 = date
line 9 = open and high for the day
line 10 = low and volume.
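A minimal sketch of pulling those four lines apart, assuming the borderTd lines have been collected into an array in page order. The actual markup isn't shown here, so the helper names (cell_text, numbers_in, quote_from_lines) are hypothetical and the parsing rests on stated assumptions: each line's numbers appear in the order listed above, and the date line's text is kept verbatim. It uses crude regexes; an HTML parser such as HTML::TableExtract from CPAN would be more robust.

```perl
use strict;
use warnings;

# Strip tags and two common entities from one line of HTML, then
# collapse and trim whitespace. Crude, but enough for an experiment.
sub cell_text {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;   # replace each tag with a space
    $text =~ s/&nbsp;/ /g;
    $text =~ s/&amp;/&/g;
    $text =~ s/\s+/ /g;                    # collapse runs of whitespace
    $text =~ s/^ | $//g;                   # trim both ends
    return $text;
}

# Return every number in a string, with thousands separators removed.
sub numbers_in {
    my ($text) = @_;
    return map { my $n = $_; $n =~ tr/,//d; $n }
        $text =~ /(-?\d[\d,]*(?:\.\d+)?)/g;
}

# Map the borderTd lines to named fields. Positions 6, 7, 9 and 10 are
# 1-based, hence indices 5, 6, 8 and 9; they break if the layout changes.
sub quote_from_lines {
    my (@lines) = @_;
    die "expected at least 10 borderTd lines, got " . scalar(@lines) . "\n"
        if @lines < 10;
    my %quote;
    ($quote{last})         = numbers_in(cell_text($lines[5]));  # assumes first number is the price
    $quote{date}           = cell_text($lines[6]);
    @quote{qw(open high)}  = numbers_in(cell_text($lines[8]));
    @quote{qw(low volume)} = numbers_in(cell_text($lines[9]));
    return \%quote;
}
```

For example, my $q = quote_from_lines(@lines); print "$q->{last} $q->{date}\n"; prints the last price and the date.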

My challenge is to extract the values in those tags and manipulate them in some useful way.

I have this other page in development with historical prices, and would like to somehow add these real-time prices to the page and do the updated calculations.
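One way to fold the real-time values into the historical page is to treat them as a provisional bar for today. A minimal sketch, assuming the history is a list of hash refs (date, open, high, low, close, volume) sorted oldest-first and using the same date format as the scraped quote, and that the quote is a hash ref with last/date/open/high/low/volume keys. merge_realtime and sma are hypothetical names, and the simple moving average merely stands in for whatever calculations the page actually does.

```perl
use strict;
use warnings;

# Return a new list of bars with today's quote merged in. The caller's
# array is copied, not modified.
sub merge_realtime {
    my ($history, $quote) = @_;
    my %bar = (
        date   => $quote->{date},
        open   => $quote->{open},
        high   => $quote->{high},
        low    => $quote->{low},
        close  => $quote->{last},   # last trade stands in for the close until the session ends
        volume => $quote->{volume},
    );
    my @merged = @$history;
    if (@merged && $merged[-1]{date} eq $bar{date}) {
        $merged[-1] = \%bar;        # same day again: replace the provisional bar
    }
    else {
        push @merged, \%bar;        # new day: append
    }
    return \@merged;
}

# Hypothetical example calculation: simple moving average of the last $n
# closes. Returns undef if there are fewer than $n bars.
sub sma {
    my ($bars, $n) = @_;
    return if @$bars < $n;
    my $sum = 0;
    $sum += $_->{close} for @{$bars}[-$n .. -1];
    return $sum / $n;
}
```

Replacing rather than appending when the dates match means the script can be re-run throughout the trading day without piling up duplicate bars.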

www.aztecura.com/cgi-bin/test15.pl

Input SPY for consistency.

Any help with splitting this data out somehow is very much appreciated.

Re: Scrape From Webpage and extract real time stock prices
by ww (Bishop) on Dec 04, 2012 at 21:12 UTC
    1. "... I would print the lines here, but it's html which the browser wants to print as ..."
      So, consider the "html" to be code (which it is) and wrap it in <c>...html here </c> tags.*
      * i.e., read the directions, in this case those surrounding the text entry box: Writeup Formatting Tips and/or Markup in the Monastery
    2. Caveat: I have NOT attempted to read stockta.com's TOS </caveat>. Have you? Beware of practices that may violate the provider's Terms of Service.
       
    3. Does stockta.com offer an API? If so, use it in preference to a home-rolled scraper.

      I'm using it to learn how first, and I'm trying to keep it as generic as possible. I would probably use a module, but I don't own the server either, so I wanted to work out a brute-force approach before rolling it into the final product.

      I put it in <c> tags, but it doesn't format well...

        Maybe I misunderstood. Is your data source your site? If so, ignore my observations about TOS; if not, I repeat, "beware!"

        The balance of your first graf suggests that stockta.com is the IP of some other entity, but if that entity approves of you using their data, they likely have a published API. Ask. And if they don't approve? Well, refer to para 1.

        How did using
        <c><table width="34%" border="1"> <tr> <td>aaa</td><td>bbb ccc ddd</td><td>2012-12-04</td><td>foo</td></tr> ...</table> -- ie, html here</c>
        fail? Note that the closing tag is -- as is customary (or, as here, required) </c> ... with a slash!

        Even raw html renders so long as it's of a reasonable width (and within the limits allowed by the fact that PM's html is restricted):

        a a aab bb2012-12-04foo
        12345 0987654321zyx20121204 16:49bar
Re: Scrape From Webpage and extract real time stock prices
by Anonymous Monk on Dec 05, 2012 at 09:05 UTC

Approved by bitingduck