http://www.perlmonks.org?node_id=994734

tbone654 has asked for the wisdom of the Perl Monks concerning the following question:

Think of this as the current state, then the pseudo-code for a new project to move it to a dynamic webpage...

CURRENT STATE

I currently download Yahoo Finance historical data to a CSV file and paste it into Excel. I use Excel functions on the historical data to predict the next trading day's behavior and to backtest different setups. A worksheet in the workbook generates the HTML in column A, which I paste into Notepad++. Then I use FileZilla to upload the page to my Yahoo-hosted website, e.g. http://www.aztecura.com/php/data/test.php

Very manual, very time-consuming, and the information is outdated almost immediately...

New solution inspired by: http://www.stockta.com/cgi-bin/analysis.pl?symb=AAPL&cobrand=&mode=stock

PSEUDO-CODE

From a webpage form, enter a Yahoo financial symbol and fetch the symbol.

Use Perl modules "xxx" to scrape Yahoo historical data for the fetched symbol.

Use Perl modules "xxx" to perform analysis functions (similar to Excel's) on the scraped dataset and push the additional values onto the end of the array/hash.

Format the output and print below the original form used to fetch the symbol (as shown on the inspiration webpage).
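A minimal sketch of the form-and-output step above, using only core Perl. The helper names (clean_symbol, escape_html, render_page) and the "symbol" form field are assumptions made for illustration, and the data-fetching step is left as a placeholder:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: accept only ticker-like input, otherwise return undef.
sub clean_symbol {
    my ($raw) = @_;
    return unless defined $raw;
    (my $symbol = uc $raw) =~ s/^\s+|\s+$//g;
    return $symbol =~ /\A[A-Z0-9.^=-]{1,10}\z/ ? $symbol : undef;
}

# Escape the characters that are unsafe to echo back into HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build the page: the form first, then one table row per data row (if any).
sub render_page {
    my ($symbol, $rows) = @_;
    my $value = defined $symbol ? escape_html($symbol) : '';
    my $html  = qq{<form method="get">\n}
              . qq{  <input name="symbol" value="$value">\n}
              . qq{  <input type="submit" value="Fetch">\n}
              . qq{</form>\n};
    return $html unless $rows && @$rows;
    $html .= "<table>\n";
    for my $row (@$rows) {
        $html .= '  <tr>'
               . join('', map { '<td>' . escape_html($_) . '</td>' } @$row)
               . "</tr>\n";
    }
    return $html . "</table>\n";
}

# Pull "symbol" out of the CGI query string (e.g. "symbol=AAPL") and decode %XX escapes.
my ($raw) = ($ENV{QUERY_STRING} // '') =~ /(?:^|&)symbol=([^&]*)/;
$raw =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge if defined $raw;
my $symbol = clean_symbol($raw);

# Placeholder: the rows for $symbol would come from the scraping step.
my @rows = ();

print "Content-type: text/html\n\n", render_page($symbol, \@rows);
```

Validating the symbol before using it, and escaping everything echoed back, keeps arbitrary form input out of both the data request and the generated page.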

The setup works for stocks, options, indexes, etc. alike... Some better than others... I would obviously like to automate this currently labor-intensive activity and provide an output of the "top 10" in a dashboard or something... Also, I love Excel, but I would like to be able to run this directly from the web, from my phone, tablet, someone else's computer, etc. (lots of things that may not have Excel). This is not intended to be a commercial product. I'm just planning on making it mobile and dynamic for my own use.

I am looking for guidance from anyone having an opinion on tips, tricks, warnings, better ideas, etc. I've written programs in C for testing craps, blackjack, dogs and trading strategies for many years where needed as a hobby. I do a lot of one-liner Perl and awk for managing enterprise storage and performing automated data migrations. I will figure this out as I always do, but I'm trying not to re-invent everything from scratch as I often also tend to do. Note: Perl makes more sense than C for this project, mostly because Yahoo doesn't like to serve compiled C. I can't seem to even run "hello world" from a Yahoo site with gcc. I would also like to avoid setting up LAMP on my own box and getting a static IP from my service provider just to do this.

I've been playing with LWP::Simple and a few others to get some pages, but I think there must be better modules for the type of work described above.

Thanks in advance for any advice.

Re: Scrape Yahoo Financial Historical- Process Dataset - format and create dynamic page
by chromatic (Archbishop) on Sep 20, 2012 at 21:10 UTC

    Ha, Big Blue Marble has a project like this.

    The web front end is a really simple database-backed site that does none of this interesting processing. When someone requests information on a stock that's not already in the database, the site inserts a stub entry for that stock.

    Every few minutes, a cron job on the server looks for stub entries and runs the gamut of historical and daily analysis on those stubs. The live site updates what it displays for that stock as new information comes in. So far no one has objected to waiting up to five minutes for intrinsic value calculations. (We discussed running the cron job more frequently, but I haven't made that happen yet.)
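    A minimal sketch of the stub-and-cron pattern described above, with an in-memory hash standing in for the database. The names (%stocks, lookup_stock, process_stubs) and the status values are assumptions made for illustration, not the actual project's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stands in for the database table: symbol => { status, analysis }.
my %stocks;

# Web front end: return what is known, inserting a stub for an unknown symbol.
sub lookup_stock {
    my ($symbol) = @_;
    $stocks{$symbol} //= { status => 'stub', analysis => undef };
    return $stocks{$symbol};
}

# Cron job: run the analysis callback on every stub and mark it ready.
# Returns the list of symbols processed in this pass.
sub process_stubs {
    my ($analyze) = @_;
    my @done;
    for my $symbol (sort keys %stocks) {
        my $entry = $stocks{$symbol};
        next unless $entry->{status} eq 'stub';
        $entry->{analysis} = $analyze->($symbol);
        $entry->{status}   = 'ready';
        push @done, $symbol;
    }
    return @done;
}

# A visitor asks for AAPL; later the periodic job picks up the stub.
lookup_stock('AAPL');
my @processed = process_stubs(sub { "analysis for $_[0]" });
print "processed: @processed\n";
print "AAPL is $stocks{AAPL}{status}\n";
```

    In a real deployment the hash would be a database table and process_stubs would run from cron, so the web request never waits on the slow analysis.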

    I ported a couple of formulas from Excel, but if you already understand things like a present value calculation, or could do very basic calculus on the back of an envelope (did that too!), you can reproduce those formulas easily in Perl.
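    As an illustration of how little code such a port takes, here is a sketch of a present value calculation in Perl. The function names are assumptions; npv follows the same convention as Excel's NPV, where the first cash flow arrives one period from now:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Present value of a single future cash flow: PV = FV / (1 + r)^n.
sub present_value {
    my ($future_value, $rate, $periods) = @_;
    return $future_value / (1 + $rate) ** $periods;
}

# Present value of a series of cash flows, one per period,
# the first arriving at the end of period 1.
sub npv {
    my ($rate, @cash_flows) = @_;
    my $total  = 0;
    my $period = 1;
    $total += present_value($_, $rate, $period++) for @cash_flows;
    return $total;
}

# 1000 received in 10 years, discounted at 5% per year.
printf "%.2f\n", present_value(1000, 0.05, 10);

# Three yearly payments of 100, discounted at 10% per year.
printf "%.2f\n", npv(0.10, 100, 100, 100);
```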

    With that all said...

    I use excel functions to operate on the historical information to predict the next trading day's behavior...

    ... my best advice is to pray a lot. (I'm a value investor.)

      I used Finance::QuoteHist::Yahoo to pull the historical data I need to operate on. So I will set it up to be called dynamically from a form, and print the output right back to the same webpage.

      The problem right now is that Yahoo doesn't support the ::Yahoo Perl module I need to pull the data directly from the form. I can run it on my laptop and push the output to the Yahoo server, but that's not sexy. I guess I have four options here:

    • Move it all to a web server somewhere whose Perl supports the module.
    • Build my own LAMP server and pay for a static IP, so that I control the Perl modules (I think?)
    • Use the standard modules which Yahoo supports, plus a few more, and write the excruciatingly painful scraper functions from what's available. (yuck)
    • Run some kind of simple HTTP web server that lets me map the Perl script on my laptop to a public URL, e.g. www.somesmallwebserver.com/3666698/tmp/perlscript.pl, which maps to the /tmp filesystem on my laptop.

      I am sort of going down the path of the last option for now, just to get it going. I found one that allows downloading files only (brick something), so I was able to download a file from my laptop just fine. So I imagine there are some that just act as a web server through a similar public IP address scheme?

      Thank you for your reply. Good luck with that value investing thing. I'm an action junkie, so I need to move pretty fast and must have fairly live data.

        The problem right now is that yahoo doesn't support the ::Yahoo perl module

        You can, like, install the module

Re: Scrape Yahoo Financial Historical- Process Dataset - format and create dynamic page
by Anonymous Monk on Sep 20, 2012 at 20:01 UTC
      Finance::QuoteHist::Yahoo is perfect ... and very fast ... Thank you very much. I was dreading the thought of using LWP::Simple for all of this... Now I just find an Excel (like) formula module and I'll be dangerous... Thanks again.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Finance::QuoteHist::Yahoo;

      # Fetch daily quotes for SPY from the start of 2012 through today.
      my $q = Finance::QuoteHist::Yahoo->new(
          symbols    => "SPY",
          start_date => '01/01/2012',
          end_date   => 'today',
      );

      # Each row holds: symbol, date, open, high, low, close, volume.
      foreach my $row ($q->quotes()) {
          my ($symbol, $date, $open, $high, $low, $close, $volume) = @$row;
          print "$symbol, $date, $open, $high, $low, $close, $volume \n";
      }

      SPY, 2012/09/07, 144.0100, 144.3900, 143.8800, 144.3300, 107272100
      SPY, 2012/09/10, 144.1900, 144.4400, 143.4600, 143.5100, 86458500
      SPY, 2012/09/11, 143.6000, 144.3700, 143.5600, 143.9100, 88760000
      SPY, 2012/09/12, 144.3900, 144.5500, 143.9000, 144.3900, 87640900
      SPY, 2012/09/13, 144.3700, 147.0400, 143.9900, 146.5900, 225470200
      SPY, 2012/09/14, 146.8800, 148.1100, 146.7600, 147.2400, 169777000
      SPY, 2012/09/17, 146.9400, 147.1900, 146.3700, 146.7400, 119427800
      SPY, 2012/09/18, 146.4900, 146.8100, 146.2500, 146.6200, 98326600
      SPY, 2012/09/19, 146.7900, 147.1700, 146.4100, 146.7000, 128318300
      SPY, 2012/09/20, 146.0300, 146.7900, 145.6300, 146.7100, 153955400
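
      An Excel-like calculation over rows of this shape needs no special module. The sketch below appends a simple moving average of the close to each row, in the spirit of pushing extra values onto the end of the array. The function name append_sma and the 3-day window are assumptions chosen for illustration; the sample rows are copied from the output above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows copied from the output above:
# [symbol, date, open, high, low, close, volume].
my @rows = (
    ['SPY', '2012/09/17', 146.94, 147.19, 146.37, 146.74, 119427800],
    ['SPY', '2012/09/18', 146.49, 146.81, 146.25, 146.62,  98326600],
    ['SPY', '2012/09/19', 146.79, 147.17, 146.41, 146.70, 128318300],
    ['SPY', '2012/09/20', 146.03, 146.79, 145.63, 146.71, 153955400],
);

# Append an N-day simple moving average of the close (column 5) to each row.
# Rows that do not yet have N closes behind them get undef instead.
sub append_sma {
    my ($rows, $window) = @_;
    my @closes;
    for my $row (@$rows) {
        push @closes, $row->[5];
        shift @closes if @closes > $window;
        my $sma;
        if (@closes == $window) {
            my $sum = 0;
            $sum += $_ for @closes;
            $sma = $sum / $window;
        }
        push @$row, $sma;
    }
    return $rows;
}

append_sma(\@rows, 3);

for my $row (@rows) {
    my $sma = defined $row->[7] ? sprintf('%.4f', $row->[7]) : 'n/a';
    print "$row->[1]  close=$row->[5]  sma3=$sma\n";
}
```

      The same loop shape works for other rolling functions (sum, min, max) by changing what is computed from @closes.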