Kickstart has asked for the wisdom of the Perl Monks concerning the following question:
Hi folks,
This question may be even lower level than Perl, but I could use some advice here.
I've been away from Perl for a bit (lack of time, nothing requiring it at work, etc.) but recently have been discussing with coworkers some topics involved in using Bayesian systems for prediction.
What I am considering is best given with an example. If we are given the year, month, day of the month, weekday, location, a few other miscellaneous bits of data, and a price for a long enough period of time, is there a way for a computer to give a somewhat decent prediction of what the price will be tomorrow, given all the other factors for that day?
Is there a way to feed that data into a Bayesian-enabled Perl script? What would the Perl look like? What does any very basic Bayesian implementation look like in Perl? I've seen 190837, but that looks very specific to the spam problem.
Thanks,
KS
Re: Bayesian not-for-spam
by merlyn (Sage) on Jul 14, 2003 at 22:47 UTC
Re: Bayesian not-for-spam
by tilly (Archbishop) on Jul 14, 2003 at 23:48 UTC
You look like you might be trying to predict things like the stock market. If so, then please note that both theory and practice indicate that stock prices follow a random walk. There is no correlation between what they have done in the past and what they will do in the future. (Other than an overall tendency to climb about 10% per year. OTOH, that is a trend which we are currently well above the historical average for...)
Thus a computer program to predict the prices is unlikely to yield anything useful. Though our human tendency to see patterns whether or not they are there will cause you to find all sorts of spurious things if you tinker enough: spurious connections that won't hold up with future data.
hmmm, tilly, whether you accept the random walk hypothesis depends on how orthodox your economics are.
Lo & MacKinlay, at MIT, in "A Non-Random Walk Down Wall Street" (1999) obtained overwhelming rejections of the random walk; there is quite compelling evidence against it. I append a quote from Niederhoffer's biography you might find interesting.
This theory and the attitude of its adherents found classic expression in one incident I personally observed that deserves memorialization. A team of four of the most respected graduate students in finance had joined forces with two professors, now considered venerable enough to have won or to have been considered for a Nobel prize, but at that time feisty as Hades and insecure as a kid on his first date. This elite group was studying the possible impact of volume on stock price movements, a subject I had researched. As I was coming down the steps from the library on the third floor of Haskell Hall, the main business building, I could see this Group of Six gathered together on a stairway landing, examining some computer output. Their voices wafted up to me, echoing off the stone walls of the building. One of the students was pointing to some output while querying the professors, "Well, what if we really do find something? We'll be up the creek. It won't be consistent with the random walk model." The younger professor replied, "Don't worry, we'll cross that bridge in the unlikely event we come to it."
I could hardly believe my ears--here were six scientists openly hoping to find no departures from ignorance. I couldn't hold my tongue, and blurted out, "I sure am glad you are all keeping an open mind about your research." I could hardly refrain from grinning as I walked past them. I heard muttered imprecations in response.
respectfully,
...wufnik
-- in the world of the mules there are no rules --
Re: Bayesian not-for-spam
by chromatic (Archbishop) on Jul 14, 2003 at 21:50 UTC
What leads you to believe this is a question a Bayesian system can answer? As I understand it, a Bayesian system answers the question, "What is the probability this item is like either of these two opposite poles?" It's really a yes or no question. "Is this spam or ham?"
If I understand you (and Bayesian filters) correctly, this is not the question you want to answer.
Re: Bayesian not-for-spam
by CountZero (Bishop) on Jul 15, 2003 at 06:00 UTC
I use a Bayesian script for spam protection and it works quite well, but the technology behind it does not seem particularly suited for predicting price changes. My Encyclopedia Britannica defines Bayesian estimation as a "statistical technique for calculating the probability of the validity of a proposition on the basis of a prior estimate of its probability and new relevant evidence." Also note that Bayesian analysis is very sensitive to the distribution of the input. The best you can hope for is to code all relevant input parameters (and so far nobody has been able to identify all the parameters which govern price changes) and the resulting price change (increase, stable, decrease), so as to estimate the probability that a new set of input data will lead to a lower, higher or stable price. Its predictive power will be minimal, I fear, even with full knowledge of all relevant data.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
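That prior-plus-new-evidence updating can be shown with a tiny worked example. All the numbers below are invented purely for illustration; this is just Bayes' rule, not a trading model:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
# Invented example: H = "price rises tomorrow", E = "price rose today".
my $prior        = 0.50;   # P(H): prior belief that the price rises
my $like_given_h = 0.55;   # P(E|H): chance of today's rise if H holds
my $like_given_n = 0.50;   # P(E|not H): chance of today's rise otherwise

# Total probability of the evidence, by the law of total probability
my $p_e = $like_given_h * $prior + $like_given_n * (1 - $prior);

my $posterior = $like_given_h * $prior / $p_e;
printf "P(rise|evidence) = %.3f\n", $posterior;   # prints 0.524
```

Note how weak the evidence is: the posterior barely moves from the 0.50 prior, which is exactly the "minimal predictive power" problem.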
Re: Bayesian not-for-spam
by dbp (Pilgrim) on Jul 15, 2003 at 08:12 UTC
I doubt you could do what you want with a system like those used for spam detection. Such systems are typically naive Bayes classifiers. They are "naive" because they assume all variables in the analysis are conditionally independent. For example, when comparing text, a naive Bayes system assumes that each word in the text is independent of every other word. This is obviously a completely bogus assumption, but experimental results (and real-world experience) show us that textual classification doesn't suffer much as a result. This doesn't just apply to spam; naive Bayes classifiers have been trained to categorize posts into newsgroups and the like (see Mitchell, section 6.10). The trouble is that your problem is much harder than text classification, and I'd expect this assumption to be much more damaging to your task.
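The independence assumption can be sketched from scratch in a few lines of Perl. The word counts below are toy data, not a real spam corpus; the point is only that each word contributes its own factor, independent of the others:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy per-class word counts from hand-made "training data".
my %count = (
    spam => { viagra => 4, cheap => 3, meeting => 0 },
    ham  => { viagra => 0, cheap => 1, meeting => 5 },
);
my %class_total;
for my $c (keys %count) {
    $class_total{$c} += $_ for values %{ $count{$c} };
}

# Score a document: log P(class) + sum of log P(word|class),
# treating every word as independent (the "naive" part).
# Add-one (Laplace) smoothing avoids log(0) for unseen words.
sub score {
    my ($class, @words) = @_;
    my $vocab = 3;                      # size of the toy vocabulary
    my $s = log(0.5);                   # uniform class prior
    for my $w (@words) {
        my $n = $count{$class}{$w} // 0;
        $s += log( ($n + 1) / ($class_total{$class} + $vocab) );
    }
    return $s;
}

my @doc   = qw(cheap viagra);
my $guess = score('spam', @doc) > score('ham', @doc) ? 'spam' : 'ham';
print "classified as: $guess\n";        # prints "classified as: spam"
```

For two opposite poles like spam/ham this works surprisingly well; for a continuous quantity like tomorrow's price, there is no natural pair of classes to score against.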
There are other Bayesian methods such as optimal Bayes classifiers and Bayesian networks. The Mitchell book noted above gives a nice overview. Note that all these methods are based on assigning probability to different hypotheses. You have a continuous hypothesis space, which makes this difficult. You could discretize your hypothesis space by creating a finite set of price ranges, or by attempting to predict whether the price will rise, fall, or stay the same. The other problem is that optimal classifiers and Bayesian networks are costly to train; I'm not even sure there is a polynomial-time algorithm for training Bayesian networks. These techniques are closely related to maximum likelihood estimation, Markov chain Monte Carlo, and other Bayesian techniques commonly used in a variety of fields. This stuff is pretty hard-core and extremely processor-intensive. I work with a political scientist who does Bayesian analysis of the Supreme Court, and his simulations can run for weeks on an openMosix cluster of high-end machines. The simulations are written using a C++ library and I've spent a great deal of time optimizing it. This is a domain where an interpreted language like Perl simply doesn't shine. Honestly, we'd be better off in terms of speed in C (or better yet, but God forbid, Fortran), but we're trying to strike a balance between efficiency and ease of use in the library.
You are attempting to tackle a very hard problem. Assuming there are recognizable patterns in your data (the stock market is highly volatile, but market prices of goods are a bit easier to predict), the patterns will likely be highly non-linear as a function of the inputs, and your data will be incredibly noisy. Certain types of neural networks may fit your problem; they handle continuous inputs and outputs well, and recurrent versions can deal with time series. Traditional econometric time-series techniques might work as well, assuming your problem turns out to be at least approximately linear. Essentially, what I'm saying is that choosing a learning/classification technique is going to be your biggest problem. You may have to try a few different techniques and tweak them extensively before you get anything resembling an accurate prediction. Implementation is downright trivial in comparison.
Re: Bayesian not-for-spam
by hawtin (Prior) on Jul 15, 2003 at 08:02 UTC
Like others here I don't think you want Bayesian logic; it sounds like what you are trying to do is similar to predicting the weather based on today's readings.
I would suggest that what you want is probably fuzzy logic. As it happens, June's Perl Journal has an article on that topic (and it's only $12 for a year, subscribe now :-) ). It mentions AI::FuzzyInference as a good CPAN module to use.
I would also suggest looking at perceptrons and neural nets. There have been all sorts of books on complexity and predicting dynamic systems. For example, I found the book "The Recursive Universe: Cosmic Complexity and the Limits of Scientific Knowledge" by William Poundstone really good.
Re: Bayesian not-for-spam
by wufnik (Friar) on Jul 15, 2003 at 11:08 UTC
howdy Kickstart,
merlyn's link is very useful for discrete data and a naive Bayesian network. The Bayesian approach is not limited to these; more advanced Bayesian nets allow causality to be modelled in a statistically rigorous and useful way. Neural nets, mentioned above, not to mention decision trees, are alternatives here.
Unfortunately, there is no software for the advanced Bayesian nets in Perl. If you are lucky enough to possess Matlab, you will find BNT, by Murphy, which is GPL'd, very useful.
If the naive Bayesian approach is sufficient, and it should be the starting point (and is by no means naive), the question you face is: how do you discretize your data? Day of the week etc. is easy; the problems will be in dealing with continuous variables like price. This discretization should not be linear if you are to make the most of the information that is there; you could also use an 'expert' to help you decide on the bands. Discretization is essential for typical Bayesian net methods to work, so it is worth devoting attention to it.
Once you have done this, just loop through your db and determine the conditional probabilities. Feed these into your naive Bayesian net, and Robert is your uncle.
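The discretize-then-count step above can be sketched as follows. The band boundaries, field names, and records are all made up for illustration; a real script would loop over your database and use bands chosen per the advice above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Discretize the continuous price into hand-chosen (non-linear) bands.
sub band {
    my $price = shift;
    return $price < 10 ? 'low' : $price < 20 ? 'mid' : 'high';
}

# Toy records: [weekday, price], in chronological order.
my @history = ( [ Mon => 8 ],  [ Tue => 12 ], [ Wed => 25 ],
                [ Thu => 11 ], [ Fri => 9 ],  [ Mon => 13 ] );

# Count how often each weekday is followed by each price band.
my (%joint, %marginal);
for my $i (0 .. $#history - 1) {
    my $day  = $history[$i][0];
    my $next = band( $history[ $i + 1 ][1] );   # tomorrow's band
    $joint{$day}{$next}++;
    $marginal{$day}++;
}

# Conditional probability table: P(tomorrow's band | today's weekday).
# These are the numbers you would feed into a naive Bayesian net.
for my $day (sort keys %joint) {
    for my $b (sort keys %{ $joint{$day} }) {
        printf "P(%s|%s) = %.2f\n", $b, $day,
               $joint{$day}{$b} / $marginal{$day};
    }
}
```

With real data you would condition on several variables at once (weekday, month, location, ...), which is where the naive independence assumption earns its keep by keeping the tables small.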
if it works, remember me in your will
...wufnik
in the world of the mules there are no rules
Re: Bayesian not-for-spam
by hiseldl (Priest) on Jul 15, 2003 at 20:59 UTC
If you want a predictor, you should probably use a back-propagation or feed-forward neural net; take a look at Mark Jurik's site where there are some technical reports, etc., that may help (this is not an endorsement, just a suggestion).
--
hiseldl
What time is it? It's Camel Time!
Re: Bayesian not-for-spam (re: stock Market)
by AssFace (Pilgrim) on Oct 22, 2003 at 03:44 UTC