http://www.perlmonks.org?node_id=441158


in reply to AI::NNFlex::Backprop error not decreasing

Hi,

Your problem is probably partly my fault. The 'sigmoid' activation function uses a formula that I haven't worked out how to differentiate yet, so there is no corresponding sigmoid_slope function to return the slope of the error curve. I should really have taken that activation function out - apologies for the oversight.

I would suggest you use the tanh activation function instead. I'll correct the module & documentation for the next release.

Could some kind monk tell me the 1st order derivative of this function:

(1+exp(-$value))**-1

so I can correct the code?

You've also got several layers defined. While there is no theoretical reason why you shouldn't (and I wrote the code with that in mind), it is more usual to use three layers and to adjust the number of nodes in the hidden layer to reflect the number of values you need the network to learn.
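For example (purely illustrative - the learning rate and node counts here are placeholders, not a recommendation for your particular data), a three layer network looks something like this:

    use strict;
    use AI::NNFlex::Backprop;

    my $network = AI::NNFlex::Backprop->new( learningrate => 0.2, bias => 1 );

    $network->add_layer( nodes => 3, activationfunction => 'tanh' );   # input layer
    $network->add_layer( nodes => 4, activationfunction => 'tanh' );   # single hidden layer - tune the node count
    $network->add_layer( nodes => 2, activationfunction => 'tanh' );   # output layer
    $network->init();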

Update: Oops, didn't spot the question at the bottom. Theoretically there is no reason why you shouldn't have analogue values learned by the network, although again it's unusual, and you'll lose a bit of precision on the output. I must admit I've never tried implementing an analogue net with AI::NNFlex::Backprop though, so I can't guarantee it will work.

While analogue nets are possible, it's an unusual approach, and takes a good deal of thinking about. Backprop nets are what my tutor likes to call a 'universal approximator'. Given the precision and size of your data set, my feeling is that trying to teach a backprop net this kind of data in this form is likely to fail - the output values will always be too approximate, so the error slope will never have a true 'solution'.

The fact that the module didn't fail when unable to find a slope function suggests that you are using 0.2. This bug is fixed in 0.21, which is a lot faster as well, so you might want to get that version from CPAN.

g0n, backpropagated monk

Re^2: AI::NNFlex::Backprop error not decreasing
by polettix (Vicar) on Mar 21, 2005 at 11:46 UTC
    If my memory hasn't gathered too much rust, the derivative you're looking for is:

    exp(-$value) * ((1 + exp(-$value)) ** -2)

    Flavio.

    -- Don't fool yourself.
      Thanks frodo72, that seems to do the job. I'll put that in the code for the next release, although I'd still recommend the OP uses tanh, as it seems to be more effective with this implementation of backprop.
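      For the record, here's roughly what I have in mind for the fix - just a sketch, and the actual function in the module may end up with a different name or signature:

          # sigmoid activation: (1 + exp(-x)) ** -1
          sub sigmoid {
              my $value = shift;
              return (1 + exp(-$value)) ** -1;
          }

          # its slope, using frodo72's derivative; this is the same thing as
          # sigmoid($value) * (1 - sigmoid($value))
          sub sigmoid_slope {
              my $value = shift;
              return exp(-$value) * ((1 + exp(-$value)) ** -2);
          }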

      g0n, backpropagated monk
      Just a slight correction to frodo72's derivative:
      -1 * exp(-$value) * ((1 + exp(-$value)) ** -2)
      Update: never mind - I forgot the minus sign from the inner derivative of -$value, which cancels the -1, so frodo72's version is right. =\

      -caedes

Re^2: AI::NNFlex::Backprop error not decreasing
by caedes (Pilgrim) on Mar 21, 2005 at 21:31 UTC
    I have some experience with what you are referring to as an analog network. In fact I've only ever used analog output, so the digital output networks are the exception for me. Actually, there is little difference between the two other than the interpretation of the values that the network produces. The ability of the network to approximate your analog training sample is going to be dependent on the suitability of the underlying network's multi-dimensional nonlinear polynomial (of sorts) for approximating the function. Since a sinusoidal function can be pretty well approximated (close to the origin) by a low order Taylor series expansion, I would expect a suitably designed and trained NN to perform nearly as well. A look at the number of free parameters in such a series expansion would give you a good hint at the size of network you would need (my guess is not very large).
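    To illustrate that point (this snippet is mine, not from the module), a fifth order Taylor expansion of sine - which has only three free coefficients - already tracks sin(x) closely near the origin:

        use strict;

        # sin(x) ~ x - x**3/6 + x**5/120 for x near the origin
        sub taylor_sin {
            my $x = shift;
            return $x - $x**3 / 6 + $x**5 / 120;
        }

        for my $x ( 0, 0.5, 1.0, 1.5 ) {
            printf "x=%.1f  sin=%.4f  taylor=%.4f\n", $x, sin($x), taylor_sin($x);
        }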

    From looking at your training data and the error over training iterations, I'd say that what you see doesn't look very odd. You can see that the error does indeed decrease from the first training iteration; it then goes to a low point and then levels out at a slightly higher value. This behavior is expected for a network whose number of weights and number of training samples are roughly the same order of magnitude. It shows a tendency for the network to become overtrained: for the training samples to become hardwired into the network's weights. To fix this you would either have to add many more training samples, or reduce the number of layers in your network. To me the network you've chosen looks too complex for the task at hand, and therefore much more likely to become overtrained. Try a single hidden layer of 4-5 nodes.

    Another way to avoid overtraining is to partition your sample data into two sets: train on one set, and after each epoch test the error on the other data set. You should aim for a minimized error in the second data set.
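    A rough, untested sketch of that idea, reusing the learn/run calls that appear in the code elsewhere in this thread (the datasets here are cut down to a couple of patterns just to show the shape, and the RMS calculation over the held-out set is mine):

        use strict;
        use AI::NNFlex::Backprop;
        use AI::NNFlex::Dataset;

        my $network = AI::NNFlex::Backprop->new( learningrate => 0.9, bias => 1 );
        $network->add_layer( nodes => 3, activationfunction => 'tanh' );
        $network->add_layer( nodes => 5, activationfunction => 'tanh' );
        $network->add_layer( nodes => 2, activationfunction => 'tanh' );
        $network->init();

        # partition the samples: train on one set, measure error on the other
        my $train_set    = AI::NNFlex::Dataset->new( [ [0,0,0], [0,1],  [0,0,1.570795], [1,0] ] );
        my $validate_set = AI::NNFlex::Dataset->new( [ [0,0,3.14159], [0,-1] ] );
        my @validate_targets = ( [0,-1] );

        for my $epoch ( 1 .. 100 ) {
            my $train_err = $train_set->learn($network);

            # RMS error over the held-out patterns; aim to stop where this is lowest
            my $outputs = $validate_set->run($network);
            my ( $sum, $count ) = ( 0, 0 );
            for my $i ( 0 .. $#{$outputs} ) {
                for my $j ( 0 .. $#{ $outputs->[$i] } ) {
                    $sum += ( $outputs->[$i][$j] - $validate_targets[$i][$j] ) ** 2;
                    $count++;
                }
            }
            printf "epoch %d: train %.4f, validate %.4f\n", $epoch, $train_err, sqrt( $sum / $count );
        }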

    -caedes

      I have taken your recommendations into consideration. But now I am having a problem handling the output of the test set when I run it through the network. It appears that the network returns an array of values, but all the values are the same.

      use strict;
      use AI::NNFlex::Backprop;
      use AI::NNFlex::Dataset;
      use Data::Dumper;

      my $n          = 0.4;    # currently unused
      my $num_epochs = 100;

      # 3 inputs -> 5 hidden tanh nodes -> 2 sigmoid outputs
      my $network = AI::NNFlex::Backprop->new( learningrate => .9, bias => 1 );
      $network->add_layer( nodes => 3, activationfunction => 'tanh' );
      #$network->add_layer( nodes => 3, activationfunction => 'tanh' );
      #$network->add_layer( nodes => 2, activationfunction => 'tanh' );
      #$network->add_layer( nodes => 3, activationfunction => 'tanh' );
      $network->add_layer( nodes => 5, activationfunction => 'tanh' );
      $network->add_layer( nodes => 2, activationfunction => 'sigmoid' );
      $network->init();

      # each input triple [x,y,z] is followed by its target pair
      my $test_set = AI::NNFlex::Dataset->new( [
          [6.28318,1.570795,0], [1,0], [6.28318,1.570795,1.570795], [0,-1], [6.28318,1.570795,3.14159], [-1,0], [6.28318,1.570795,4.712385], [0,1], [6.28318,1.570795,6.28318], [1,0], [6.28318,1.570795,7.853975], [0,-1],
          [6.28318,3.14159,0], [0,-1], [6.28318,3.14159,1.570795], [-1,0], [6.28318,3.14159,3.14159], [0,1], [6.28318,3.14159,4.712385], [1,0], [6.28318,3.14159,6.28318], [0,-1], [6.28318,3.14159,7.853975], [-1,0],
          [6.28318,4.712385,0], [-1,0], [6.28318,4.712385,1.570795], [0,1], [6.28318,4.712385,3.14159], [1,0], [6.28318,4.712385,4.712385], [0,-1], [6.28318,4.712385,6.28318], [-1,0], [6.28318,4.712385,7.853975], [0,1],
          [6.28318,6.28318,0], [0,1], [6.28318,6.28318,1.570795], [1,0], [6.28318,6.28318,3.14159], [0,-1], [6.28318,6.28318,4.712385], [-1,0], [6.28318,6.28318,6.28318], [0,1], [6.28318,6.28318,7.853975], [1,0],
          [6.28318,7.853975,0], [1,0], [6.28318,7.853975,1.570795], [0,-1], [6.28318,7.853975,3.14159], [-1,0], [6.28318,7.853975,4.712385], [0,1], [6.28318,7.853975,6.28318], [1,0], [6.28318,7.853975,7.853975], [0,-1],
          [7.853975,0,0], [1,0], [7.853975,0,1.570795], [0,-1], [7.853975,0,3.14159], [-1,0], [7.853975,0,4.712385], [0,1], [7.853975,0,6.28318], [1,0], [7.853975,0,7.853975], [0,-1],
          [7.853975,1.570795,0], [0,-1], [7.853975,1.570795,1.570795], [-1,0], [7.853975,1.570795,3.14159], [0,1], [7.853975,1.570795,4.712385], [1,0], [7.853975,1.570795,6.28318], [0,-1], [7.853975,1.570795,7.853975], [-1,0],
          [7.853975,3.14159,0], [-1,0], [7.853975,3.14159,1.570795], [0,1], [7.853975,3.14159,3.14159], [1,0], [7.853975,3.14159,4.712385], [0,-1], [7.853975,3.14159,6.28318], [-1,0], [7.853975,3.14159,7.853975], [0,1],
          [7.853975,4.712385,0], [0,1], [7.853975,4.712385,1.570795], [1,0], [7.853975,4.712385,3.14159], [0,-1], [7.853975,4.712385,4.712385], [-1,0], [7.853975,4.712385,6.28318], [0,1], [7.853975,4.712385,7.853975], [1,0],
          [7.853975,6.28318,0], [1,0], [7.853975,6.28318,1.570795], [0,-1], [7.853975,6.28318,3.14159], [-1,0], [7.853975,6.28318,4.712385], [0,1], [7.853975,6.28318,6.28318], [1,0], [7.853975,6.28318,7.853975], [0,-1],
          [7.853975,7.853975,0], [0,-1], [7.853975,7.853975,1.570795], [-1,0], [7.853975,7.853975,3.14159], [0,1], [7.853975,7.853975,4.712385], [1,0], [7.853975,7.853975,6.28318], [0,-1], [7.853975,7.853975,7.853975], [-1,0]
      ] );

      my $train_set = AI::NNFlex::Dataset->new( [
          [0,0,0], [0,1], [0,0,1.570795], [1,0], [0,0,3.14159], [0,-1], [0,0,4.712385], [-1,0], [0,0,6.28318], [0,1], [0,0,7.853975], [1,0],
          [0,1.570795,0], [1,0], [0,1.570795,1.570795], [0,-1], [0,1.570795,3.14159], [-1,0], [0,1.570795,4.712385], [0,1], [0,1.570795,6.28318], [1,0], [0,1.570795,7.853975], [0,-1],
          [0,3.14159,0], [0,-1], [0,3.14159,1.570795], [-1,0], [0,3.14159,3.14159], [0,1], [0,3.14159,4.712385], [1,0], [0,3.14159,6.28318], [0,-1], [0,3.14159,7.853975], [-1,0],
          [0,4.712385,0], [-1,0], [0,4.712385,1.570795], [0,1], [0,4.712385,3.14159], [1,0], [0,4.712385,4.712385], [0,-1], [0,4.712385,6.28318], [-1,0], [0,4.712385,7.853975], [0,1],
          [0,6.28318,0], [0,1], [0,6.28318,1.570795], [1,0], [0,6.28318,3.14159], [0,-1], [0,6.28318,4.712385], [-1,0], [0,6.28318,6.28318], [0,1], [0,6.28318,7.853975], [1,0],
          [0,7.853975,0], [1,0], [0,7.853975,1.570795], [0,-1], [0,7.853975,3.14159], [-1,0], [0,7.853975,4.712385], [0,1], [0,7.853975,6.28318], [1,0], [0,7.853975,7.853975], [0,-1],
          [1.570795,0,0], [1,0], [1.570795,0,1.570795], [0,-1], [1.570795,0,3.14159], [-1,0], [1.570795,0,4.712385], [0,1], [1.570795,0,6.28318], [1,0], [1.570795,0,7.853975], [0,-1],
          [1.570795,1.570795,0], [0,-1], [1.570795,1.570795,1.570795], [-1,0], [1.570795,1.570795,3.14159], [0,1], [1.570795,1.570795,4.712385], [1,0], [1.570795,1.570795,6.28318], [0,-1], [1.570795,1.570795,7.853975], [-1,0],
          [1.570795,3.14159,0], [-1,0], [1.570795,3.14159,1.570795], [0,1], [1.570795,3.14159,3.14159], [1,0], [1.570795,3.14159,4.712385], [0,-1], [1.570795,3.14159,6.28318], [-1,0], [1.570795,3.14159,7.853975], [0,1],
          [1.570795,4.712385,0], [0,1], [1.570795,4.712385,1.570795], [1,0], [1.570795,4.712385,3.14159], [0,-1], [1.570795,4.712385,4.712385], [-1,0], [1.570795,4.712385,6.28318], [0,1], [1.570795,4.712385,7.853975], [1,0],
          [1.570795,6.28318,0], [1,0], [1.570795,6.28318,1.570795], [0,-1], [1.570795,6.28318,3.14159], [-1,0], [1.570795,6.28318,4.712385], [0,1], [1.570795,6.28318,6.28318], [1,0], [1.570795,6.28318,7.853975], [0,-1],
          [1.570795,7.853975,0], [0,-1], [1.570795,7.853975,1.570795], [-1,0], [1.570795,7.853975,3.14159], [0,1], [1.570795,7.853975,4.712385], [1,0], [1.570795,7.853975,6.28318], [0,-1], [1.570795,7.853975,7.853975], [-1,0],
          [3.14159,0,0], [0,-1], [3.14159,0,1.570795], [-1,0], [3.14159,0,3.14159], [0,1], [3.14159,0,4.712385], [1,0], [3.14159,0,6.28318], [0,-1], [3.14159,0,7.853975], [-1,0],
          [3.14159,1.570795,0], [-1,0], [3.14159,1.570795,1.570795], [0,1], [3.14159,1.570795,3.14159], [1,0], [3.14159,1.570795,4.712385], [0,-1], [3.14159,1.570795,6.28318], [-1,0], [3.14159,1.570795,7.853975], [0,1],
          [3.14159,3.14159,0], [0,1], [3.14159,3.14159,1.570795], [1,0], [3.14159,3.14159,3.14159], [0,-1], [3.14159,3.14159,4.712385], [-1,0], [3.14159,3.14159,6.28318], [0,1], [3.14159,3.14159,7.853975], [1,0],
          [3.14159,4.712385,0], [1,0], [3.14159,4.712385,1.570795], [0,-1], [3.14159,4.712385,3.14159], [-1,0], [3.14159,4.712385,4.712385], [0,1], [3.14159,4.712385,6.28318], [1,0], [3.14159,4.712385,7.853975], [0,-1],
          [3.14159,6.28318,0], [0,-1], [3.14159,6.28318,1.570795], [-1,0], [3.14159,6.28318,3.14159], [0,1], [3.14159,6.28318,4.712385], [1,0], [3.14159,6.28318,6.28318], [0,-1], [3.14159,6.28318,7.853975], [-1,0],
          [3.14159,7.853975,0], [-1,0], [3.14159,7.853975,1.570795], [0,1], [3.14159,7.853975,3.14159], [1,0], [3.14159,7.853975,4.712385], [0,-1], [3.14159,7.853975,6.28318], [-1,0], [3.14159,7.853975,7.853975], [0,1],
          [4.712385,0,0], [-1,0], [4.712385,0,1.570795], [0,1], [4.712385,0,3.14159], [1,0], [4.712385,0,4.712385], [0,-1], [4.712385,0,6.28318], [-1,0], [4.712385,0,7.853975], [0,1],
          [4.712385,1.570795,0], [0,1], [4.712385,1.570795,1.570795], [1,0], [4.712385,1.570795,3.14159], [0,-1], [4.712385,1.570795,4.712385], [-1,0], [4.712385,1.570795,6.28318], [0,1], [4.712385,1.570795,7.853975], [1,0],
          [4.712385,3.14159,0], [1,0], [4.712385,3.14159,1.570795], [0,-1], [4.712385,3.14159,3.14159], [-1,0], [4.712385,3.14159,4.712385], [0,1], [4.712385,3.14159,6.28318], [1,0], [4.712385,3.14159,7.853975], [0,-1],
          [4.712385,4.712385,0], [0,-1], [4.712385,4.712385,1.570795], [-1,0], [4.712385,4.712385,3.14159], [0,1], [4.712385,4.712385,4.712385], [1,0], [4.712385,4.712385,6.28318], [0,-1], [4.712385,4.712385,7.853975], [-1,0],
          [4.712385,6.28318,0], [-1,0], [4.712385,6.28318,1.570795], [0,1], [4.712385,6.28318,3.14159], [1,0], [4.712385,6.28318,4.712385], [0,-1], [4.712385,6.28318,6.28318], [-1,0], [4.712385,6.28318,7.853975], [0,1],
          [4.712385,7.853975,0], [0,1], [4.712385,7.853975,1.570795], [1,0], [4.712385,7.853975,3.14159], [0,-1], [4.712385,7.853975,4.712385], [-1,0], [4.712385,7.853975,6.28318], [0,1], [4.712385,7.853975,7.853975], [1,0],
          [6.28318,0,0], [0,1], [6.28318,0,1.570795], [1,0], [6.28318,0,3.14159], [0,-1], [6.28318,0,4.712385], [-1,0], [6.28318,0,6.28318], [0,1], [6.28318,0,7.853975], [1,0]
      ] );

      my $epoch = 1;
      my $err   = 1;
      while ( $err > .001 && $epoch < $num_epochs ) {
          $err = $train_set->learn($network);
          my $outputsRef = $test_set->run($network);
          print Dumper($outputsRef);
          print "Error: $err\n";
          $epoch++;
      }

      Running the test set through the network gives the following output.

      $ perl test1.pl
      $VAR1 = [
                [ '2.22776546277668e-07', '0.011408329955622' ],
                [ '2.22776546277668e-07', '0.011408329955622' ],
                [ '2.22776546277668e-07', '0.011408329955622' ],
                [ '2.22776546277668e-07', '0.011408329955622' ],
                [ '2.22776546277668e-07', '0.011408329955622' ],
                [ '2.22776546277668e-07', '0.011408329955622' ],
                ....
                ....

      Am I handling the output of the network incorrectly? The module documentation says that run "Runs the dataset through the network and returns a reference to an array of output patterns." I guess I am not handling the array reference correctly.

      Thanks for all the help.

        First of all, thanks to caedes for giving such a detailed answer to the question. I haven't done much with analog target output, and am no sort of mathematician, so that was really helpful to me as well as hopefully to the OP.

        You are definitely handling the outputs correctly; I suspect the reason that you are getting the same values for every pattern during training is mathematical rather than programmatic. Setting the network to debug=>[4] shows that the network is adjusting weights successfully, but it seems unable to learn the data as it stands. If you look at the output from the beginning, during the first few epochs it changes, then settles on a consistent value. Combined with the fact that the RMS error is very high, that suggests that an identical response of [x,y] for every pattern is the best solution the network has been able to come up with, as the network is currently set up.
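        For what it's worth, unpacking the returned reference is just standard Perl dereferencing, nothing specific to the module - something like:

            # $outputsRef is a reference to an array of output patterns,
            # each of which is itself an array reference
            foreach my $pattern ( @$outputsRef ) {
                print join( ', ', @$pattern ), "\n";    # one line per input pattern
            }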

        Correctly set up, backprop will find a solution (not necessarily the optimum) provided one exists. It looks to me like a backprop solution for your dataset as it stands doesn't exist.

        BUT, I don't want to mislead you. As I said above, I'm no kind of mathematician (psychologist really - I wrote this module for cognitive modelling), and my understanding of backprop is pretty much empirical. Some more mathematical monk may be able to give you (and for that matter me) better guidance on this.

        Update: FWIW I've taken another look at this. I switched on debug (debug=>[5]) to get the return values from the activation functions, and since the inputs are mostly greater than 1, tanh is returning 1 or close to it for all of them.
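        One thing that might be worth trying (just a thought - I haven't tested it against your data) is rescaling the inputs into roughly the -1 to 1 range before building the datasets, so tanh isn't driven straight into saturation:

            use strict;

            # largest raw input value in the dataset (5 * pi / 2, roughly 7.854)
            my $max_input = 7.853975;

            # map each raw angle from 0 .. $max_input onto -1 .. 1
            sub scale_inputs {
                my @raw = @_;
                return map { ( $_ / $max_input ) * 2 - 1 } @raw;
            }

            my @scaled = scale_inputs( 6.28318, 1.570795, 0 );
            print join( ', ', map { sprintf '%.2f', $_ } @scaled ), "\n";   # prints 0.60, -0.60, -1.00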

        g0n, backpropagated monk
Re^2: AI::NNFlex::Backprop error not decreasing (Thanks)
by thealienz1 (Pilgrim) on Mar 23, 2005 at 19:58 UTC

    I appreciate all your feedback on this programming project I have been working on. But I just wanted to let you know that in the end I simplified the solution I was looking for: basically, I created a separate neural network for each output I wanted. It seemed to me that a single network complex enough to recognize two functions would be larger than I needed or wanted. The calculations for weights, epochs, etc. increased so much when I made the network bigger. Creating separate networks and training them individually actually cut down on time.

    I came to this conclusion when I finally gave in and started using the Neural Network Toolbox for Matlab that my school has, where I could view the output and easily graph it. The network I would have needed was just too huge for my purposes.

    When I have the time, I would actually like to print out your code and really try to understand your implementation. This is something I have recently become interested in since I started taking a class on the subject. It's something I would like to learn more about, and perhaps do my master's project (not thesis) on.

    Thank you again for your help and the excellent code you have written.

    Regards,
    JT Archie