From looking at your training data and the error over training iterations, I'd say that what you see doesn't look very odd. The error does decrease from the first training iteration, reaches a low point, and then levels out at a slightly higher value. This behavior is expected when the number of weights in the network and the number of training samples are of roughly the same order of magnitude. It shows a tendency for the network to become overtrained: the training samples become hardwired into the network's weights. To fix this you would either have to add many more training samples, or reduce the number of layers in your network. To me the network you've chosen looks too complex for the task at hand, and is therefore much more likely to become overtrained. Try a single hidden layer of 4-5 nodes (see the sketch below).
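Something along these lines would do it. This is only a rough sketch using scikit-learn's MLPRegressor, since you didn't say which framework you're using; the data shapes here are made up, so substitute your own samples.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Stand-in data: replace with your own training samples.
X_train = np.random.rand(100, 3)
y_train = np.random.rand(100)

# A single hidden layer of 5 nodes keeps the number of weights small
# relative to the number of training samples, reducing overtraining.
net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)
net.fit(X_train, y_train)
print(net.loss_)  # final training error
```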
Another way to avoid overtraining is to partition your sample data into two sets: train on one set, and after each epoch test the error on the other (validation) set. You should aim to minimize the error on that second set, and stop training once it starts to rise again.
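To make that concrete, here's a minimal sketch of the same idea, again assuming scikit-learn and placeholder data; the stopping rule at the end is just the simplest version of "stop when the validation error goes back up".

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Stand-in data: replace with your own samples.
X = np.random.rand(200, 3)
y = np.random.rand(200)

# Hold out part of the samples as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(5,), learning_rate_init=0.01, random_state=0)

best_val_error = np.inf
for epoch in range(500):
    net.partial_fit(X_train, y_train)              # one training update
    val_error = mean_squared_error(y_val, net.predict(X_val))
    if val_error < best_val_error:
        best_val_error = val_error
    elif val_error > 1.1 * best_val_error:
        break                                      # validation error rising: overtraining
```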