I have some experience with what you are referring to as an analog network. In fact I've only ever used analog output, so the digital output networks are the exception for me. Actually, there is little difference between the two other than the interpretation of the values that the network produces. The ability of the network to approximate your analog training sample is going to be dependant on the suitability of underlying network's multi-dimensional nonlinear polynomial (of sorts) to approximate the function. Since a sinusoidal function can be pretty well approximated (cose to the origin) by a low order Taylor series expansion, I would expect a suitably designed and trained NN to perform nearly as well. A look at the number of free paramaters in such a series expansion would give you a good hint at the size of network you would need (my guess is not very large).
From looking at your training data and the error over training iterations, I'd say that what you see doesn't look very odd. You can see that the error does indeed decrease from the first training iteration, it then goes to a low point and then levels out at a slightly higher value. This behavior is expected for a network with its number of weights and number of training samples roughly the same order of magnitude. It shows a tendancy for the network to become overtrained: for the training samples to become hardwired into the networks weights. To fix this you would either have to add many more training samples, or reduce the number of layers in your network. To me the network you've chosen looks too complex for the task at hand , therefore much more likely to become overtrained. Try a single hidden layer of 4-5 nodes.
Another way to avoid overtraining is to parition your sample data into two sets, train on one set and then after each epoch, test the error on the other data set. You should aim for a minimized error in the second data set.