I have had good luck with a record/playback model. That is:
- log the traffic (at either a semantic level or a network level, depending on the particulars of the application), ideally from a real scenario run against your "control" code base. Either set up a monitor between the client and the server, or alter the client or server to add a logging capability.
- make a mock client that can replay the same traffic to the server and record the server's output, and likewise a mock server that replays the server's side to the client. (If you only want to test the server, you only need the mock client, and vice versa.)
- using the "control" logs from the logger/monitor and the mock client/server, play the same traffic back against the test code base, and monitor it the same way.
- diff the control logs against the test logs.
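A minimal Python sketch of that loop, under loose assumptions: the "handler" callables stand in for whatever actually talks to the control and test systems (a socket, an HTTP client, a direct API call), and the names `replay` and `diff_runs` are hypothetical. To keep it self-contained, both logs are produced by replaying here; in the scheme above the control log would normally come straight from the monitor during the real scenario.

```python
import difflib
from typing import Callable, Iterable, List

# A handler takes one recorded request and returns the response as text.
Handler = Callable[[str], str]


def replay(requests: Iterable[str], handler: Handler) -> List[str]:
    """Send each recorded request to the handler and log every response."""
    return [f"{req} => {handler(req)}" for req in requests]


def diff_runs(control_log: List[str], test_log: List[str]) -> List[str]:
    """Unified diff of the control run against the test run (empty if equal)."""
    return list(difflib.unified_diff(control_log, test_log,
                                     fromfile="control", tofile="test",
                                     lineterm=""))


if __name__ == "__main__":
    recorded = ["GET /a", "GET /b", "GET /c"]

    def control_server(req: str) -> str:
        return req.upper()

    def test_server(req: str) -> str:
        # Simulated regression: /b now behaves differently.
        return "ERROR" if req.endswith("/b") else req.upper()

    for line in diff_runs(replay(recorded, control_server),
                          replay(recorded, test_server)):
        print(line)
```

An empty diff means the test version reproduced the control behaviour for every recorded request; anything else is a difference to review.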
A simple example would be to test a web server by taking your real web logs (possibly altered to include more information than they normally would, for example the POST string), and running them through a script that makes an LWP request for each entry in the log. Then, of course, diff the resulting HTML.
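The same idea sketched in Python with the standard library's `urllib.request` in place of LWP. It assumes the logs are in the Apache/NCSA common log format; the function names, file names, and URLs are hypothetical. Standard logs carry no POST bodies, so the `body` parameter is only a placeholder for whatever extra data you added to your own logs.

```python
import re
import urllib.error
import urllib.request
from typing import NamedTuple, Optional, Tuple

# Common log format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


class LogEntry(NamedTuple):
    method: str
    path: str
    status: int  # status the control server originally returned


def parse_line(line: str) -> Optional[LogEntry]:
    """Extract the request from one log line; None if it doesn't parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return LogEntry(m["method"], m["path"], int(m["status"]))


def fetch(base_url: str, entry: LogEntry,
          body: Optional[bytes] = None) -> Tuple[int, str]:
    """Re-issue one logged request and return (status, response text)."""
    req = urllib.request.Request(base_url + entry.path, data=body,
                                 method=entry.method)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are still worth recording and diffing.
        return err.code, err.read().decode("utf-8", "replace")


def replay_log(log_path: str, base_url: str, out_path: str) -> None:
    """Replay every parsable log entry into one output file per run."""
    with open(log_path) as logs, open(out_path, "w") as out:
        for line in logs:
            entry = parse_line(line)
            if entry is None:
                continue
            status, text = fetch(base_url, entry)
            out.write(f"=== {entry.method} {entry.path} -> {status}\n{text}\n")
```

Usage would be something like `replay_log("access.log", "http://control-host:8080", "control.out")`, the same again with the test host into `test.out`, and then an ordinary `diff control.out test.out`. Point it at a scratch copy of the server, since replayed POSTs really do get executed again.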
Of course, the method applies to anything that lets you spoof the client or the server (I've used it for more than just web servers), but web servers make a nice example. A more general method is simply to trap all the traffic on the socket and write it to a file (tracking which parts the server sent and which the client sent), and then play one side's half of that conversation into the other side by, well, basically pasting it into a telnet window :-)
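A rough Python sketch of the "paste it into a telnet window" step, automated. It assumes a hypothetical transcript format (a list of `("C" or "S", bytes)` pairs in capture order) and uses an idle timeout as a crude stand-in for knowing when the server has finished replying, which is roughly what you do by eye in telnet. All names here are illustrative.

```python
import socket
from typing import List, Tuple

# Hypothetical transcript format: (side, data) pairs in capture order,
# where side is "C" for client-sent bytes and "S" for server-sent bytes.
Transcript = List[Tuple[str, bytes]]


def read_until_idle(sock: socket.socket, idle_timeout: float) -> bytes:
    """Collect whatever the peer sends until it is quiet for idle_timeout."""
    chunks = []
    sock.settimeout(idle_timeout)
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            break
        if not data:  # peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def replay_client_half(transcript: Transcript, host: str, port: int,
                       idle_timeout: float = 0.5) -> Transcript:
    """Send the recorded client half to a live server, recording its replies."""
    result: Transcript = []
    with socket.create_connection((host, port)) as sock:
        greeting = read_until_idle(sock, idle_timeout)  # e.g. a protocol banner
        if greeting:
            result.append(("S", greeting))
        for side, data in transcript:
            if side != "C":
                continue  # server turns are what we expect to regenerate
            sock.sendall(data)
            result.append(("C", data))
            reply = read_until_idle(sock, idle_timeout)
            if reply:
                result.append(("S", reply))
    return result


def server_bytes(transcript: Transcript) -> bytes:
    """Join the server side; TCP may split replies differently per run."""
    return b"".join(data for side, data in transcript if side == "S")
```

Comparing `server_bytes(recorded)` with `server_bytes(replayed)` rather than chunk by chunk avoids false differences caused purely by how the bytes happened to be segmented on the wire.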
This model is very helpful when you want to make sure that no unexpected changes occurred between your control and test versions. Obviously, you'll need to identify the differences and decide whether or not each one is expected. It's often useful to munge the diff (systematically remove approved differences) and/or to munge the inputs to the diff (likewise a way to systematically remove differences, but by making the inputs more similar rather than by clipping the output of the diff, which is often harder).
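A small Python sketch of both kinds of munging, using `difflib.ndiff`. The two normalization patterns (timestamps and a `session=` id) and the approved `X-Powered-By:` change are made-up examples; the real lists would come from reviewing your own diffs.

```python
import difflib
import re
from typing import Iterable, List, Tuple

# Input munging: rewrite things that legitimately vary between runs so they
# compare equal. These two patterns are examples only.
NORMALIZE: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"session=[0-9a-f]+"), "session=<ID>"),
]

# Diff munging: changed lines already reviewed and approved.
APPROVED: List[re.Pattern] = [
    re.compile(r"^[-+] X-Powered-By: "),
]


def normalize(lines: Iterable[str]) -> List[str]:
    """Apply every NORMALIZE substitution to every line."""
    out = []
    for line in lines:
        for pattern, replacement in NORMALIZE:
            line = pattern.sub(replacement, line)
        out.append(line)
    return out


def unexpected_changes(control: List[str], test: List[str]) -> List[str]:
    """Changed lines left after munging both the inputs and the diff output."""
    diff = difflib.ndiff(normalize(control), normalize(test))
    # ndiff marks removed lines "- ", added "+ ", context "  ", hints "? ".
    changed = [d for d in diff if d.startswith(("- ", "+ "))]
    return [d for d in changed if not any(p.search(d) for p in APPROVED)]
```

Normalizing the inputs first keeps unrelated noise (every timestamp, every session id) from ever reaching the diff, so the approved-changes filter only has to deal with genuine behavioural differences.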