I would use flite to generate the audio. It is fast and reasonably clear. If you want a truly random code on every request, you could generate the audio on the fly and output it directly from the CGI script, the same way image files are served: print an audio content-type header, then print the binary data.
in reply to howto: Perl CGI, image with random scewed text for account creations
But you may run into problems, like not being able to compile flite on your webserver, or it using too many resources on a busy site. In that case, you could pre-record a series of random .au files (with flite, or with someone who has a clear voice) and upload them daily. Even a hundred random image-audio pairs would probably be sufficient to defeat automated scripts.
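Here is a rough sketch of how the pre-recording step might look. It assumes flite is on your PATH and uses its `-t` (text) and `-o` (output file) options. The sub names, the digit-only alphabet, the 5-character length and the `captchaNNN.wav` naming are all just my own choices for illustration, and I write .wav here rather than .au.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters that are easy to tell apart when spoken (an illustrative choice).
my @ALPHABET = (2 .. 9);

# Return a random code of $len characters drawn from @ALPHABET.
sub random_code {
    my ($len) = @_;
    return join '', map { $ALPHABET[ int rand @ALPHABET ] } 1 .. $len;
}

# Separate the characters so the synthesizer reads them one at a time.
sub spoken_text {
    my ($code) = @_;
    return join ', ', split //, $code;
}

# Build the flite command as a list, so no shell quoting is involved.
sub flite_command {
    my ($code, $outfile) = @_;
    return ('flite', '-t', spoken_text($code), '-o', $outfile);
}

# Write $count audio files into $dir; return a hash of file name => code.
sub generate_batch {
    my ($dir, $count) = @_;
    my %answers;
    for my $n (1 .. $count) {
        my $code = random_code(5);
        my $file = sprintf '%s/captcha%03d.wav', $dir, $n;
        system(flite_command($code, $file)) == 0
            or die "flite failed for $file (exit status $?)\n";
        $answers{$file} = $code;
    }
    return %answers;
}

# Run only when an output directory is given on the command line.
if (@ARGV) {
    my %answers = generate_batch($ARGV[0], 100);
    print "$_ $answers{$_}\n" for sort keys %answers;
}
```

Run it as `perl make_batch.pl /some/dir` (the script name is made up) and keep the printed file-to-code list on the server side only, since it is the answer key.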
Whatever you do, make sure your audio is in a format (bit depth and sampling frequency) that all systems will be able to play. Quite often, especially in CGI, people go for the smallest audio file size, like 8-bit, 8 kHz audio, which will not play everywhere. Better to stick with a high-quality standard that almost all systems support, like CD quality at 16-bit, 44.1 kHz.
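To make the format point concrete, here is a minimal sketch of serving 16-bit, 44.1 kHz mono PCM as a WAV file straight from a CGI script. The sub names are made up, the 440 Hz tone is only a stand-in for real speech samples, and `audio/x-wav` is the content type I am assuming here (`audio/wav` is also seen). It needs Perl 5.10 or later for the `s<` pack format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the 44-byte header of an uncompressed PCM WAV file.
sub wav_header {
    my ($data_len, $rate, $bits, $channels) = @_;
    my $block_align = $channels * $bits / 8;    # bytes per sample frame
    my $byte_rate   = $rate * $block_align;     # bytes per second
    return pack 'a4 V a4 a4 V v v V V v v a4 V',
        'RIFF', 36 + $data_len, 'WAVE',
        'fmt ', 16, 1, $channels, $rate, $byte_rate, $block_align, $bits,
        'data', $data_len;
}

# Placeholder audio: a 440 Hz tone as signed 16-bit little-endian samples.
sub test_tone {
    my ($rate, $seconds) = @_;
    my $pi = 4 * atan2(1, 1);
    return pack 's<*',
        map { int(8000 * sin(2 * $pi * 440 * $_ / $rate)) }
        0 .. int($rate * $seconds) - 1;
}

# Assemble the HTTP headers plus the complete WAV file (16-bit mono).
sub wav_response {
    my ($pcm, $rate) = @_;
    my $body = wav_header(length($pcm), $rate, 16, 1) . $pcm;
    return "Content-Type: audio/x-wav\r\n"
         . 'Content-Length: ' . length($body) . "\r\n\r\n"
         . $body;
}

# Emit the response only when actually running under a web server.
if ($ENV{GATEWAY_INTERFACE}) {
    binmode STDOUT;    # keep the binary data intact on every platform
    print wav_response(test_tone(44_100, 0.5), 44_100);
}
```

In a real script the tone would be replaced by the synthesized speech samples, resampled to 44.1 kHz first if your synthesizer produces a lower rate.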