Several years ago I was writing ETL for a very large data warehouse (think top 10 databases in the world, although this was over a decade ago). I did this on Sun's E10k frames, which were multi-CPU machines (this pre-dates multi-core).
Because there are different ways of being 'bound' (memory-bound, CPU-bound, network-bound, disk I/O-bound, etc.), the only way to truly know the optimal number of threads is by trial and error, at least for a program of any complexity. So you run tests and trials at different thread counts, plot the results, and it should be pretty clear where the sweet spot is.
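To make the trial-and-error idea concrete, here is a rough Python sketch of such a sweep. It is only an illustration, not what I ran back then: `time_workload`, `sweep_thread_counts`, and `fake_io_task` are names I made up, and the sleeping task is a stand-in for real work. Swap in your own job and candidate counts, then plot or eyeball the timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_workload(task, items, n_threads):
    """Return wall-clock seconds to run task over items with n_threads workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces every result, so the timing covers all the work
        list(pool.map(task, items))
    return time.perf_counter() - start


def sweep_thread_counts(task, items, candidates, repeats=3):
    """Time each candidate thread count, keeping the best of `repeats` runs."""
    results = {}
    for n in candidates:
        results[n] = min(time_workload(task, items, n) for _ in range(repeats))
    return results


def fake_io_task(_):
    # Hypothetical stand-in for real work: sleeps to mimic an I/O wait
    time.sleep(0.01)


if __name__ == "__main__":
    timings = sweep_thread_counts(fake_io_task, range(64), [1, 2, 4, 8, 16, 32])
    for n, secs in timings.items():
        print(f"{n:>3} threads: {secs:.3f}s")
    print("fastest:", min(timings, key=timings.get))
```

Taking the best of a few repeats cuts down on noise from whatever else the machine is doing. Note that in CPython a CPU-bound task will not scale with threads the way an I/O-bound one does, which is one more reason to measure rather than guess.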
--
“For the Present is the point at which time touches eternity.” - CS Lewis