Well, if I were you, I would stop thinking in terms of the good old GNU gprof-style profiling model and start looking into modern, less tedious profiling frameworks. Unfortunately, gprof gives you little chance of getting anything useful on a modern CPU unless your code runs for minutes, because it samples the program only every 10 milliseconds or so, which is far too coarse nowadays.
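To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes the roughly 10 ms sampling interval mentioned above; the function name and the example run times are made up for illustration only.

```python
# Rough estimate of how many samples a sampling profiler collects.
# SAMPLE_INTERVAL_S is the approximate 10 ms interval assumed above.
SAMPLE_INTERVAL_S = 0.010


def expected_samples(runtime_s, interval_s=SAMPLE_INTERVAL_S):
    """Return the approximate number of samples taken during a run."""
    return runtime_s / interval_s


if __name__ == "__main__":
    # Hypothetical run times, chosen only to show the scale of the problem.
    for label, runtime in [("50 ms run", 0.05), ("1 s run", 1.0), ("2 min run", 120.0)]:
        print(f"{label}: about {expected_samples(runtime):.0f} samples")
```

With only a handful of samples spread over every function in a short run, the per-function numbers are mostly noise, which is why the run has to last so long before the profile means anything.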
in reply to profiling an XS module
In particular, you could try Valgrind or OProfile (if you are lucky enough to be using Linux). Both let you profile code without compiling it specially for profiling, and both provide much finer granularity than gprof. I don't have much experience with OProfile, but I have already had the chance to get well acquainted with Valgrind (and especially callgrind), to my great programming and profiling pleasure.
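For what it's worth, a callgrind run could be wrapped roughly like this. It is only a sketch, assuming valgrind and callgrind_annotate are on your PATH; the script name bench.pl and the output file name callgrind.out.xs are placeholders I made up, so substitute whatever script exercises your XS module.

```python
import shutil
import subprocess
import sys


def callgrind_command(script, out_file="callgrind.out.xs"):
    """Build the command line that runs a Perl script under callgrind.

    The default out_file name is a placeholder, not a convention.
    """
    return [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={out_file}",
        "perl",
        script,
    ]


def profile(script, out_file="callgrind.out.xs"):
    """Run the script under callgrind, then print the annotated profile."""
    if shutil.which("valgrind") is None:
        raise RuntimeError("valgrind not found on PATH")
    subprocess.run(callgrind_command(script, out_file), check=True)
    # callgrind_annotate ships with Valgrind and summarizes cost per function.
    subprocess.run(["callgrind_annotate", out_file], check=True)


if __name__ == "__main__":
    # Usage: python profile_xs.py bench.pl   (bench.pl is a hypothetical script)
    if len(sys.argv) == 2:
        profile(sys.argv[1])
    else:
        print(" ".join(callgrind_command("bench.pl")))
```

Expect the script to run many times slower under Valgrind than natively, so a small, representative workload is usually enough. If the XS code was built with debugging symbols, the annotated output should be easier to map back to your C functions.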