It may not be obvious if you are on Win32, but you are already getting the built-in per-subroutine profiling, correct? That is, if you open a shell on Linux and run the script/xx_server.pl HTTP daemon with Debug on, it prints, for each request, exactly how much time each routine/phase took. At least it does when I run it from a Linux shell. I didn't catch exactly what is segfaulting for you, but you might try running the HTTP daemon from a Cygwin shell, for instance. I think the same output (with Debug on) also goes to the Apache error log; at least it does with my FastCGI server.
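For reference, Debug is usually switched on with the -Debug flag in the application's main module. Below is a minimal sketch that assumes an app called MyApp, a hypothetical stand-in for your xx. Setting the CATALYST_DEBUG=1 environment variable before starting the server should have the same effect without editing the code.

```perl
package MyApp;    # hypothetical application name, stands in for "xx"
use strict;
use warnings;

# The -Debug flag turns on Catalyst's debug mode, which logs a
# per-request table of the actions that ran and the time each one took.
use Catalyst qw/-Debug/;

__PACKAGE__->setup;

1;
```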
Also, here is a post about using Time::HiRes with Debug in Catalyst for some simple profiling; it lets you start and stop a timer. The module it describes (which perhaps ought to be added to CPAN) might do what you want.
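As a rough illustration of the start/stop idea, here is a plain Time::HiRes timer. This is not the module from that post, just a sketch: time_it is a hypothetical helper name, while gettimeofday and tv_interval are the standard Time::HiRes functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code block, report how long it took,
# and return the elapsed wall-clock time in seconds (a float).
sub time_it {
    my ($label, $code) = @_;
    my $t0 = [gettimeofday];          # start the timer (seconds, microseconds)
    $code->();                        # run the code being measured
    my $elapsed = tv_interval($t0);   # stop: seconds elapsed since $t0
    # Inside a Catalyst action you would more likely send this to
    # $c->log->debug(...) so it appears alongside the other Debug output.
    printf STDERR "%s took %.6f s\n", $label, $elapsed;
    return $elapsed;
}

# Example: time a dummy loop standing in for the slow code.
my $elapsed = time_it('dummy loop', sub {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
});
```

Wrapping the measured code in a sub keeps the start and stop calls in one place, so you can move the timer around a suspect block without scattering gettimeofday calls through the action.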
I also recommend joining the Catalyst mailing list.