I have not tried this myself, but judging from the Devel-NYTProf documentation, the
calls=N option might be worth trying.
If your loop body is more or less a subroutine call, the following might be useful.
From the documentation (emphasis mine):
calls=N
This option is new and experimental.
With calls=1 (the default) subroutine call return events are emitted into the data stream as they happen. With calls=2 subroutine call entry events are also emitted. With calls=0 no subroutine call events are produced. This option depends on the subs option being enabled, which it is by default.
The nytprofcalls utility can be used to process this data. It too is new and experimental and so likely to change.
The subroutine profiler normally gathers data in memory and outputs a summary when the profile data is being finalized, usually when the program has finished. The summary contains aggregate information for all the calls from one location to another, but the details of individual calls have been lost. The calls option enables the recording of individual call events and thus more detailed analysis and reporting of that data.
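Based on the above, an invocation might look roughly like the following. This is an untested sketch: my_script.pl is a placeholder for your own program, and I am assuming the usual way of passing NYTProf options through the NYTPROF environment variable and the default output file name nytprof.out. Since both the option and the nytprofcalls utility are marked experimental, check the documentation of your installed version before relying on this.

```shell
# my_script.pl is a hypothetical name; substitute your own program.
# calls=2 asks for subroutine entry events in addition to return events.
NYTPROF=calls=2 perl -d:NYTProf my_script.pl

# Assumes the profile was written to the default file, nytprof.out.
# nytprofcalls processes the individual call events recorded in it.
nytprofcalls nytprof.out
```

If the loop body is not already a subroutine, wrapping it in one (even temporarily) should make each iteration show up as a separate call event.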