Despite resembling an infamous user who said he would not come back, the anonymonk does have a point. While it is sometimes possible to find bottlenecks through a thorough understanding of the system, you are far more likely to get useful information from actual measurements, and those results may well surprise even an experienced programmer.
In short, do not assume that perl's box/unbox routines (which should be very lightweight if you are reusing the container SVs) are the source of your performance problems — use profiling (at both Perl and C levels, so you can see the time spent in XS code) and then consider how to improve the running time of your program.
Profiling is important. If you optimize one block of code to run in no time at all, but the program was spending only 1% of its time in that code, you have gained only 1%. If instead you improve an algorithm to cut the running time of another block in half, and the program was spending 70% of its time in that block, you have gained about 35%.
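To make that arithmetic concrete, here is a minimal sketch. The name overall_gain is made up for illustration, and the calculation assumes the rest of the program's running time is unaffected by the change:

```perl
use strict;
use warnings;

# Hypothetical helper: the fraction of total running time saved when a
# block that accounts for $share of the run time has its own time
# reduced by $reduction. Both arguments are fractions between 0 and 1.
# Assumes the time spent everywhere else stays the same.
sub overall_gain {
    my ($share, $reduction) = @_;
    return $share * $reduction;
}

# A block taking 1% of the run time, eliminated entirely.
printf "%.0f%%\n", 100 * overall_gain(0.01, 1.0);

# A block taking 70% of the run time, cut in half.
printf "%.0f%%\n", 100 * overall_gain(0.70, 0.5);
```

This prints 1% and then 35%. The shares themselves have to come from a profiler run on your real workload; guessing them defeats the purpose.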