Because microseconds add up: if you waste one microsecond on each record and process a million records, you have wasted one whole second, which is certainly "human-perceptible" in and of itself. As Ben Franklin famously wrote in Poor Richard's Almanack (and I am probably misquoting from memory):
"Beware little expenses; a small leak will sink a great ship."
String/value type conversions can easily take enough time to cause real trouble. The Tcl community knows this well: in Tcl, "Everything Is A String", and the interpreter goes to great lengths to make those semantics acceptably fast on modern hardware, provided the code cooperates. When it does not, you get what they call "shimmering": repeated conversions of the same value between types. There is an example on that page (in Tcl) with the benchmarks "left as an exercise for the reader". The problem is less severe in Perl, because large structures do not normally have string representations in Perl and numbers that fit in a machine integer can be translated quickly, but you deceive yourself if you claim that it does not exist.
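To make the cost of repeated conversion concrete, here is a minimal C sketch of a value that carries both a string form and a cached integer form. It is only loosely in the spirit of a Tcl_Obj or a Perl SV: the `Value` struct, its fields, and all four functions are hypothetical names invented for this illustration, and real interpreters are far more elaborate. The second loop mimics the effect of shimmering by rewriting the string on every pass, which throws the cached integer away.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical dual-representation value (illustration only). */
typedef struct {
    char text[32];             /* string representation */
    long number;               /* cached integer representation */
    int number_valid;          /* nonzero while `number` matches `text` */
    unsigned long conversions; /* counts string-to-integer parses */
} Value;

/* Store a new string form; any cached integer is now stale. */
void value_set_text(Value *v, const char *s) {
    assert(v != NULL && s != NULL);
    snprintf(v->text, sizeof v->text, "%s", s);
    v->number_valid = 0;
}

/* Return the integer form, parsing the string only when the cache is stale. */
long value_as_long(Value *v) {
    assert(v != NULL);
    if (!v->number_valid) {
        v->number = strtol(v->text, NULL, 10);
        v->number_valid = 1;
        v->conversions++;
    }
    return v->number;
}

/* Cooperative loop: the string never changes, so it is parsed at most once. */
long sum_cooperative(Value *v, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += value_as_long(v);
    return total;
}

/* Uncooperative loop: the value is written back as a string on every
   pass, so every pass pays for a fresh parse. */
long sum_shimmering(Value *v, int n) {
    long total = 0;
    char buf[32];
    for (int i = 0; i < n; i++) {
        total += value_as_long(v);
        snprintf(buf, sizeof buf, "%ld", v->number);
        value_set_text(v, buf);
    }
    return total;
}
```

Both loops compute the same sum; the difference shows up only in the `conversions` counter, which grows with the iteration count in the second loop and stays at one in the first.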
Boxing/unboxing strings between the PV slot of an SV and raw C char * strings should be (almost) free in perl, but wrapping a C string into an SV is not free if a new SV must be allocated. Memory allocators are complex and, despite an extreme level of optimization, take significant time even in the best cases: several microseconds per allocation call has been typical in my experiments, with worst cases of many milliseconds if the system suspends the task and swaps a page. Worse, on modern systems malloc can return quickly while the first access to the new memory causes a page fault instead, producing outlier measurements that confound profiling.
None of this is new. Lisp programmers, working in one of the oldest languages still in use, have long had the simple advice "avoid consing in an inner loop" that expresses this exact problem.
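The same advice can be sketched in C. The two functions below do identical work, but the first calls malloc and free once per record while the second allocates a single scratch buffer before the loop. The names `process`, `total_alloc_each`, and `total_alloc_once` are hypothetical, and the sketch assumes every record is at most `max_len` bytes long; it makes no claim about how much time the change saves on any particular system.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-record work: copy the record into scratch space and
   return its length. Assumes the record fits in the scratch buffer. */
static size_t process(char *scratch, const char *record) {
    strcpy(scratch, record);
    return strlen(scratch);
}

/* Allocates and frees a scratch buffer for every record: the C
   equivalent of consing in the inner loop. */
size_t total_alloc_each(const char **records, size_t count, size_t max_len) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        char *scratch = malloc(max_len + 1);
        if (scratch == NULL)
            return total; /* give up on allocation failure */
        total += process(scratch, records[i]);
        free(scratch);
    }
    return total;
}

/* Allocates the scratch buffer once and reuses it for every record. */
size_t total_alloc_once(const char **records, size_t count, size_t max_len) {
    size_t total = 0;
    char *scratch = malloc(max_len + 1);
    if (scratch == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        total += process(scratch, records[i]);
    free(scratch);
    return total;
}
```

Timing the two over a large record set is one way to measure what the allocator costs on a given system, bearing in mind the page-fault outliers mentioned above.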