Good article -- I've never done the packet trace, but the difference doesn't surprise me. You usually don't need to buffer the whole response in memory just to calculate Content-Length. That's only needed for clients that don't understand chunked encoding, which in practice means HTTP/1.0 clients. Chunked encoding also allows headers to be sent after the body, as trailers -- very similar to PostScript's "at end" headers.
mod_perl can transparently generate chunked encoding on recent versions of Apache. Doing chunked encoding from plain CGI would be more difficult, but the protocol is fairly simple.
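To show how simple the framing is, here is a rough sketch of the wire format in Python (chosen only for illustration; the same few lines translate directly to Perl). Each chunk is its length in hex, CRLF, the data, CRLF; a zero-length chunk ends the body, optionally followed by trailer headers and a final blank line. The function names `encode_chunk` and `encode_chunked` and the `X-Checksum` trailer are made up for this example, and the response headers would still need to include `Transfer-Encoding: chunked`.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of the body: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def encode_chunked(pieces, trailers=None) -> bytes:
    """Encode an iterable of byte strings as a chunked message body.

    `trailers` is an optional dict of header name -> value (hypothetical
    helper argument) written after the terminating zero-length chunk.
    """
    out = []
    for piece in pieces:
        # Skip empty pieces: a zero-length chunk would end the body early.
        if piece:
            out.append(encode_chunk(piece))
    out.append(b"0\r\n")  # last-chunk marker
    for name, value in (trailers or {}).items():
        out.append(f"{name}: {value}\r\n".encode("ascii"))
    out.append(b"\r\n")  # blank line closes the trailer section
    return b"".join(out)


if __name__ == "__main__":
    body = encode_chunked([b"Hello, ", b"world!"], {"X-Checksum": "abc"})
    print(body)
```

Since each piece is framed independently, the script can write it out as soon as it is generated, which is why nothing has to be held in memory to compute a total length up front.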
For more info, see: