|The stupid question is the question not asked|
There is a prevalence hereabouts to say, "Don't optimise!". Throw bigger hardware at it. Use a database. Use a binary library. Code in C.
I don't dispute that some people say that. But what the most experienced people say is, "Don't prematurely optimize!". The key word being "prematurely".
Usual estimates are that most programs spend about 80% of their time in 20% of the code, and about 50% of the time in just 4% or so. This suggests that improving a very small fraction of the program after the fact can yield dramatic performance improvements. Those estimates actually date back to studies of FORTRAN programs decades ago. I've never seen anything to suggest that they aren't decent guidelines Perl programs today. (And they are similar to, for instance, what time and motion studies found about speeding up factory and office work before computers.)
With that relationship, you want to leave optimization for later. First of all because in practice later never needs to come. Secondly if it does come then, as you demonstrated, there often are big wins available for relatively little work. However the wins are not going to always be in easily predicted places. Therefore the advice to leave optimizing until you both know how much optimizing you need to do and can determine where it needs to be optimized.
You didn't need profiling tools in this case because the program was small and simple. In a larger one, though, you need to figure out what good places to look at are. After all if 50% of your time is really spent in 4% of your code, there is no point in starting off by putting a lot of energy into a section that you don't know affects that 4%. (Note that when you find your hotspots, sometimes you'll want to look at the hotspot, and sometimes you'll want to look at whether you need to call it where it is being called. People often miss the second.)
Also improvements are often platform specific. For instance you got huge improvements by using sysread rather than read. However in past discussions people have found that which one wins depends on what operating system you are using. So for someone else, that optimization may be a pessimization instead. Someone who reads your post and begins a habit of always using sysread has taken away the wrong lesson.
Now some corrections. You made frequent references in your post to theories that you have about when GC was likely to run. These theories are wrong because Perl is not a garbage collected language.
Also your slams against databases are unfair to databases. First of all the reasons to use databases often have little or nothing to do with performance. Instead it has to do with things like managing concurrent access and consistency of data from multiple applications on multiple machines. If that is your need, you probably want a database even if it costs huge performance overhead. Rolling your own you'll probably make mistakes. Even if you don't, by the time you've managed to get all of the details sorted out, you'll have recreated a database, only not as well.
But even on the performance front you're unfair. Sure, databases would not help with this problem. It is also true that most of the time where databases are a performance win, they win because they hand the application only the greatly reduced subset of the raw data that it really needs. But databases are often a performance win when they don't reduce how much data you need to fetch because they move processing into the databases query engine, which tends to optimize better than most programmers know how to. (Indeed an amusing recurring performance problem with good databases is that programmers think that an index is "obviously better" and go out of their way to force the database to use one when they are better off with a full table scan.)