With a very rudimentary parallelization algorithm in the calling script, the runtime fell from about 1h50m to about 1h15m. This was just moving from serial execution to 2 parallel jobs... with some inefficiencies due to very simple shell backgrounding and "wait" semantics.
That smacks of very poor parallelisation. The "usual suspect" in this scenario is splitting the workload -- in this case the list of files -- in "half" (numerically) and running an independent process on each "half". The problem of course is that equal numbers of files are usually very unequal in terms of the runtime requirement because of differences in the size and complexity of the individual files. Sod's law dictates that all the big complex files will end up in one half, and leaving all the short, simple ones in the other. Hence after the first few minutes, the latter runs out of files to process and you're back to a single linear workflow.
Far better, and actually trivial to implement, is to remove the for $f (@f) { file processing loop from each of the findxxxx() routines and then feed the file list into a shared queue. You then run the findxxxx() sub in two or more threads that read the names of the next file to process from the queue and do their thing.
The process then becomes self-balancing. Allowing one thread to process many small files concurrently with another that processes a few large ones. Or whatever combination of the two that happens to result in the way they come off the queue.
This requires a little restructuring of the existing script, but it would definitely be better for it. Bringing the script into the modern world by adding strictness and warnings and a level of modularisation wouldn't go amiss either.
If I parallelize this daily job as well (2 processes), I run the risk of simultaneously maxxing out all 4 cores in this particular system... which would (possibly) be bad, because it also runs the web app that these scripts populate the data for. So I kinda need to be careful how I do this.
I've encountered this reasoning for avoiding parallelisation before. And whilst it seems like a good idea to reserve some cpu overhead for "important stuff", it doesn't hold water under closer inspection. You cannot "bank" CPU cycles and utilise them later. Unused CPU time is pure cost.
Like underutilised staff in an empty shop, you are still paying the wages and to keep the lights on. Even on the very latest CPUs with complex power management, when the CPU goes to sleep, the ram, IO cards, disks and the power supply are still using their full quota of energy.
The way to ensure "important stuff" gets done in a timely manner, is to use the prioritisation capabilities of your OS. Your web server should be running with 'IO priority' (ABOVENORMAL), whilst CPU-intensive tasks like this are running with 'background priority' (BELOWNORMAL).
In this way, the background workload will never usurp the web server for attention when it needs to do something, but it will utilise every scrap of CPU that is going spare.
There is nothing wrong with 100% utilisation, provided that prioritisation ensures that the right job gets the CPU when it needs it.
Oh well... guess we'll do it the hard way.
There are actually many refinements to the process you are currently using that could potentially provide even greater benefits than parallelisation.
For example: How many of those source files change on a daily or 4-hourly basis?
It would be a pretty trivial enhancement to retain and check the modification time -- or even MD5 checksum if you're paranoid -- for each of the source files and only reprocess those that actually change.
Unless you have dozens of programmers constantly messing with your code-base, I could see that easily implemented change reducing your xrefing processing demands to single digit percentage points of your current setup.
Even better, assuming that you are using source control -- and that source control is flexible enough to allow you to automate the process -- have the check-in process re-invoke the xrefing process only when a file is modified.
Again, if you are paranoid, you could run the code-base through a full xref cycle once per week on the weekends. Ie. Use an equivalent of the weekly full/daily differential backup procedure.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.