I've implemented this approach, as it seems fairly close to the sort of solution I was looking for. Unfortunately, it is still rather slow. I started the process 2.5 days ago (it has been running for over 60 hours) and it is about halfway through the material. So with this method it looks like it will take about 5 days of 100% CPU on one of the four cores of my Dell PowerEdge server. That's a little disappointing. My ugly approach, which may be slightly less thorough, finished after about three days, so it took roughly 40% less time.
Given the complexity of the regex, I suppose I cannot blame Perl or the program itself; it's just the way it is. But without the attempt to narrow the search to numbers that fall between their respective forerunners and postrunners, the whole search completes in less than five minutes.
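For what it's worth, here is a minimal sketch of one way the fast scan and the range test could be kept apart: a cheap pattern grabs every run of digits, and the "between" check is done with ordinary numeric comparisons instead of inside the regex. The sub name numbers_between, the bounds $lo and $hi, and the sample string are all hypothetical, made up for illustration. This is not the code from this thread, and I have not timed it against the real material.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return every integer in $text
# that lies strictly between $lo and $hi.
sub numbers_between {
    my ($text, $lo, $hi) = @_;
    my @found;
    # Cheap pattern: capture each run of digits; no range logic in the regex.
    while ($text =~ /(\d+)/g) {
        my $n = $1;
        # The "greater than / less than" test is plain numeric comparison.
        push @found, $n if $n > $lo && $n < $hi;
    }
    return @found;
}

# Illustrative data only: the bounds 10 and 100 stand in for a
# forerunner/postrunner pair.
my @hits = numbers_between('see 3, 12, 47 and 150', 10, 100);
print "@hits\n";    # prints "12 47"
```

Whether this helps depends on how much of the five days goes into the range logic inside the regex rather than into the matching itself, so treat it as something to try on a small batch first.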
Anyway, at least I have learned something and I much appreciate your patience in demonstrating this method for me. I may still be able to use this as a final check over a long weekend or something, or perhaps I can limit the amount of material to be checked at a time (~130 books total). Thank you!
In reply to Re^4: How to use "less than" and "greater than" inside a regex for a $variable number