My current approach works off flat files, and I think I need just two asynchronous processes, each producing one flat file. The first process looks at the products and works out the distinct pairs (i.e., going from 2,000 distinct products to roughly 2 million distinct pairs, since 2,000 × 1,999 / 2 = 1,999,000). For example, four products give six pairs:
pair 1: apple orange
pair 2: apple banana
pair 3: apple grape
pair 4: orange banana
pair 5: orange grape
pair 6: banana grape
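
A minimal sketch of how the first process could generate these pairs, assuming the product names are already in a Perl array. The sub name `distinct_pairs` and the output file name `pairs.txt` are made up for illustration, and the lines are written without the "pair N:" label.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every unordered pair of distinct items as "first second" strings.
# n items give n*(n-1)/2 pairs, so 2,000 products give 1,999,000 pairs.
sub distinct_pairs {
    my @items = @_;
    my @pairs;
    for my $i (0 .. $#items - 1) {
        # Start the inner loop after $i so each pair appears only once
        # and no product is paired with itself.
        for my $j ($i + 1 .. $#items) {
            push @pairs, "$items[$i] $items[$j]";
        }
    }
    return @pairs;
}

# Stand-in for the real product list.
my @products = qw(apple orange banana grape);
my @pairs    = distinct_pairs(@products);

# Write one pair per line; 'pairs.txt' is a hypothetical file name.
open my $out, '>', 'pairs.txt' or die "Cannot open pairs.txt: $!";
print {$out} "$_\n" for @pairs;
close $out or die "Cannot close pairs.txt: $!";
```

With the four products above this writes the same six pairs, in the same order, as the list. For the full 2,000 products it may be gentler on memory to print each pair inside the inner loop instead of collecting them all in `@pairs` first.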
This flat file of pairs serves as the input to the second process, which looks at the price and inventory for each product in a pair to work out the stats for that pair. The result of this second process is written to the final flat file.
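
A rough sketch of what the second process might look like. The stats themselves are not spelled out here, so the two computed below (combined price and the smaller of the two inventories) are placeholders only. The `%product` hash, its values, and the sub names `pair_stats` and `write_stats` are likewise hypothetical. It assumes product names contain no spaces, and it accepts lines with or without the "pair N:" label.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical product data; in practice this would be loaded from
# the product flat file.
my %product = (
    apple  => { price => 1.20, inventory => 50 },
    orange => { price => 0.80, inventory => 30 },
    banana => { price => 0.50, inventory => 75 },
    grape  => { price => 2.00, inventory => 10 },
);

# Turn one pair line into one tab-separated output line.
# Placeholder stats: combined price and the smaller inventory.
sub pair_stats {
    my ($line, $data) = @_;
    (my $clean = $line) =~ s/^pair\s+\d+:\s*//;   # drop an optional "pair N:" label
    my ($first, $second) = split ' ', $clean;
    for my $name ($first, $second) {
        die "Unknown product in line '$line'"
            unless defined $name && exists $data->{$name};
    }
    my $total_price = $data->{$first}{price} + $data->{$second}{price};
    my $min_stock   = min($data->{$first}{inventory}, $data->{$second}{inventory});
    return sprintf "%s\t%s\t%.2f\t%d", $first, $second, $total_price, $min_stock;
}

# Read the pairs one line at a time so the full list of pairs never
# has to sit in memory at once.
sub write_stats {
    my ($in, $out, $data) = @_;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;                # skip blank lines
        print {$out} pair_stats($line, $data), "\n";
    }
    return;
}

# Demo on an in-memory "file"; a real run would open the pairs file
# for reading and the final flat file for writing instead.
my $pairs_text = "pair 1: apple orange\npair 2: apple banana\n";
open my $in, '<', \$pairs_text or die "Cannot read pairs: $!";
write_stats($in, \*STDOUT, \%product);
close $in;
```

Keeping the per-pair calculation in its own sub means the real stats can be swapped in later without touching the file-handling loop.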
My experience with Perl, and with programming in general, is about 3 months. The approach I'm taking looks neat and simple to me from a solution perspective (coding it, however, "feels" a little different, but I am keen to put in the effort to keep the code simple too). My reluctance to use an RDBMS is that I'd probably have to teach myself the "how-to", especially for when things break and fall apart.