I suppose this flat data file comes from some fancy corporate sales/inventory DB. This may sound flippant, but buying dinner and drinks for the person who generated the file for you might yield the most efficient and effective solution! But I guess you have already considered that...
It is conceivable that your data processing could all be done in an Excel spreadsheet with no Perl programming at all. I haven't done any serious spreadsheet work in years, but spreadsheets can be huge now: a current Excel worksheet holds 1,048,576 rows, so 2 million rows would have to be split across two sheets.
Let's talk about Perl:
You are inexperienced at Perl, and it sounds like you have no SQL experience. However, I believe the solution that involves learning the "least amount of new stuff" will involve learning a targeted subset of both Perl and SQL. Using an SQLite DB will simplify the data structures that the Perl code has to work with (less fancy Perl to learn), and learning basic DBI will simplify your Perl code.
SQLite is the most widely deployed DB in the world, in part because it is on every smartphone. SQLite doesn't require any fancy server setup or administration: it keeps the whole database in a single ordinary file, so the big admin hassles just disappear. You will need to learn how to create tables, insert new records, and select (i.e. get) records from the DB; only a very, very small subset of SQL needs to be learned. For the Perl interface to SQLite, you will need to learn a subset of Perl data structures. I recommend only one: how to handle an AoA (array of arrays, a 2D structure) or a reference to one. Don't try to learn everything at once; just learn DBI's fetchall_arrayref() method well.
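Here is a small sketch of that "very small subset", assuming DBD::SQLite is installed. The table and column names (sales, product, qty) are made up for illustration. It creates a table, inserts a few records, selects them back, and walks the AoA that fetchall_arrayref() returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumes DBD::SQLite is installed. ":memory:" gives a throwaway database;
# a file name such as "sales.db" (hypothetical) would keep the data on disk.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table, for illustration only.
$dbh->do("CREATE TABLE sales (product TEXT, qty INTEGER)");

# Insert a few records with a prepared statement and placeholders.
my $ins = $dbh->prepare("INSERT INTO sales (product, qty) VALUES (?, ?)");
$ins->execute(@$_) for (["apple", 3], ["pear", 5], ["apple", 2]);

# Select (get) records back. fetchall_arrayref() returns a reference to
# an AoA: one inner array ref per row, one element per selected column.
my $sth = $dbh->prepare(
    "SELECT product, SUM(qty) FROM sales GROUP BY product ORDER BY product");
$sth->execute();
my $rows = $sth->fetchall_arrayref();   # [ ['apple', 5], ['pear', 5] ]

# Walk the AoA row by row ...
foreach my $row (@$rows) {
    my ($product, $total) = @$row;
    print "$product: $total\n";
}
# ... or index it directly as $rows->[row][column].
print "first product: $rows->[0][0]\n";

$dbh->disconnect;
```

Once you are comfortable with $rows->[row][column] and the foreach loop above, you can handle the results of pretty much any SELECT you will need for this job.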
From what I see so far, a basic idea could be:
- Create SQLite DB table(s) for the input data and the Stat Table
- Import the data - this runs at maybe 50,000 records per second. Run it as a single transaction; otherwise SQLite commits every insert separately, which is far slower.
- Get the list of unique product names - one SQL command (SELECT DISTINCT)
- For each product, calculate stats and put them in the Stat Table - slower than the import, but still fast
- Generate the 2 million combinations - fast
- Run each combo - results should appear at 10,000 per second or faster
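To make the first four steps concrete, here is a minimal sketch, assuming DBD::SQLite is installed and that the flat file is tab-separated "product<TAB>price" lines. The table names, columns and the stats themselves (count and average price) are my own placeholders, since I don't know what your data actually contains; the combination steps are left out for the same reason. An in-memory string stands in for the real file so the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Assumptions: DBD::SQLite is installed; the input is tab-separated
# "product<TAB>price". Table/column names and stats are hypothetical.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# 1. Create the tables.
$dbh->do("CREATE TABLE input (product TEXT, price REAL)");
$dbh->do("CREATE TABLE stats (product TEXT PRIMARY KEY, n INTEGER, avg_price REAL)");

# 2. Import the data as ONE transaction. For real use, open your data
#    file instead of this in-memory sample string.
my $flat = "widget\t2.50\ngadget\t10.00\nwidget\t3.50\n";
open(my $fh, '<', \$flat) or die "cannot open data: $!";

my $ins = $dbh->prepare("INSERT INTO input (product, price) VALUES (?, ?)");
$dbh->begin_work;                  # AutoCommit off until commit
while (my $line = <$fh>) {
    chomp $line;
    my ($product, $price) = split /\t/, $line;
    $ins->execute($product, $price);
}
$dbh->commit;                      # one commit for all rows
close $fh;

# 3. Unique product names: one SQL command.
my $products = $dbh->selectcol_arrayref(
    "SELECT DISTINCT product FROM input ORDER BY product");

# 4. For each product, calculate stats and store them in the stats table.
my $calc = $dbh->prepare(
    "SELECT COUNT(*), AVG(price) FROM input WHERE product = ?");
my $put = $dbh->prepare(
    "INSERT INTO stats (product, n, avg_price) VALUES (?, ?, ?)");
$dbh->begin_work;
foreach my $product (@$products) {
    $calc->execute($product);
    my ($n, $avg) = $calc->fetchrow_array;
    $calc->finish;                 # done with this one-row result
    $put->execute($product, $n, $avg);
}
$dbh->commit;

# Read the stats back as an AoA and print them.
my $stats = $dbh->selectall_arrayref(
    "SELECT product, n, avg_price FROM stats ORDER BY product");
printf "%-8s %3d %8.2f\n", @$_ for @$stats;
```

The part that matters most for speed is the begin_work/commit pair around the import loop, together with preparing the INSERT once and only calling execute() per row.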
The idea of this running for 4 hours is insane. Something is seriously wrong if this doesn't run in under 4 minutes.