Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight

Re: creating and managing many hashes

by Marshall (Canon)
on Feb 18, 2018 at 19:57 UTC ( #1209455=note: print w/replies, xml ) Need Help??

in reply to creating and managing many hashes

HI Gtforce,

I suppose that this flat data file is coming from some fancy corporate sales/inventory DB. This may sound flippant, but buying some dinner and drinks for the person who generated file for you might yield the most efficient/effective solution for you! But I guess you have already considered that...

It sounds conceivable that your data processing could all be done in an Excel Spreadsheet with no Perl programming at all. I haven't done any serious spreadsheet work in years, but spreadsheets can be huge now, 2 million rows is possible.

Let's talk about Perl:
You are inexperienced at Perl and sounds like you have no SQL experience. However, I believe that a solution that involves learning the "least amount of new stuff" will involve learning a targeted subset combination of both Perl and SQL. Using an SQLite DB will simplify the data structures that the Perl code has to work with (less fancy Perl to learn). I believe that learning basic DBI will simplify your Perl code.

SQlite is the most used DB in the world because it is on every smart phone. SQlite doesn't require any fancy server setup and admin - it uses a simple file for its work. So huge admin hassles just disappear. You will need to learn how to create tables, insert new records, select (i.e. get) records from the DB. Only a very,very small subset of SQL needs to be learned. For the Perl I/F with SQLite, you will need to learn a subset of Perl data structures. I recommend only one: how handle an AoA, a 2D array structure or a reference to such a thing. Don't start with learning everything, just learn this fetchall_arrayref() function well.

From what I see so far, a basic idea could be:

  • Create SQlite DB table(s) for Input Data and Stat Table
  • Import Data - This runs at maybe 50,000 records per second. Should be run as a single transaction.
  • get list of unique product names - one SQL command.
  • foreach product, calculate stats and put in Stat Table - slower than import, but fast
  • generate the 2 million combinations - fast
  • Run each combo, results appear at 10,000 per second or faster
The idea of this running for 4 hours is insane. Something is seriously wrong if this doesn't run in <4 minutes.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1209455]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2021-06-20 07:34 GMT
Find Nodes?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)

    Results (94 votes). Check out past polls.