Voronich has asked for the wisdom of the Perl Monks concerning the following question:
I was bitching a blue streak about this issue in CB and realized I needed input on this, just in case my head's not on straight.
I get 10 files a month. They generally report the ongoing state of a list of accounts, each a monthly snapshot.
For the sake of argument, let's assume I can cobble together a primary key for these files, be it an explicit account number or a string composite of fields.
I'm merging these things into one table that looks more or less as follows:
The idea is to create generic account ids (a new field that I'm generating) to correspond to each account across the 10 files as needed, based on a monthly load proccess.table account_stuff account_id int, source_file varchar, host_key varchar(huge), open_date date, close_date date
So the open_date corresponds to either "the date that the account started showing up in the file" or "the open date specified in the file" (if one exists.)
Close date is as you'd figure, optimistically the 'specified close date' or, more interestingly "the last month in which the account appeared on an input file."
I'm cobbling this together now. But the rough pseudocode in my head seems a bit gross for reasons I can't quite put my finger on. It smells funny:
So basically it looks like "left outer join, test for nulls, then right outer join and test for nulls."- For each input file that's loaded - Check it's list of primary keys against the account_stuff host_key for a match. - If there's not a match, add a row. - Check the list of account_stuff 'source_file's host_keys against t +he imported file. - If there are missing rows, update with a close_date
I was going to drive this entirely in perl, but it's starting to look like I could slap it into a series of stored procs.
What am I missing here?
|Replies are listed 'Best First'.|
Re: Assigning account numbers, intuiting open and close dates
by InfiniteSilence (Curate) on Apr 09, 2014 at 18:46 UTC
by Voronich (Hermit) on Apr 09, 2014 at 19:02 UTC