Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^3: [OT] designing a program - your wisdom needed

by Marshall (Canon)
on Jan 21, 2022 at 21:59 UTC ( #11140688=note: print w/replies, xml ) Need Help??


in reply to Re^2: designing a program - your wisdom needed
in thread designing a program - your wisdom needed

This is off topic. But since you are curious, I'll tell you more. I have an address that I get from a government DB. A club has a competition and their "divisions" are defined by county and state boundaries. I wanted to use the address that I have to figure out which of the various "divisions" a person is actually living within.

As it turns out. zip codes are for the convenience of the USPS. They can cross county and even state boundaries! I was shocked to find that out, but it is true. I was even more surprised when I found out that the state in your mailing address may not actually be where that particular street address is! You can have a Florida mailing address, but your house is actually over the border in Mississippi. Yep, a registered voter in MS can have a FL address! And they can't vote for the Governor of Florida!

The USPS does have a DB listing a single county and a single state for each zip code. In this DB there could be (and is) a zip code that I found which lists a WV county but PA as the state. We don't know for sure, but this guy probably actually lives in WV even though his mailing address shows PA as the state. That same zip code can be used for addresses both in PA and in WV.

Be that as it may, I decided to proceed further to see "how close" with some heuristics I could come to the "correct answer". I wound up doing pretty well on that. Meaning from an address, I can predict with high probability (not certainty), but high probability which one of this club's "divisions" this guy lives in. Then after much data crunching, I found a source of errors that dwarfs any uncertainty in my heuristics. Some of these guys have multiple residences and they may not actually live at their mailing address! So at this point, at least this small part of my project has fallen apart.

Sometimes you have to crunch a bunch of data to find out what you don't know! By cross checking between other DB's, I can come up with a rough approximation of the accuracy of my code so far. At a cost of 2 orders of magnitude in complexity, I could improve upon my zip code DB. But that effort wouldn't matter because the address on file may be a 2nd home or Mom's or Dad's, etc.!

I have more pieces of my puzzle to solve and more data crunching is going on...this zip code to county/state of residence was just one question of many. In the meantime, my SQL skills are improving.

For development, I just run the Perl program from my programming text editor where the output is captured in another editor window. Some of these "pre-production" things spew out 50K lines of analysis so that I can see why it is making the decisions that it is. Then I tweak the Perl code to change a heuristic and run it again to apply human brain power to decide how well or not well it is working. At the same time I have my SQLite GUI up and running. So that I can look at output tables and run ad hoc SQL commands.

At the end, I will create a couple of pseudo CSV files. I've been yelled at more than once for claiming that such things exist in the real world. They do. I will probably make a .bat file like I suggested for you in another post. That .bat file will run the correct sequence of "filters" to generate my final result. However, I may just continue to run each "phase" manually so that I can apply human judgement and "sanity determination". This thing is far from a "turn key end user" program (and it will never be).

  • Comment on Re^3: [OT] designing a program - your wisdom needed

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11140688]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2022-06-28 06:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My most frequent journeys are powered by:









    Results (90 votes). Check out past polls.

    Notices?