Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
This is off topic. But since you are curious, I'll tell you more. I have an address that I get from a government DB. A club has a competition and their "divisions" are defined by county and state boundaries. I wanted to use the address that I have to figure out which of the various "divisions" a person is actually living within.

As it turns out. zip codes are for the convenience of the USPS. They can cross county and even state boundaries! I was shocked to find that out, but it is true. I was even more surprised when I found out that the state in your mailing address may not actually be where that particular street address is! You can have a Florida mailing address, but your house is actually over the border in Mississippi. Yep, a registered voter in MS can have a FL address! And they can't vote for the Governor of Florida!

The USPS does have a DB listing a single county and a single state for each zip code. In this DB there could be (and is) a zip code that I found which lists a WV county but PA as the state. We don't know for sure, but this guy probably actually lives in WV even though his mailing address shows PA as the state. That same zip code can be used for addresses both in PA and in WV.

Be that as it may, I decided to proceed further to see "how close" with some heuristics I could come to the "correct answer". I wound up doing pretty well on that. Meaning from an address, I can predict with high probability (not certainty), but high probability which one of this club's "divisions" this guy lives in. Then after much data crunching, I found a source of errors that dwarfs any uncertainty in my heuristics. Some of these guys have multiple residences and they may not actually live at their mailing address! So at this point, at least this small part of my project has fallen apart.

Sometimes you have to crunch a bunch of data to find out what you don't know! By cross checking between other DB's, I can come up with a rough approximation of the accuracy of my code so far. At a cost of 2 orders of magnitude in complexity, I could improve upon my zip code DB. But that effort wouldn't matter because the address on file may be a 2nd home or Mom's or Dad's, etc.!

I have more pieces of my puzzle to solve and more data crunching is going on...this zip code to county/state of residence was just one question of many. In the meantime, my SQL skills are improving.

For development, I just run the Perl program from my programming text editor where the output is captured in another editor window. Some of these "pre-production" things spew out 50K lines of analysis so that I can see why it is making the decisions that it is. Then I tweak the Perl code to change a heuristic and run it again to apply human brain power to decide how well or not well it is working. At the same time I have my SQLite GUI up and running. So that I can look at output tables and run ad hoc SQL commands.

At the end, I will create a couple of pseudo CSV files. I've been yelled at more than once for claiming that such things exist in the real world. They do. I will probably make a .bat file like I suggested for you in another post. That .bat file will run the correct sequence of "filters" to generate my final result. However, I may just continue to run each "phase" manually so that I can apply human judgement and "sanity determination". This thing is far from a "turn key end user" program (and it will never be).

In reply to Re^3: [OT] designing a program - your wisdom needed by Marshall
in thread designing a program - your wisdom needed by SpaceCowboy

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (5)
As of 2022-06-25 11:02 GMT
Find Nodes?
    Voting Booth?
    My most frequent journeys are powered by:

    Results (81 votes). Check out past polls.