PerlMonks  

Re^3: Random personal names

by Tux (Canon)
on Apr 11, 2011 at 06:14 UTC


in reply to Re^2: Random personal names
in thread Random personal names

I use things like this quite a lot, but I'm not allowed to publish the code. The project I use it for involves "anonymizing" databases, so I can create reproducible customer situations.

In that process, I first collect all surnames and given names from several databases and split them on whitespace, grouped by gender. Then I shuffle the list of names and create new names from it by randomly picking 2 to 5 names from the list for the correct gender, putting them in a random order, and assigning the new name to the anonymized victim.
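A minimal sketch of that name-building step, not the actual code. The `%pool` contents and the `random_name` helper are made up for illustration; the real pools would be harvested from the databases.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sample pools, one per gender. In practice these would be
# filled with the name parts collected from the source databases.
my %pool = (
    f => [qw( Anna Maria Johanna Elisabeth Cornelia Wilhelmina )],
    m => [qw( Jan Pieter Hendrik Willem Cornelis Johannes )],
);

# Build a new name from 2 to 5 randomly chosen parts of the given pool.
sub random_name {
    my ($gender) = @_;
    my @parts = shuffle @{ $pool{$gender} };
    my $count = 2 + int(rand(4));          # 2, 3, 4 or 5
    $count = @parts if $count > @parts;    # never ask for more than exist
    # Taking the first $count parts of a shuffled list gives a random
    # selection in a random order, without repeating a part.
    return join " ", @parts[0 .. $count - 1];
}

print random_name("f"), "\n";
print random_name("m"), "\n";
```

Shuffling first and slicing afterwards handles both the random pick and the random order in one go.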

The problem I face here is that I also have to go through all related databases to change the names of the parents and children, so that the relations still match.
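One way to keep the relations matching is to remember every replacement and reuse it wherever the same original shows up. This is only a sketch under that assumption: the `%replacement` hash, the `anonymized` helper and the stand-in generator are hypothetical, and it keys on the name for simplicity where a real schema would more likely key on a person identifier.

```perl
use strict;
use warnings;

# Hypothetical mapping from original name to replacement name. Reusing it
# for every table and database keeps parent/child references pointing at
# the same anonymized name.
my %replacement;

sub anonymized {
    my ($original, $generate) = @_;
    # Generate a replacement only the first time an original is seen.
    $replacement{$original} //= $generate->();
    return $replacement{$original};
}

my $n = 0;
my $make = sub { "Person " . ++$n };    # stand-in for a real name generator

print anonymized("Jan Jansen", $make), "\n";    # seen as a customer
print anonymized("Jan Jansen", $make), "\n";    # seen again as a parent
```

Both calls return the same replacement, so a record that refers to "Jan Jansen" as a parent still lines up with his own record.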

I do the same for date of birth and place of birth. And for ZIP codes.

The best part, however, is the addresses. First I collect all the street names from all the databases I have access to, then I split the street names on known extensions: "street", "alley", "boulevard", "road", "way", "path", etc. Then I take the first part of each, shuffle them, and generate new street names from the prefix plus any of the known extensions. "Bondstreet" thus yields "Bondstreet", "Bondalley", "Bondroad", etc. I then shuffle the new list and replace all the original street names with a random pick from the new list.
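A rough sketch of the street-name step, again not the actual code. The `@streets` sample and all variable names are invented for the example, and the extension list is just the one quoted above.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Known extensions, as listed above.
my @ext    = qw( street alley boulevard road way path );
my $ext_re = join "|", map { quotemeta($_) } @ext;

# Hypothetical input; the real list would come from the databases.
my @streets = qw( Bondstreet Kingsroad Milkyway Bondstreet );

# Keep the part in front of a known extension, each prefix only once.
my %seen;
my @prefix = grep { !$seen{$_}++ }
             map  { /^(.+?)(?:$ext_re)$/i ? $1 : () } @streets;

# Combine every prefix with every extension, then shuffle the result.
my @new;
for my $p (@prefix) {
    push @new, map { $p . $_ } @ext;
}
@new = shuffle @new;

# Replace each original street name with a random pick from the new list.
my %fake = map { $_ => $new[ rand @new ] } @streets;

print "$_ => $fake{$_}\n" for sort keys %fake;
```

Because the picks are random, a street can occasionally map back to its own name; whether that matters depends on how strict the anonymizing needs to be.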

Together with some other changes, this was enough that someone with knowledge of the original database said he was unable to "see" which persons were involved in the new data set. This way we can mimic problems at any size of customer database, as we now generate a new one from an existing one with the same size and relations and then "anonymize" the complete set.

This has proven to be a very useful approach. All done in Perl, of course, and nothing to do with spam or hackers.


Enjoy, Have FUN! H.Merijn
