emilford has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I am trying to write a script to help out a co-worker who has not yet been enlightened to the powers of Perl. I already managed to impress her when I took 5 minutes to write a script that ran for 30 seconds and saved her at least an hour of work. She has a database full of names that follow no specific format, which she needs to separate into:
1) title
2) first name
3) middle initial
4) last name
Some names might have all of this information, some might not.
I know that this is feasible with a fairly complex regex, which is where I'm running into some problems. I'm sure I could put something together that would work fairly well, but I want to try and write code that will perform appropriately for all cases.
To show that I'm not just asking you guys to solve my problem, I have come up with some ideas that I think need to be incorporated into the regex.
- there are multiple possible titles (e.g. LTC, COL, DR, MS, MR, MISS, etc.). Instead of a long regex testing LTC|DR|MS|MR, would it be possible to toss them into an array and have a portion of the regex be executed code that iterates through each possibility in the array and returns the match? That way, as new titles come up, they can easily be added.
- the different parts of the name are mostly separated by spaces: the middle initial could be grabbed with (\w\.) and the first and last names could be grabbed based on \w versus spaces. Is there a better approach?
- there are certain names that are only last names; there could be a special case for this that would lessen the complexity of the regex.
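To make the array idea in the first bullet concrete, here is a minimal sketch. It does not run code inside the regex; instead it builds the alternation from the array with join and compiles it once with qr//, which is a swapped-in but simpler technique. The title list is the one from the bullet, and `parse_title` is a made-up helper name, so treat both as assumptions rather than a finished solution.

```perl
use strict;
use warnings;

# Hypothetical title list; extend it as new titles turn up.
my @titles = qw(LTC COL DR MS MR MISS);

# Sort longest first so a longer title is tried before any shorter
# title that is a prefix of it, then escape each one and join with '|'.
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @titles;

# Compile once: a known title at the start of the string, an optional
# trailing period, then whitespace. Matching is case-insensitive.
my $title_re = qr/^\s*($alt)\.?\s+/i;

# parse_title (made-up name) returns (title, rest of the string);
# title is undef when the name does not start with a known title.
sub parse_title {
    my ($name) = @_;
    if ($name =~ s/$title_re//) {
        return ($1, $name);
    }
    return (undef, $name);
}

for my $name ('Dr. James T. Taylor', 'Frederick H. Jones') {
    my ($title, $rest) = parse_title($name);
    printf "%-20s title=%s rest=%s\n",
        $name, (defined $title ? $title : '(none)'), $rest;
}
```

Adding a new title then only means adding a word to `@titles`; the regex itself never needs to be edited.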
For example, given:
Frederick H. Jones
Dr. James T. Taylor
Dr. Mat L. R. Michaels
I'd want to be able to separate this into:
<Frederick> <H.> <Jones>
<Dr.> <James> <T.> <Taylor>
<Dr.> <Mat> <L. R.> <Michaels>
(< > marks chunk tossed into variable)
I'm going to start working on this regex and toy around with different ideas. I'll post what I have completed every so often, but any feedback, ideas, suggestions, or code would be appreciated.
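As a rough starting point for the three example names, here is a sketch that avoids one big regex and works on whitespace-separated words instead: take a leading title if the first word is in a lookup hash, take the last word as the last name, the next remaining word as the first name, and treat whatever is left as the middle initial(s). The title list and the helper name `split_name` are my own placeholders, and the sketch assumes single-word last names, so suffixes like "Jr." or last names such as "van der Berg" would need extra handling.

```perl
use strict;
use warnings;

# Hypothetical title list, keyed in upper case without the period.
my %is_title = map { $_ => 1 } qw(LTC COL DR MS MR MISS);

# split_name (made-up name) returns a hash ref with the keys
# title, first, middle, last; a missing part is an empty string.
sub split_name {
    my ($name) = @_;
    my @words = split ' ', $name;    # split on runs of whitespace
    my %part  = (title => '', first => '', middle => '', last => '');

    # The leading word is a title if it is in the list once any
    # trailing period is dropped.
    if (@words) {
        (my $key = uc $words[0]) =~ s/\.$//;
        $part{title} = shift @words if $is_title{$key};
    }

    $part{last}   = pop @words   if @words;   # a lone word counts as a last name
    $part{first}  = shift @words if @words;
    $part{middle} = join ' ', @words;         # whatever is left, e.g. "L. R."
    return \%part;
}

for my $name ('Frederick H. Jones', 'Dr. James T. Taylor', 'Dr. Mat L. R. Michaels') {
    my $p = split_name($name);
    print join(' ', map { "<$_>" } grep { length($_) } @$p{qw(title first middle last)}), "\n";
}
```

For the three sample names this prints the same chunks as listed above, and a name that is only a single word ends up in `last`, which covers the last-name-only case.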
Thanks in advance,
Eric