Best method of munging CSV data - using Text::CSV::Simple?
by billie_t (Sexton) on Feb 08, 2006 at 05:06 UTC
billie_t has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks: I've never been particularly proficient in Perl, and I've been away from it for a long time. I've drawn a complete blank trying to take some data from an LDAP dump and rework it into a mail delivery table. A sample of the original data is this:
The valid info we want to extract is represented by the first line - the second column has smtp addresses that we want to grab, and associate with the third column (the username) and the final column (the server name). Eventually I want to achieve:
user@server blah@smtpaddress
user@server blah@smtp2address
user2@server blah2@smtpaddress
user2@server blah2@smtp2address
Each user can have a variable number of SMTP addresses. The second line of the original data posted above is an example of an invalid line that we would like to skip (it has no SMTP addresses).
I think I can figure out the regexps (mostly) to grab the data I want, such as SMTP: but not FAX: or X400: addresses. Text::CSV::Simple does a great job of ignoring the first column of each line (which is not needed), and stashing the others somewhere else, using this code:
That appears to be the easy part. I need to achieve the remaining steps:
1. Ignore lines of data that do not contain smtp:. I was thinking of something like the following snippet, but there's a difficulty I'll explain after this list:
2. Ensure that all the SMTP addresses extracted match up to the correct userid.
3. Ensure that all the SMTP addresses and userids match up to the correct server name. I'll extract the server name with a simple substr, since it is always the last 6 characters of that column.
4. Write the data to another file in the format "user@server user@smtpaddress" (outputting is not a problem).
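For reference, steps 1 to 4 could be sketched for a single row roughly as follows. This is only an illustration under stated assumptions: the helper names smtp_addresses and server_name are hypothetical, the sample values are made up, and the '%' separator between address entries is a guess that would need to be adjusted to the real dump.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the SMTP addresses out of one address field.
# ASSUMPTION: entries are separated by '%' and carry a type prefix such as
# "smtp:", "FAX:" or "X400:". Adjust the split pattern to the real data.
sub smtp_addresses {
    my ($field) = @_;
    return map { /^smtp:(.+)$/i ? $1 : () } split /%/, $field;
}

# Hypothetical helper: the server name is the last 6 characters of the column.
sub server_name {
    my ($field) = @_;
    return substr($field, -6);
}

# Made-up values for illustration only.
my $addr_field   = 'SMTP:blah@smtpaddress%smtp:blah@smtp2address%X400:c=US;a= ;p=Org%FAX:blah@fax';
my $server_field = '/o=Org/ou=Site/cn=Configuration/cn=Servers/cn=MAIL01';

my @smtp = smtp_addresses($addr_field);
if (@smtp) {    # step 1: rows without any SMTP address are skipped
    my $server = server_name($server_field);
    # steps 2 to 4: one "user@server address" line per SMTP address
    print "user\@$server $_\n" for @smtp;
}
```

The case-insensitive match keeps both "SMTP:" and "smtp:" entries while dropping "FAX:" and "X400:" ones, and an empty result list doubles as the test for an invalid line.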
The main problem is, by reading into a single-dimension array, I think that I'm losing the "columns" which associate the SMTP addresses with the correct userid and server. I simply don't know enough about arrays or hashes (or Perl, other than the fact it's by far the best tool for the job) to structure the data in the appropriate way so that I can extract what I want.
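One way to keep the columns associated is to treat every row as its own array reference and collect the results in a hash of arrays keyed by "user@server". The sketch below assumes rows arrive as array refs (Text::CSV's getline, for example, returns one array ref per line); the rows, the '%' separator, and the names %routes and @order are made up for illustration.

```perl
use strict;
use warnings;

# Made-up rows standing in for parsed CSV lines. Each row is an array ref
# holding [ address field, username, server field ]. The '%' separator and
# every value here are ASSUMPTIONS, not the real dump format.
my @rows = (
    [ 'SMTP:blah@smtpaddress%smtp:blah@smtp2address%X400:c=US', 'user',   'cn=Servers/cn=MAIL01' ],
    [ 'X400:c=US%FAX:555',                                      'nobody', 'cn=Servers/cn=MAIL01' ],
    [ 'smtp:blah2@smtpaddress',                                 'user2',  'cn=Servers/cn=MAIL02' ],
);

my %routes;    # "user@server" => [ SMTP addresses ]
my @order;     # first-seen order of the keys, for stable output

for my $row (@rows) {
    my ($addresses, $user, $server_field) = @$row;

    # Keep only smtp: entries; FAX: and X400: entries are dropped.
    my @smtp = map { /^smtp:(.+)$/i ? $1 : () } split /%/, $addresses;
    next unless @smtp;    # invalid line: no SMTP address at all

    # The server name is taken as the last 6 characters of its column.
    my $key = $user . '@' . substr($server_field, -6);
    push @order, $key unless exists $routes{$key};
    push @{ $routes{$key} }, @smtp;
}

# One "user@server address" line per SMTP address.
for my $key (@order) {
    print "$key $_\n" for @{ $routes{$key} };
}
```

Because each row is unpacked into its own variables inside the loop, an address can never be paired with the userid or server from a different line. The hash is only needed if the same user may appear on several lines; otherwise the print could happen directly inside the first loop.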
Any help would be appreciated. The input files will not be larger than 2MB, and the whole thing will be running on up-to-date kit (so reading into memory should not be a problem). And I'm sorry I couldn't come up with more code - not having a clue about appropriate structure is not helping me. It may be that Text::CSV::Simple is not right for the job, but I couldn't quite get to grips with the syntax of Text::CSV_XS, for example.