Hi monks: I've never been particularly proficient in Perl, and I've been away from it for a long time. I've come to a complete blank when trying to get some data from an LDAP dump and rework it into a mail delivery table. A sample of the original data is this:

"CN=MBX_MANA,OU= Mailboxes,DC=doman,DC=com",SMTP:Manager@domain.com;smtp:MBX_MANA@domain.com;FAX:MBX_MANA@domain.com;X400:c=us\;a= \;p=domain\;o=Exchange\;s=MANAGER\;,MBX_MANA,/o=Exchange Org/ou=First Administrative Group/cn=Configuration/cn=Servers/cn=SERVER
"CN=Guest,CN=Users,DC=domain,DC=com",,Guest,

The valid info we want to extract is represented by the first line - the second column has smtp addresses that we want to grab, and associate with the third column (the username) and the final column (the server name). Eventually I want to achieve:

user@server    blah@smtpaddress
user@server    blah@smtp2address
user2@server   blah2@smtpaddress
user2@server   blah2@smtp2address

Each user can have a variable number of SMTP addresses. The second line of the original data posted above is an example of an invalid line that we would like to skip (there are no smtp addresses).

I think I can figure out the regexps (mostly) to grab the data I want, such as SMTP: but not FAX: or X400: addresses. Text::CSV::Simple does a great job of ignoring the first column of each line (which is not needed), and stashing the others somewhere else, using this code:

use Text::CSV::Simple;

my $parser = Text::CSV::Simple->new;
$parser->want_fields(2, 3, 4);
my @data = $parser->read_file($infile);
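
For the regexp part (keeping SMTP: entries while dropping FAX: and X400: ones), here is a minimal sketch. It assumes the address column is a semicolon-separated list in which the X400 entry escapes its own semicolons as `\;`, as in the sample above. The helper name `smtp_addresses` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the SMTP addresses found in one address column.
sub smtp_addresses {
    my ($field) = @_;
    return () unless defined $field;

    # Split on semicolons that are NOT preceded by a backslash, so the
    # X400 entry (which contains "\;") stays in one piece.
    my @entries = split /(?<!\\);/, $field;

    # Keep only entries tagged SMTP: or smtp: and strip the tag.
    return map { m/^smtp:(.+)$/i ? $1 : () } @entries;
}

my $field = 'SMTP:Manager@domain.com;smtp:MBX_MANA@domain.com;'
          . 'FAX:MBX_MANA@domain.com;'
          . 'X400:c=us\;a= \;p=domain\;o=Exchange\;s=MANAGER\;';

print "$_\n" for smtp_addresses($field);
```

Anchoring the pattern with `^smtp:` and matching case-insensitively picks up both the `SMTP:` and `smtp:` spellings in the sample without also catching `FAX:` or `X400:`.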

That appears to be the easy part. I need to achieve the remaining steps:
1. Ignore lines of data that do not contain smtp:. I was thinking of something like the following snippet, but there's a difficulty I'll explain after this list:

foreach my $line (@data) {
    # skip anything that has no SMTP: address
    next unless $line =~ /SMTP:/i;
    # do stuff
}

2. Ensure that all the SMTP addresses extracted match up to the correct userid.
3. Ensure that all the SMTP addresses and userids match up to the correct server name (I'll extract the server name by using a simple substr (since it's always the last 6 characters at the end of that column)).
4. Output to another file data of the format "user@server user@smtpaddress" (outputting is not a problem)
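
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: each parsed row is taken to be an array reference holding the three wanted columns in order (address list, username, server path), the server name is the last 6 characters of the final column as described in step 3, and `delivery_lines` is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one parsed row into "user@server<TAB>address" lines.
# Assumes $row is an array reference: [ address list, username, server path ].
sub delivery_lines {
    my ($row) = @_;
    my ($addresses, $user, $path) = @$row;

    # Step 1: skip rows that carry no SMTP address at all.
    return () unless defined $addresses && $addresses =~ /smtp:/i;

    # Step 3: the server name is the last 6 characters of the final column.
    my $server = substr $path, -6;

    # Steps 2 and 4: pair every SMTP address with this row's user and server.
    # The split ignores backslash-escaped semicolons inside X400 entries.
    my @smtp = map { m/^smtp:(.+)$/i ? $1 : () } split /(?<!\\);/, $addresses;
    return map { "$user\@$server\t$_" } @smtp;
}

# Sample rows standing in for the parser output.
my @data = (
    [ 'SMTP:Manager@domain.com;smtp:MBX_MANA@domain.com;FAX:MBX_MANA@domain.com',
      'MBX_MANA',
      '/o=Exchange Org/ou=First Administrative Group/cn=Configuration/cn=Servers/cn=SERVER' ],
    [ '', 'Guest', '' ],
);

print "$_\n" for map { delivery_lines($_) } @data;
```

Because the user and server are read from the same row as the addresses, the pairing in steps 2 and 3 falls out of the loop rather than needing a separate lookup.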

The main problem is, by reading into a single-dimension array, I think that I'm losing the "columns" which associate the SMTP addresses with the correct userid and server. I simply don't know enough about arrays or hashes (or Perl, other than the fact it's by far the best tool for the job) to structure the data in the appropriate way so that I can extract what I want.
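
On the worry about losing the columns: as far as I can tell from the Text::CSV::Simple documentation, `read_file` returns one array reference per input line, so the columns would still travel together. That is an assumption worth checking with Data::Dumper on the real `@data`. The sketch below uses a hand-built stand-in for `@data` to show how such rows can be unpacked and indexed by username; `%record_for` is a made-up name.

```perl
use strict;
use warnings;

# Stand-in for what the parser is assumed to return: one array reference
# per input line, holding the wanted columns in their original order.
my @data = (
    [ 'SMTP:Manager@domain.com;smtp:MBX_MANA@domain.com', 'MBX_MANA', '/cn=Servers/cn=SERVER' ],
    [ '', 'Guest', '' ],
);

# Index the rows by username so each user's columns stay together.
my %record_for;
for my $row (@data) {
    my ($addresses, $user, $path) = @$row;    # unpack the three columns
    $record_for{$user} = { addresses => $addresses, path => $path };
}

# Walk the hash of hashes and show that nothing got separated.
for my $user (sort keys %record_for) {
    my $record = $record_for{$user};
    printf "%-10s addresses=[%s] path=[%s]\n",
        $user, $record->{addresses}, $record->{path};
}
```

A hash keyed by username is only one option. If the same username could appear on more than one line, later rows would overwrite earlier ones, and looping over `@data` directly would be the safer choice.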

Any help would be appreciated. The input files will not be larger than 2MB, and the whole thing will be running on up-to-date kit (so reading into memory should not be a problem). And I'm sorry I couldn't come up with more code - not having a clue about appropriate structure is not helping me. It may be that Text::CSV::Simple is not right for the job, but I couldn't quite get to grips with the syntax of Text::CSV_XS, for example.

In reply to Best method of munging CSV data - using Text::CSV::Simple? by billie_t
