PerlMonks  

Compare Two Files, Merge Updates from One File to the Other

by rycher (Acolyte)
on May 04, 2009 at 04:56 UTC (#761658=perlquestion)
rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have two files to compare. One file contains employee data from a database and the other file contains LDAP directory information in LDIF format.

My question is: How would I open both files, read in all of the contents into arrays, key in on the username as a match-point and update the LDIF file with data from the employee database?

This is what the employee data contains:

dn: uid=simpsonh
givenName: Homer
sn: Simpson
department: Nuclear Control Center
buildingName: Radioactive Hall
telephoneNumber: 218 555-6793
faxNumber: 218 555-6798
title: Nuclear Control Operator
manager: uid=burnsm

Here is data from the LDIF file:

dn: uid=simpsonh,ou=People,dc=sfnp,dc=com
userPassword:: e1NTSEF9dkI0NnhHT1A5MTBdgfsdghfSFWHDFW239jhsdv=
cn: Homer Simpson
givenName: Homer
sn: Simpson
mail: simpsonh@sfnp.com
loginShell: /bin/ksh
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: operator
uidNumber: 111
gidNumber: 101
uid: simpsonh
timeOut: 1128100385
auditDate: Mon Nov 7 04:50:00 2005
memberOf: cn=Operators,ou=Groups,dc=sfnp,dc=com
ssn: 1111
firstTime: 0
homeDirectory: /export/home/simpsonh

...and this is what it should look like when all is said and done:

dn: uid=simpsonh,ou=People,dc=sfnp,dc=com
userPassword:: e1NTSEF9dkI0NnhHT1A5MTBdgfsdghfSFWHDFW239jhsdv=
cn: Homer Simpson
givenName: Homer
sn: Simpson
mail: simpsonh@sfnp.com
loginShell: /bin/ksh
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
objectClass: operator
uidNumber: 111
gidNumber: 101
uid: simpsonh
timeOut: 1128100385
auditDate: Mon Nov 7 04:50:00 2005
memberOf: cn=Operators,ou=Groups,dc=sfnp,dc=com
ssn: 1111
firstTime: 0
homeDirectory: /export/home/simpsonh
department: Nuclear Control Center
buildingName: Radioactive Hall
telephoneNumber: 218 555-6793
faxNumber: 218 555-6798
title: Nuclear Control Operator
manager: uid=burnsm

Thank you in advance for any help and guidance.

Re: Compare Two Files, Merge Updates from One File to the Other
by graff (Chancellor) on May 04, 2009 at 06:04 UTC
    This is just another case of using a HoH to do the equivalent of an sql "join" on two tables. You should be familiar with the approach by now, so there's no need for folks here to write the code for you again.

    (Of course, you can also restructure the data into RDB tables and use sql to do the join -- same result, and same response: a suitable solution has already been posted at the monastery within the last few days, and it's just a matter of turning the crank.)

    (update: I mention HoH because I assume that the two input files contain data for more than one individual -- the first-layer hash keys are individual IDs and the second-layer keys are the field names, with the values being the data pertaining to each individual.)
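
A minimal sketch of the hash-of-hashes "join" described above, not the code from the earlier thread. It assumes one attribute per line, records separated by blank lines, and single-valued attributes (the sample LDIF is trimmed for that reason; repeated attributes such as objectClass would need array values). The name load_hoh is hypothetical.

```perl
use strict;
use warnings;

# Build a hash of hashes from "attr: value" text.
# First-layer key: the uid taken from the dn line.
# Second-layer keys: attribute names, mapped to their values.
sub load_hoh {
    my ($text) = @_;
    my %hoh;
    for my $block (split /\n{2,}/, $text) {
        my %fields;
        for my $line (split /\n/, $block) {
            $fields{$1} = $2 if $line =~ /^(\w+):\s*(.*)$/;
        }
        my ($uid) = ($fields{dn} // '') =~ /^uid=([^,]+)/ or next;
        $hoh{$uid} = \%fields;
    }
    return \%hoh;
}

my $employees = load_hoh(<<'END');
dn: uid=simpsonh
givenName: Homer
sn: Simpson
department: Nuclear Control Center
title: Nuclear Control Operator
END

my $directory = load_hoh(<<'END');
dn: uid=simpsonh,ou=People,dc=sfnp,dc=com
cn: Homer Simpson
givenName: Homer
sn: Simpson
uid: simpsonh
END

# The "join": for every uid present in both structures, copy over
# the employee fields that the directory record does not have yet.
for my $uid (sort keys %$directory) {
    my $emp = $employees->{$uid} or next;
    for my $attr (sort keys %$emp) {
        next if $attr eq 'dn' or exists $directory->{$uid}{$attr};
        $directory->{$uid}{$attr} = $emp->{$attr};
    }
}

print "$_: $directory->{simpsonh}{$_}\n"
    for sort keys %{ $directory->{simpsonh} };
```

Keying both structures on the same uid is what makes the lookup a single hash access per record instead of a nested scan over two arrays.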

      Hello, Thank you for the advice. I've done a fairly intricate series of SQL inner joins and aliases to have a Cartesian set of data match points into a csv exported file. I also know how to open two files and write into one.

      I will look into the hash of hashes method to insert data into the LDIF formatted file. I have a similar routine in a previous script and what you said about the first layer & second layer keys makes sense. Thanks again.

Re: Compare Two Files, Merge Updates from One File to the Other
by ELISHEVA (Prior) on May 04, 2009 at 08:46 UTC
    My question is: How would I open both files, read in all of the contents into arrays, key in on the username as a match-point and update the LDIF file with data from the employee database?
    1. open a file - see perlopentut and open
    2. read contents into arrays - As graff has pointed out, arrays aren't the best data structure when you need to compare something in a line of one file with something in the lines of another file. The preferred procedure is to:
      1. read in a record at a time from the first file. Your main challenge here is figuring out when a record has ended. For example, if all records begin with dn: and that string is never found inside a record, you could set the record separator ($/) to "dn:", e.g. local $/='dn:';. If 'dn: uid' is the never-repeated string between records, then use that. Each time Perl reads a "line" it will read until it finds that separator. See readline and perlvar for more information. Note: if there is no fixed string, or if you need to use a regular expression to ensure that you only break records when dn: is at the start of a line, please update your post; the techniques for such complicated files are a bit different.
      2. parse the record to extract the user name - see chomp, split, and perlretut for various tools you can use to do that.
      3. store the record in a hash containing username-record pairs, like this: $hKeyedLines{$username}=$record; - see perldata and perldsc for more information on hashes and how to use them.
    3. key in on the user name as a match point - read in the second file (or process database query results) record by record, but this time, instead of storing a record in an array: (a) parse each record in the second file to extract the username; (b) look up the username in the hash you created from the first file, like this: $lineFromFirstFile = $hKeyedLines{$username}. Once you have looked up the record from the first file,
      1. break up both records into fields
      2. choose which value you prefer from each record
      3. generate a new version of that record. If you store the selected values in variables, you might want to use an interpolated string, e.g. "dn: uid=$username ..." - Given the length of each record, you may also want to consider a "here" document. See perlop for more on interpolated strings and "here" documents. Also, you might find perlfaq4: Why don't my <<HERE documents work? helpful.
    4. update the LDIF file - open a temporary file for writing and write out the merged lines to that file. Then, when the file is complete, rename the temporary file to the real file. Using a temporary file protects the original file from corruption if your program crashes partway through. See perlopentut (again - but this time focus on the bits about opening files for writing) and rename. Use File::Temp to generate a name for the temporary file.
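
The steps above can be sketched roughly as follows. This is one possible reading, under stated assumptions: records in both files are separated by blank lines, so $/ is set to '' (paragraph mode) rather than to 'dn:'; attributes already present in an LDIF record are kept and only missing ones are appended, as in the desired output of the original post; and the sub names, file names, and trimmed sample data are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);

# Steps 1-2: read blank-line-separated records, keyed by username.
sub read_records {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    local $/ = '';                    # paragraph mode: one record per read
    my %by_user;
    while (my $record = <$in>) {
        $record =~ s/\n+\z//;         # drop the trailing newline(s)
        my ($user) = $record =~ /^dn:\s*uid=([^,\s]+)/m or next;
        $by_user{$user} = $record;
    }
    close $in;
    return \%by_user;
}

# Steps 3-4: stream the LDIF file, append missing attributes, and
# replace the original only after the temporary file is complete.
sub merge_into_ldif {
    my ($employee_path, $ldif_path) = @_;
    my $employees = read_records($employee_path);

    # Same directory as the target, so rename() stays on one filesystem.
    my ($out, $tmp_path) = tempfile(DIR => dirname($ldif_path), UNLINK => 0);

    open my $in, '<', $ldif_path or die "Cannot open $ldif_path: $!";
    local $/ = '';
    while (my $record = <$in>) {
        $record =~ s/\n+\z//;
        my ($user) = $record =~ /^dn:\s*uid=([^,\s]+)/m;
        my $emp = defined $user ? $employees->{$user} : undef;
        if (defined $emp) {
            # Attribute names the LDIF record already carries.
            my %have = map { ($_ => 1) } $record =~ /^(\w+):/mg;
            for my $line (split /\n/, $emp) {
                my ($attr) = $line =~ /^(\w+):/ or next;
                next if $attr eq 'dn' or $have{$attr};
                $record .= "\n$line";
            }
        }
        print {$out} "$record\n\n";
    }
    close $in;
    close $out or die "Cannot close $tmp_path: $!";
    rename $tmp_path, $ldif_path or die "Cannot rename $tmp_path: $!";
}

# Demo with throwaway files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my ($emp_file, $ldif_file) = ("$dir/employees.txt", "$dir/people.ldif");

open my $fh, '>', $emp_file or die "Cannot write $emp_file: $!";
print {$fh} "dn: uid=simpsonh\ngivenName: Homer\nsn: Simpson\n",
            "department: Nuclear Control Center\nmanager: uid=burnsm\n";
close $fh;

open $fh, '>', $ldif_file or die "Cannot write $ldif_file: $!";
print {$fh} "dn: uid=simpsonh,ou=People,dc=sfnp,dc=com\ncn: Homer Simpson\n",
            "givenName: Homer\nsn: Simpson\nuid: simpsonh\n";
close $fh;

merge_into_ldif($emp_file, $ldif_file);

open $fh, '<', $ldif_file or die "Cannot read $ldif_file: $!";
print while <$fh>;
close $fh;
```

Note that File::Temp creates the temporary file with restrictive permissions, so the replaced LDIF file may end up with a different mode than the original; adjust with chmod if that matters.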

    Best, beth

Re: Compare Two Files, Merge Updates from One File to the Other
by strat (Canon) on May 04, 2009 at 10:38 UTC

    It is easier and more secure to parse the employee data file record by record, search for the username in a directory server (not an LDIF file), and do updates on the fly:

    If you export a file, modify it, and import it again, you waste time during which other people or jobs may have changed the data (even if no other jobs are manipulating the data yet, that may happen in the future). So keep the time a modification takes as short as possible => do online updates. And for importing the data again, a script that does updates instead of delete/add is necessary, so why not enhance this script with the search?
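
A rough sketch of what such an online update might look like with Net::LDAP from the perl-ldap distribution. Everything specific here is an assumption: the list of attributes to push, the option names (host, bind_dn, password, base), and the sub names are illustrative, and the directory schema must actually allow the attributes being written. Only the pure helper runs in the demo; the server call is left commented out.

```perl
use strict;
use warnings;

# Attributes to push from an employee record (an illustrative choice).
my @UPDATABLE = qw(department buildingName telephoneNumber title);

# Build the hash passed to modify(replace => ...) for one employee.
sub changes_for {
    my ($employee) = @_;
    return {
        map  { ($_ => $employee->{$_}) }
        grep { defined $employee->{$_} } @UPDATABLE
    };
}

# Search for each uid on the server and modify the entry in place.
sub update_directory {
    my ($employees, %opt) = @_;
    require Net::LDAP;          # perl-ldap distribution, not a core module
    require Net::LDAP::Util;

    my $ldap = Net::LDAP->new($opt{host}) or die "Cannot connect: $@";
    my $mesg = $ldap->bind($opt{bind_dn}, password => $opt{password});
    die 'Bind failed: ' . $mesg->error if $mesg->code;

    for my $uid (sort keys %$employees) {
        my $safe   = Net::LDAP::Util::escape_filter_value($uid);
        my $search = $ldap->search(
            base   => $opt{base},
            scope  => 'sub',
            filter => "(uid=$safe)",
        );
        die 'Search failed: ' . $search->error if $search->code;

        my ($entry) = $search->entries;
        if (!$entry) {
            warn "No directory entry for $uid\n";
            next;
        }
        my $result = $ldap->modify($entry->dn,
            replace => changes_for($employees->{$uid}));
        warn "Modify failed for $uid: " . $result->error . "\n"
            if $result->code;
    }
    $ldap->unbind;
}

my %employees = (
    simpsonh => {
        department => 'Nuclear Control Center',
        title      => 'Nuclear Control Operator',
    },
);

my $changes = changes_for($employees{simpsonh});
print "$_ => $changes->{$_}\n" for sort keys %$changes;

# With a reachable server (every value below is a placeholder):
# update_directory(\%employees,
#     host     => 'ldap.example.com',
#     bind_dn  => 'cn=admin,dc=sfnp,dc=com',
#     password => 'secret',
#     base     => 'ou=People,dc=sfnp,dc=com',
# );
```

Because each entry is searched for and modified in one pass, the window in which another job could change the same entry is much shorter than with an export, edit, and re-import cycle.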

    Best regards,
    perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

      Hmm...doing updates on the fly is a great idea. I will have to investigate the perl-LDAP modules and see if that will be feasible. Thank you.

Node Type: perlquestion [id://761658]
Approved by McDarren