
Net::LDAP::LDIF->new Chokes on large LDIF because of comments every 5k lines

by 3dbc (Monk)
on Dec 27, 2017 at 17:06 UTC [id://1206277]

3dbc has asked for the wisdom of the Perl Monks concerning the following question:

Original post:

Hi Perl Monks,

Trying to figure out a way to do a my $ldif = Net::LDAP::LDIF->new( "../packages/ldapsearch.ldif", "r") or die "file does not exist\n"; on the output of an exec qx[customldapsearch_from_vendor command].
I want to use Perl to process the large LDIF it creates and format it into a CSV report. I have most of it done, but I want to be able to dynamically control the external Win32 command from Perl and then immediately pass its output into the new Net::LDAP::LDIF object.
Right now the exec doesn't release the file so that I can then process it in Perl through the LDIF call, and I'm getting an error when opening the file.


Thanks!!



Hi Perl Monks,

Having an issue reading in a huge LDIF with comments every five thousand lines. How do I ignore those when opening the LDIF with Net::LDAP::LDIF->new?

Example of the LDIF with comments:

    # userid2, help, perl.monks
    dn: uid=userid2,ou=help,dc=perl,dc=monks
    FriendlyId: crazyID2
    IdCreatedOn: 28-02-2015 04:40:55

    #BELOW IS THE COMMENTS PERL CHOKES ON EVERY 5k lines with LDIF MODULE
    # search result
    # search: 2
    # result: 0 Success
    control: 1.2.840.113556.1.4.319 false BLAHBLAH
    # extended LDIF
    #
    # LDAPv3
    # base <ou=help,dc=perl,dc=monks> with scope subtree
    # filter: (|(Id1=*)(Id2=*))
    # requesting: FriendlyId IdCreatedOn
    # with pagedResults critical control: size=1000
    #

    # userid, help, perl.monks
    dn: uid=userid,ou=help,dc=perl,dc=monks
    FriendlyId: crazyID
    IdCreatedOn: 28-02-2015 04:40:55
Do I have to parse these out of the LDIF before reading it in with Net::LDAP::LDIF?

    my $ldif = Net::LDAP::LDIF->new( "../packages/ldapsearch.ldif", "r" )
        or die "file does not exist\n";
    while ( not $ldif->eof() ) {
        my $entry = $ldif->read_entry();
        # ... process $entry ...
    }

Right now it's reading in the file it generates, but it only gets about a quarter of the way through before $entry = $ldif->read_entry() throws an "entry not valid" error.

Shouldn't Net::LDAP::LDIF conform to the standard?

UPDATE:

Never mind, I found my own issue. As you can see in the comment section, every 5k lines the file has one un-commented line in the middle, so the LDIF I'm using doesn't conform to the standard. I wish I could use the LDIF module to create the LDIF itself, but I need to page the results every 1k search results.
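
For reference, paging can be driven from Perl itself with Net::LDAP::Control::Paged; a minimal sketch, where the host, base, filter, and attribute names are placeholders rather than values confirmed in this thread:

    use strict;
    use warnings;
    use Net::LDAP;
    use Net::LDAP::Control::Paged;
    use Net::LDAP::Constant qw(LDAP_CONTROL_PAGED);

    # Placeholder connection details -- substitute your own server.
    my $ldap = Net::LDAP->new('ldap.example.com') or die "$@";
    $ldap->bind;    # anonymous bind, for the sketch only

    my $page = Net::LDAP::Control::Paged->new( size => 1000 );
    while (1) {
        my $mesg = $ldap->search(
            base    => 'ou=help,dc=perl,dc=monks',
            filter  => '(|(Id1=*)(Id2=*))',
            attrs   => [qw(FriendlyId IdCreatedOn)],
            control => [$page],
        );
        $mesg->code and die $mesg->error;
        # ... process $mesg->entries here ...
        my ($resp) = $mesg->control(LDAP_CONTROL_PAGED) or last;
        my $cookie = $resp->cookie or last;    # empty cookie => last page
        $page->cookie($cookie);                # request the next page
    }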

2017-12-28 Athanasius restored original content and added code tags

- 3dbc

Replies are listed 'Best First'.
Re: Net::LDAP::LDIF->new Chokes on large LDIF because of comments every 5k lines
by NetWallah (Canon) on Dec 27, 2017 at 20:44 UTC
    LDIF.pm does indeed handle comments (see "sub _read_lines", around line 83).

    Are you using the latest version of Net::LDAP::LDIF (0.26)?

    Add some diagnostics to see how far along the LDIF file it is able to read, and at what point it fails.
    I suggest printing the first ~250 bytes of Dumper output for each $entry.
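
    For example, a minimal sketch of that diagnostic, assuming the read loop from the question:

        use Data::Dumper;

        while ( not $ldif->eof() ) {
            my $entry = $ldif->read_entry();
            # Print a short prefix of each parsed entry to see where parsing stops.
            print substr( Dumper($entry), 0, 250 ), "\n---\n";
        }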

                    We're living in a golden age. All you need is gold. -- D.W. Robertson.

      I used some print statements in the loop to find the last dn it processed; when I looked immediately after that dn in the file, I found the recurring non-compliant LDIF comment block. So now I preprocess the file with Perl to remove the comments before reading it into the LDIF module.
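
      A minimal sketch of that preprocessing step (the file names and the stray "control:" pattern are assumptions based on the sample above):

          use strict;
          use warnings;

          open my $in,  '<', 'ldapsearch.ldif'       or die "open input: $!";
          open my $out, '>', 'ldapsearch_clean.ldif' or die "open output: $!";
          while ( my $line = <$in> ) {
              next if $line =~ /^#/;           # LDIF comment lines
              next if $line =~ /^control:/;    # stray un-commented server line
              print {$out} $line;
          }
          close $in;
          close $out or die "close output: $!";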

      perl -MNet::LDAP::LDIF -le "print $Net::LDAP::LDIF::VERSION"
      0.15
      - 3dbc
Re: open ldif
by haukex (Archbishop) on Dec 27, 2017 at 20:01 UTC

    Before posting this, I saw that you have replaced the entire contents of your post with an entirely new question. Please see How do I change/delete my post?, in particular "It is uncool to update a node in a way that renders replies confusing or meaningless". Because I already put some effort into my answer I am posting it anyways, and have considered your node for editing. (Update: It appears you edited your post multiple times and the version I saw was not the original. The same comments apply to those edits too.) Please post your new question in a new thread.

    Trying to figure out a way to do a my $ldif = Net::LDAP::LDIF->new ... on the output of an exec qx ...
    system qx[...];
    Should I be using an System or Exec? I think I should be doing something like this, but use the ldap::ldif->new with EXEC?

    Both system and qx// call external commands, so what this code is trying to do is run a command with qx[], get its entire output, and feed that as the arguments to a system command, which is almost certainly not what you want. I of course don't know this custom external dxsearch tool you are using and I am not an expert on Net::LDAP::LDIF, but its documentation does make clear that you can give Net::LDAP::LDIF->new a filehandle instead of a filename. Based on your description, in this case, that filehandle could come from, for example, a piped open, which I describe in my post here along with lots more on running external commands, which I also strongly recommend you check out.
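
    A minimal sketch of that filehandle approach (the dxsearch command line is a placeholder, since the real command wasn't shown):

        use strict;
        use warnings;
        use Net::LDAP::LDIF;

        # Hypothetical command line -- substitute the real dxsearch call.
        my $cmd = 'dxsearch -b "ou=help,dc=perl,dc=monks" "(uid=*)"';
        open( my $fh, '-|', $cmd ) or die "cannot start dxsearch: $!";

        my $ldif = Net::LDAP::LDIF->new( $fh, "r", onerror => 'warn' );
        while ( not $ldif->eof() ) {
            my $entry = $ldif->read_entry() or next;
            # ... build the CSV row from $entry here ...
        }
        $ldif->done();
        close $fh;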

    I want to use perl to process the large ldif it creates and format it into a csv report.

    Since the code you showed doesn't use it, I recommend Text::CSV (also install Text::CSV_XS for speed).
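
    A minimal sketch of the CSV side with Text::CSV (the column names are guesses based on the sample LDIF above):

        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
            or die "Cannot use Text::CSV: " . Text::CSV->error_diag();
        open my $report, '>', 'report.csv' or die "open report.csv: $!";
        $csv->print( $report, [ 'dn', 'FriendlyId', 'IdCreatedOn' ] );

        # ... then, for each $entry from read_entry():
        # $csv->print( $report, [ $entry->dn,
        #                         scalar $entry->get_value('FriendlyId'),
        #                         scalar $entry->get_value('IdCreatedOn') ] );

        close $report or die "close report.csv: $!";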

      Thanks, that is what I was looking for. I need to use STRIKE next time instead; I just kept working through it and probably posted too early. That open-filehandle approach looks more efficient than what I was doing and answers my original question.
      I'm using DXSEARCH to get tens of thousands of directory search results, even though the LDAP server has a search limit of 1k per search, and there's no easy way to deal with that with the LDIF module (paged client-side searching).
      - 3dbc
        I think my original question was more along the lines of all these files I'm opening and closing, and how best to manage them. In my script I currently have one system call which uses dxsearch (an enhanced ldapsearch) to produce a very large LDIF with over 100,000 search results. Then on the next line I open the same file that dxsearch produces (even though I might already have it in memory, because I used my $input = qx[dxsearch blahblah] but never use $input anywhere else in my script). Then I open an output file to convert it to compliant LDIF, since there are some uncommented bad lines in it. Then I open an output file.csv, and I process the LDIF generated by dxsearch by reading the massaged LDIF I cleaned up with Perl in the previous step with Net::LDAP::LDIF->new.

        So in one script I have three opens, two closes, a file being generated by a system call, and Net::LDAP::LDIF reading in a file I closed right before calling it; not to mention an $input variable probably storing the whole un-massaged LDIF in memory, which is never used because I wasn't sure that would work with Net::LDAP::LDIF->new. Everything seems to be working fine; I was just interested whether there is any best practice here, or whether it's better to open and close files to conserve memory. I guess that is kind of the question I was asking, but once I solved the issue with processing the LDIF I decided to change it, since I didn't want to waste anyone's time.
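
        A minimal sketch of how those steps could collapse into one pass with no intermediate files (the dxsearch command line is a placeholder, and the in-memory filehandle is one option rather than established best practice):

            use strict;
            use warnings;
            use Net::LDAP::LDIF;

            # Hypothetical command line -- substitute the real dxsearch call.
            my $cmd = 'dxsearch -b "ou=help,dc=perl,dc=monks" "(uid=*)"';
            open( my $raw, '-|', $cmd ) or die "cannot start dxsearch: $!";

            # Clean while streaming: drop comments and stray control: lines.
            my $clean = '';
            while ( my $line = <$raw> ) {
                next if $line =~ /^#/ or $line =~ /^control:/;
                $clean .= $line;
            }
            close $raw;

            # Parse the cleaned LDIF from an in-memory filehandle.
            open my $fh, '<', \$clean or die "in-memory open: $!";
            my $ldif = Net::LDAP::LDIF->new( $fh, "r", onerror => 'warn' );
            while ( not $ldif->eof() ) {
                my $entry = $ldif->read_entry() or next;
                # ... write the CSV row for $entry here ...
            }
            $ldif->done();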

        I also have a bad habit of compulsively editing my posts after submitting them (for instance, I've probably edited this one 4 or 5 times since I originally posted it, ugh). Not sure if anyone else has fallen victim to that trap too, just saying ;-)
        - 3dbc
