PerlMonks  

Clearing lines in a file based on an array containing the lines

by rycher (Acolyte)
on Apr 22, 2009 at 20:16 UTC ( #759395=perlquestion )
rycher has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I'm trying to simplify my subroutine, which is supposed to open an (LDIF) text file, read its contents into an array, and then print them to a new file while excluding the lines whose field names exist in another array.

I can currently accomplish this for one excluded word at a time, but of course this leads to bloated code because I have to reopen and close the file for each removal.

Here is what I have so far:

sub GROOMLDIF {
    # Delete LDAP-internal fields
    my @erasethese = qw(structuralObjectClass entryUUID creatorsName
                        modifiersName createTimestamp modifyTimestamp entryCSN);

    open(FILE,"< data/all.ldif");
    my @LINES = <FILE>;
    close(FILE);

    open(FILE,"> data/groomed.ldif");
    foreach my $LINE (@LINES) {
        my @array = split(/\:/,$LINE);
        print FILE $LINE unless ($array[0] eq "$erasethese[0]");
    }
    close(FILE);

    open(FILE,"< data/groomed.ldif");
    @LINES = <FILE>;
    close(FILE);

    open(FILE,"> data/groomed.ldif");
    foreach my $LINE (@LINES) {
        my @array = split(/\:/,$LINE);
        print FILE $LINE unless ($array[0] eq "$erasethese[1]");
    }
    close(FILE);
}

I would like to combine all of the removals into one open/close. I've tried a nested for loop that incremented the index into @erasethese, with no luck.

Any help would be appreciated.

Re: Clearing lines in a file based on an array containing the lines
by ramrod (Chaplain) on Apr 22, 2009 at 20:27 UTC
    You can eliminate the second "grooming" by comparing against your elements at the same time:

    print FILE $LINE unless ($array[0] eq "$erasethese[0]" || $array[0] eq "$erasethese[1]");

    As for opening the file only once, you can try opening it for both reading and writing (+<).
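
    For what it's worth, the +< idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: groom_in_place and its arguments are made-up names, and it assumes the whole file fits in memory. It reads every line, rewinds the same handle, writes back the lines to keep, and truncates the leftover tail.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite a file in place,
# dropping every line whose field name (the text before the first colon)
# appears in the array referenced by $erase.
sub groom_in_place {
    my ($path, $erase) = @_;
    my %erase = map { $_ => 1 } @$erase;

    # '+<' opens an existing file for reading and writing without clobbering it
    open my $fh, '+<', $path or die "Cannot open '$path': $!";

    # Slurp all lines, keeping only those whose field name is not listed
    my @keep = grep {
        my ($field) = split /:/, $_, 2;
        !exists $erase{$field};
    } <$fh>;

    # Rewind, write the surviving lines, then cut off the old tail
    seek $fh, 0, 0 or die "Cannot seek in '$path': $!";
    print {$fh} @keep;
    truncate $fh, tell($fh) or die "Cannot truncate '$path': $!";
    close $fh or die "Cannot close '$path': $!";

    return scalar @keep;    # number of lines kept
}
```

    A call might look like groom_in_place('data/all.ldif', [qw(entryUUID entryCSN)]). Note that this overwrites the original, so if the script dies partway through the write the file can be left damaged; writing to a second file, as in the original post, is the safer choice.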
Re: Clearing lines in a file based on an array containing the lines
by almut (Canon) on Apr 22, 2009 at 20:32 UTC

    When you put the names of the fields in a hash

    my %erasethese = map { $_ => 1 } qw(structuralObjectClass entryUUID creatorsName
                                        modifiersName createTimestamp modifyTimestamp entryCSN);

    you can then write

    print FILE $LINE unless exists $erasethese{$array[0]};

    which would test if $array[0] is any of the keywords (which - if I've understood you correctly - is what you want to achieve).
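
    Put together, the hash lookup lets the whole job run in a single pass with one open/close per file. The following is a minimal sketch under some assumptions: groom_ldif is a made-up name, and the paths and field names are passed in as arguments rather than hard-coded as in the original sub.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant of GROOMLDIF (names are illustrative):
# copy $in_path to $out_path, skipping lines whose field name is in @fields.
sub groom_ldif {
    my ($in_path, $out_path, @fields) = @_;

    # Hash keys give a constant-time "is this field excluded?" test
    my %erasethese = map { $_ => 1 } @fields;

    open my $in,  '<', $in_path  or die "Cannot open '$in_path': $!";
    open my $out, '>', $out_path or die "Cannot open '$out_path': $!";

    while (my $line = <$in>) {
        # The field name is everything before the first colon
        my ($field) = split /:/, $line, 2;
        print {$out} $line unless exists $erasethese{$field};
    }

    close $in;
    close $out or die "Cannot close '$out_path': $!";
    return;
}
```

    For the files in the question this would be called as groom_ldif('data/all.ldif', 'data/groomed.ldif', qw(structuralObjectClass entryUUID creatorsName modifiersName createTimestamp modifyTimestamp entryCSN)).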

      Thank you Almut... that was awesome. I was 'this' close (*holding fingers very close together.*)

      Thank you monks for your assistance. Much appreciated.
Re: Clearing lines in a file based on an array containing the lines
by toolic (Bishop) on Apr 22, 2009 at 20:33 UTC
    I think grep might help you out, especially if you need to check against all of the @erasethese items. Since you did not provide a small sample of your input file, I will only offer this untested code (which does check against all @erasethese items):
    sub GROOMLDIF {
        # Delete LDAP-internal fields
        my @erasethese = qw(
            structuralObjectClass entryUUID creatorsName modifiersName
            createTimestamp modifyTimestamp entryCSN
        );
        open my $fh_in , '<', "data/all.ldif"     or die "can not open data/all.ldif: $!";
        open my $fh_out, '>', "data/groomed.ldif" or die "can not open data/groomed.ldif: $!";
        while (<$fh_in>) {
            my $line = $_;
            # The field name is the text before the first colon
            my ($thing) = split /:/;
            print $fh_out $line unless (grep {$thing eq $_} @erasethese);
        }
    }
Re: Clearing lines in a file based on an array containing the lines
by jwkrahn (Monsignor) on Apr 22, 2009 at 23:33 UTC

    Perhaps this will work better for you (UNTESTED):

    sub GROOMLDIF {
        # Delete LDAP-internal fields
        my $erasethese = qr/\A(?:structuralObjectClass|entryUUID|creatorsName|modifiersName|createTimestamp|modifyTimestamp|entryCSN):/;
        open my $IN,  '<', 'data/all.ldif'     or die "Cannot open 'data/all.ldif' $!";
        open my $OUT, '>', 'data/groomed.ldif' or die "Cannot open 'data/groomed.ldif' $!";
        while ( <$IN> ) {
            print $OUT $_ unless /$erasethese/;
        }
    }
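
    If the field names already live in an array, the same kind of pattern could be assembled from it instead of being typed out by hand. A small sketch, where field_filter is a hypothetical helper name and not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: build one anchored regex that matches any line
# starting with one of the given field names followed by a colon.
sub field_filter {
    my @names = @_;
    # quotemeta guards against names containing regex metacharacters
    my $alternation = join '|', map { quotemeta($_) } @names;
    return qr/\A(?:$alternation):/;
}
```

    The result can be used just like $erasethese in the sub above, e.g. my $erasethese = field_filter(@names); followed by print $OUT $_ unless /$erasethese/; inside the loop. The sketch assumes the list is not empty.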
Re: Clearing lines in a file based on an array containing the lines
by NiJo (Friar) on Apr 23, 2009 at 18:12 UTC
    my $command = join '', map { " | grep -v $_" } @erase_these;
    system "cat $in_file" . $command . " > $out_file";

    Least amount of your code, C speed, use of multiple cores for free

    Limitation: maximum command length on shell
