http://www.perlmonks.org?node_id=844370


in reply to Re: Search and Remove
in thread Search and Remove

How about this?

Only thing is when I include:

use warnings;

I get the following errors:

Use of uninitialized value in string eq at ./delete_list_ver2.pl line +27. Use of uninitialized value in print at ./delete_list_ver2.pl line 37.
and the really strange thing is when I get the length of @orgList after deleting all those lines it is the lenghth at the beginning - 1... strange.
# open files, read into array, sort array open (ORGLIST, "<", $ARGV[0]); open (DELLIST, "<", $ARGV[1]); my @orgList = <ORGLIST>; my @delList = <DELLIST>; close ORGLIST; close DELLIST; @orgList = sort(@orgList); @delList = sort(@delList); # counter for entries removed my $j = 0; # iterate through @delList foreach my $line (@delList){ # iterate through @orgList for ( my $i = 0 ; $i < @orgList ; $i++ ){ # if found delete from array if ($line eq $orgList[$i]){ $j++; delete $orgList[$i]; } # end if } # end for } # end foreach print "Entries Removed: $j \n"; open ( NEWORGLIST, ">", "test.out"); print NEWORGLIST @orgList; close (NEWORGLIST); unlink @orgList; unlink @delList;

Replies are listed 'Best First'.
Re^3: Search and Remove
by choroba (Cardinal) on Jun 12, 2010 at 22:07 UTC
    You cannot use length to get the number of array elements, see length and scalar (the latter you should use instead).

    The last line is also strange. Do you really want to unlink the arrays?

Re^3: Search and Remove
by rjt (Curate) on Jun 12, 2010 at 22:44 UTC

    Well, this is getting better, but still doesn't really match the algorithm. You have eliminated a lot of unnecessary reads, but you still have an O(n*m) solution, which can be improved on greatly. To get around this (and make the code much cleaner), you need to implement the hashing step I outlined in the above node.

    Here's just a small snippet to get you thinking. This does the setup of the %del hash.

    # Step 2 my %del; open my $dellist, '<', $delfile or die "Could not open $delfile: $!\n" +; for (<$dellist>) { chomp; $del{$_} = 1; } close $dellist; # Step 3

    Now the 'key' bit (if you'll pardon the pun) is that you never, ever have to iterate through that list again, or open the del file again.

    Step 4 is where you iterate through your org list. The setup will be similar to the del list, except that for each line in your org list, you just need to check whether it exists in your %del hash and act accordingly. For instance, exists $del{1234} returns true if and only if `1234' is a key in your %del hash. Read the exists page for more detail.

    From your code it looks like you have influence from other structured programming languages. (Are you perhaps thinking that unlink is something like C's free(3) to free allocated memory? You don't need to do that in Perl; it handles garbage collection for you automatically.)

    That influence from other languages will certainly help you, but Perl has rich features that you'll want to take advantage of. For instance, the introduction on hashes could be sort of life changing if you're coming from a language that doesn't have a built-in hash/associative array data type.

      From your code it looks like you have influence from other structured programming languages. (Are you perhaps thinking that unlink is something like C's free(3) to free allocated memory? You don't need to do that in Perl; it handles garbage collection for you automatically.)

      Haha, that's what they said about java... and, well--we all know the truth...

      I started with BASIC when I was a wee tot. I only ever learned enough to copy out of the BASIC book. From there I took a VBasic course and learned VB6. From that I was able to apply what I'd learned into automating MANY tasks with VBA. Parallel to that I was trying to learn C++; I've never even gotten a "Hello World" program to work in C++. After that I focused my energy on C# which I grasped rather quickly and banged out several "practice" programs. Also IMPORTANT to note this is when I discovered the "open source" world, and discovered how to shed my Microsucks dependencies...

      Enter Ubuntu...

      At this point I've changed schools and majors (I'm a certified EMT... But I needed a way to pay for my paramedic degree...). Enter Computer Science.

      Computer Science:

      Fortunately as a previous Math major I've completed most of the requirements for my CS degree. As a CS Major, our language of study is java... IMHO on of the most klutzyist languages I've studied.

      In parallel to starting my CS Major I landed a job. This job promoted me programming where I learned to love Perl.

      Anyway... Long Story Short (too late), Out of necessity I've needed to become a Server and Perl Expert over night. Obviously, I still have some learning to do.

      I'm really not sure how making a hash key = 1 is helpful. I understand making "the whole mess" a key is like making it an array.

      I've been studying from "PERL By Examply" By Ellie Quigly. Can you maybe recommend a better book?

        Better than PBE?

        Dick and Jane. See Spot run?

        Sorry, but PBE is (or was in the edition I wish I hadn't bought) poorly organized for use as a primer; shot full of bad practices, misstatements and errors and (IIRC) had no useful errata page.

        Try "Learning Perl," 3rd ed., ISBN 0596001320, O'Reilly, by PM's own Randal Schwartz and Tom Phoenix. possibly followed by "Programming Perl."

      So how about this?
      #!/usr/bin/perl use strict; use warnings; use diagnostics; if (@ARGV = 2){ my %delList; my @orgList; open (BEEPON, >>, "/path/to/beepon.file); open (BEEPOFF, >>, /path/to/beepoff.file); open (DELLIST, <, $ARGV[0]); foreach (<DELLIST>){ %delList{chomp($_)} = 1; } close DELLIST; open (ORGLIST, <, $ARGV[1]); @orgList = <ORGLIST> close (ORGLIST); for (my $i = 0; $i < @orgList; $i++){ if (exists $del{$orgList[$i]}){ print BEEPON $orgList[$i]; } else print BEEPOFF $orgList[$i]; } } close BEEPON; close BEEPOFF;
      I can see how this would be more efficient. I'm not really sure that it is what you outlined... I'm still not entirely understanding the use of hashes. ie why are all the values set to 1?

        You're getting there. Nice to see you coming back with improving iterations. Comments by section, below:

        First of all...

        I can tell by looking that your code won't work--you're interchanging %delList and %del, for example. Make it easier on us, please: test your own programs.

        That being said, there is still plenty I can tell you about your code.

        The DELLIST loop

        You deviated from my example in several important aspects:

        I'm not sure why you've gone with the bareword filehandle approach. Best practice (as in my example) is to go with a lexical filehandle with my.

        Second: Hash and array members are always scalars, so the sigil is always $ in Perl 5 (unless you're working with slices, but that doesn't apply here.) So your %delList{...} should be $delList{...}, to access individual hash elements. Your @array usage appears to be correct.

        Third: Re-read chomp carefully. You're using its return as a hash key value, but it returns "the total number of characters removed from all its arguments", which is not what you want. You want the chomp'd string. That's why I did chomp first and then stuck $_ into the hash.

        Last, to answer your question about hashes and why all of the values are set to 1, first let me make sure you're not talking about keys. With your code as-is, probably every key will be set to 1, because that's what chomp will return for each line (each line has one newline). With me so far?

        Hash keys are the strings used to address data elements in the hash. Hash values are the scalars containing whatever data you want to be associated with each key. For example:

        my %age; # Age of a person, in years $age{'Sonny'} = 30; $age{'Junior'} = 10; $age{'Poppa'} = 40; foreach my $person (sort keys %age) { print "$person is $age{$person} years old\n"; }

        In your case, we want to know if a certain entry should be written to one file or another. The way we define that is, if a key exists in %dellist for that entry, that entry should be written to "beepon", else it's written to "beepoff". (What these names mean, I have no idea, by the way.) We assign 'something' a true value in %dellist with $dellist{'something'} = 1. In the loop, that 'something' happens to be the line that was read from the $dellist filehandle, minus the trailing newline.

        Efficiency of scanning a list (your old method) versus using a hash is something similar to opening every drawer in your chest of drawers one by one until you find one that has socks, versus remembering that $drawer{'socks'} is 'topleft' and just opening the top left drawer. Like most analogies, it's not perfect, but describes the efficiency gain fairly well.

        foreach

        Your for loop looks mostly functional, but you don't need to use C-style constructs, and you certainly don't need to read the entire file into an array first (waste of memory, possibly significant). You also don't need a clumsy if/else; there's a fabulous example of how to specify a filehandle as an expression right in print.

        open my $orglist, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!"; foreach my $item (<$orglist>) { chomp($item); print { exists $dellist{$item} ? $beepon : $beepoff } $item . "\n" +; } close $orglist;

        Of course, the above assumes you've already done away with the bareword filehandles in favour of scalars.