Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^3: Search and Remove

by rjt (Deacon)
on Jun 12, 2010 at 22:44 UTC ( #844376=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Search and Remove
in thread Search and Remove

Well, this is getting better, but still doesn't really match the algorithm. You have eliminated a lot of unnecessary reads, but you still have an O(n*m) solution, which can be improved on greatly. To get around this (and make the code much cleaner), you need to implement the hashing step I outlined in the above node.

Here's just a small snippet to get you thinking. This does the setup of the %del hash.

# Step 2 my %del; open my $dellist, '<', $delfile or die "Could not open $delfile: $!\n" +; for (<$dellist>) { chomp; $del{$_} = 1; } close $dellist; # Step 3

Now the 'key' bit (if you'll pardon the pun) is that you never, ever have to iterate through that list again, or open the del file again.

Step 4 is where you iterate through your org list. The setup will be similar to the del list, except that for each line in your org list, you just need to check whether it exists in your %del hash and act accordingly. For instance, exists $del{1234} returns true if and only if `1234' is a key in your %del hash. Read the exists page for more detail.

From your code it looks like you have influence from other structured programming languages. (Are you perhaps thinking that unlink is something like C's free(3) to free allocated memory? You don't need to do that in Perl; it handles garbage collection for you automatically.)

That influence from other languages will certainly help you, but Perl has rich features that you'll want to take advantage of. For instance, the introduction on hashes could be sort of life changing if you're coming from a language that doesn't have a built-in hash/associative array data type.


Comment on Re^3: Search and Remove
Select or Download Code
Re^4: Search and Remove
by PyrexKidd (Monk) on Jun 13, 2010 at 02:22 UTC
    From your code it looks like you have influence from other structured programming languages. (Are you perhaps thinking that unlink is something like C's free(3) to free allocated memory? You don't need to do that in Perl; it handles garbage collection for you automatically.)

    Haha, that's what they said about java... and, well--we all know the truth...

    I started with BASIC when I was a wee tot. I only ever learned enough to copy out of the BASIC book. From there I took a VBasic course and learned VB6. From that I was able to apply what I'd learned into automating MANY tasks with VBA. Parallel to that I was trying to learn C++; I've never even gotten a "Hello World" program to work in C++. After that I focused my energy on C# which I grasped rather quickly and banged out several "practice" programs. Also IMPORTANT to note this is when I discovered the "open source" world, and discovered how to shed my Microsucks dependencies...

    Enter Ubuntu...

    At this point I've changed schools and majors (I'm a certified EMT... But I needed a way to pay for my paramedic degree...). Enter Computer Science.

    Computer Science:

    Fortunately as a previous Math major I've completed most of the requirements for my CS degree. As a CS Major, our language of study is java... IMHO on of the most klutzyist languages I've studied.

    In parallel to starting my CS Major I landed a job. This job promoted me programming where I learned to love Perl.

    Anyway... Long Story Short (too late), Out of necessity I've needed to become a Server and Perl Expert over night. Obviously, I still have some learning to do.

    I'm really not sure how making a hash key = 1 is helpful. I understand making "the whole mess" a key is like making it an array.

    I've been studying from "PERL By Examply" By Ellie Quigly. Can you maybe recommend a better book?

      Better than PBE?

      Dick and Jane. See Spot run?

      Sorry, but PBE is (or was in the edition I wish I hadn't bought) poorly organized for use as a primer; shot full of bad practices, misstatements and errors and (IIRC) had no useful errata page.

      Try "Learning Perl," 3rd ed., ISBN 0596001320, O'Reilly, by PM's own Randal Schwartz and Tom Phoenix. possibly followed by "Programming Perl."

Re^4: Search and Remove
by PyrexKidd (Monk) on Jun 16, 2010 at 01:09 UTC
    So how about this?
    #!/usr/bin/perl use strict; use warnings; use diagnostics; if (@ARGV = 2){ my %delList; my @orgList; open (BEEPON, >>, "/path/to/beepon.file); open (BEEPOFF, >>, /path/to/beepoff.file); open (DELLIST, <, $ARGV[0]); foreach (<DELLIST>){ %delList{chomp($_)} = 1; } close DELLIST; open (ORGLIST, <, $ARGV[1]); @orgList = <ORGLIST> close (ORGLIST); for (my $i = 0; $i < @orgList; $i++){ if (exists $del{$orgList[$i]}){ print BEEPON $orgList[$i]; } else print BEEPOFF $orgList[$i]; } } close BEEPON; close BEEPOFF;
    I can see how this would be more efficient. I'm not really sure that it is what you outlined... I'm still not entirely understanding the use of hashes. ie why are all the values set to 1?

      You're getting there. Nice to see you coming back with improving iterations. Comments by section, below:

      First of all...

      I can tell by looking that your code won't work--you're interchanging %delList and %del, for example. Make it easier on us, please: test your own programs.

      That being said, there is still plenty I can tell you about your code.

      The DELLIST loop

      You deviated from my example in several important aspects:

      I'm not sure why you've gone with the bareword filehandle approach. Best practice (as in my example) is to go with a lexical filehandle with my.

      Second: Hash and array members are always scalars, so the sigil is always $ in Perl 5 (unless you're working with slices, but that doesn't apply here.) So your %delList{...} should be $delList{...}, to access individual hash elements. Your @array usage appears to be correct.

      Third: Re-read chomp carefully. You're using its return as a hash key value, but it returns "the total number of characters removed from all its arguments", which is not what you want. You want the chomp'd string. That's why I did chomp first and then stuck $_ into the hash.

      Last, to answer your question about hashes and why all of the values are set to 1, first let me make sure you're not talking about keys. With your code as-is, probably every key will be set to 1, because that's what chomp will return for each line (each line has one newline). With me so far?

      Hash keys are the strings used to address data elements in the hash. Hash values are the scalars containing whatever data you want to be associated with each key. For example:

      my %age; # Age of a person, in years $age{'Sonny'} = 30; $age{'Junior'} = 10; $age{'Poppa'} = 40; foreach my $person (sort keys %age) { print "$person is $age{$person} years old\n"; }

      In your case, we want to know if a certain entry should be written to one file or another. The way we define that is, if a key exists in %dellist for that entry, that entry should be written to "beepon", else it's written to "beepoff". (What these names mean, I have no idea, by the way.) We assign 'something' a true value in %dellist with $dellist{'something'} = 1. In the loop, that 'something' happens to be the line that was read from the $dellist filehandle, minus the trailing newline.

      Efficiency of scanning a list (your old method) versus using a hash is something similar to opening every drawer in your chest of drawers one by one until you find one that has socks, versus remembering that $drawer{'socks'} is 'topleft' and just opening the top left drawer. Like most analogies, it's not perfect, but describes the efficiency gain fairly well.

      foreach

      Your for loop looks mostly functional, but you don't need to use C-style constructs, and you certainly don't need to read the entire file into an array first (waste of memory, possibly significant). You also don't need a clumsy if/else; there's a fabulous example of how to specify a filehandle as an expression right in print.

      open my $orglist, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!"; foreach my $item (<$orglist>) { chomp($item); print { exists $dellist{$item} ? $beepon : $beepoff } $item . "\n" +; } close $orglist;

      Of course, the above assumes you've already done away with the bareword filehandles in favour of scalars.

        Ok so here is my next attempt... but I'm having trouble.

        #!/usr/bin/perl use strict; use warnings; #use diagnostics; if (@ARGV = 2){ my %delList; open (my $BEEPON, >>, "./beepon.file"); open (my $BEEPOFF, >>, "./beepoff.file"); open (my $DELLIST, <, $ARGV[0]); foreach (<$DELLIST>){ chomp($_); $delList{$_} = 1; } close $DELLIST; open (my $ORGLIST, <, $ARGV[1]); foreach my $entry (<$ORGLIST>){ chomp($entry); print { exists $delList{$entry} ? $BEEPON : $BEEPOFF } $entry +. "\n"; } close $BEEPON; close $BEEPOFF; }
        here are the errors I get when running the code:
        readline() on closed filehandle $DELLIST at ./delete_list_ver3.pl line + 12. Use of uninitialized value $ARGV[1] in concatenation (.) or string at +./delete_list_ver3.pl line 17. readline() on closed filehandle $ORGLIST at ./delete_list_ver3.pl line + 19. syntax error at ./delete_list_ver3.pl line 8, near ", >>" syntax error at ./delete_list_ver3.pl line 9, near ", >>" Unterminated <> operator at ./delete_list_ver3.pl line 10.

        ...

        grrr...

        how can a file I just opened be closed?

        ok so I reread your post. I made some changes to the way I open the file handles. I'm still getting one error.

        #!/usr/bin/perl use strict; use warnings; #use diagnostics; if (@ARGV = 2){ my %delList; open my $BEEPON, '>>', "./beepon.file" || die "Can't open beepon.f +ile $!"; open my $BEEPOFF, '>>', "./beepoff.file" || die "Can't open beepof +f.file $!"; open my $DELLIST, '<', $ARGV[0] || die "Can't open $ARGV[0] $!"; foreach (<$DELLIST>){ chomp($_); $delList{$_} = 1; } close $DELLIST; open my $ORGLIST, '<' , $ARGV[1] || die "Cant open $ARGV[1] $!"; foreach my $entry (<$ORGLIST>){ chomp($entry); print { exists $delList{$entry} ? $BEEPON : $BEEPOFF } $entry +. "\n"; } close $BEEPON; close $BEEPOFF; }

        here are the resulting errors:

        $./delete_list_ver3.pl test.list test_master.1 readline() on closed filehandle $DELLIST at ./delete_list_ver3.pl line + 12. Use of uninitialized value $ARGV[1] in concatenation (.) or string at +./delete_list_ver3.pl line 17. Cant open Bad file descriptor at ./delete_list_ver3.pl line 17.

        WTF???

        as far as the BeepOn Vs BeepOff are concerned... Not very descriptive taken out of context are they.

        this is part of a bigger project that generates a config file for an asterisk conference bridge. I often find it necessary to move ID's from one list to the other, and this is the result.

        honestly the boss is happy from the first iteration of this code; I'm not because I am interested in learning Perl, and learning how to use Perl effectively and efficiently, and this seems like a good learning project, (at least I've learned alot I think) and progressing my perl abilities will help me not only professionally but academically as well.

        For the record I really appreciate your help all along the way.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://844376]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (6)
As of 2014-08-20 11:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (111 votes), past polls