http://www.perlmonks.org?node_id=990822

jemswira has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks.

So now I have 1 array, and 1 arraylist.

@tocheck  : Apple Corn Pie Fish

%checkfrom:
    Meat  => Fish Apple Pork Bacon
    Fruit => Apple Pie Orange Beef

Well that's the gist of it at least. I'm trying to remove the values that are in the array @tocheck from the hash %checkfrom. So the end result is something like this:

    Meat  => Pork Bacon
    Fruit => Orange Beef

There are several thousand keys in %checkfrom, though, and probably about the same number of elements in @tocheck. Is there anything faster than checking like this? Also, how can I remove a single value from a key's list of values? I know delete $checkfrom{$_} deletes the entire key/value pair.

    foreach $checking (@tocheck) {
        for (keys %checkfrom) {
            if ($checkfrom{$_} eq $checking) {
                #does
            }
        }
    }

Thanks Monks!

Update:

Well, since this example isn't working, I shall post my full code and files here.

Activpospf.txt

    PF01486
    PF00319
    PF04947

PFACTest.txt

    PF01486 : C12345 C23456
    PF00319 : C15234
    PF12345 : C00001 C12345
    PF98765 : C00000

These are just small files so I can test them out. The actual files are much bigger.

    #!/usr/bin/perl
    use Modern::Perl;
    use File::Slurp qw/read_file write_file/;

    my $pfaminput = 'Activpospf.txt';
    my $seedinput = 'PFACtest.txt';
    #open POSITIVEOUT, ">", 'ActivPosdata.txt';
    #open NEGATIVEOUT, ">", 'ActivNeg.txt';

    my %seedin = map { chomp; /(.+)\s+\:\s+(.+)/; $1 => $2; } read_file $seedinput;

    my $pfam;
    my @tocheck;
    my @negative;
    for $pfam (read_file $pfaminput) {
        chomp($pfam);
        if (defined $seedin{$pfam}) {
            my @splitter = split(/ /, $seedin{$pfam});
            push(@tocheck, @splitter);
            delete $seedin{$pfam};
        }
    }

What I want is to remove every line from PFACtest whose PF value appears in Activpospf.txt, then take the numbers on those removed lines and remove them from the rest of the file as well. That is, the result should be:

    PF12345 : C00001
    PF98765 : C00000

Thanks!
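
The two steps described in the update can be sketched as follows. This is only a minimal illustration, not code from the thread: it assumes the layout of the two sample files shown above, and it uses in-memory arrays (the hypothetical @pfam_lines and @seed_lines) in place of the files so that it runs on its own.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files (assumption: same layout as the samples).
my @pfam_lines = ("PF01486\n", "PF00319\n", "PF04947\n");
my @seed_lines = (
    "PF01486 : C12345 C23456\n",
    "PF00319 : C15234\n",
    "PF12345 : C00001 C12345\n",
    "PF98765 : C00000\n",
);

# Parse "PFxxxxx : Cxxxxx Cxxxxx ..." into a hash of array refs.
my %seedin;
for my $line (@seed_lines) {
    chomp $line;
    my ($pf, $rest) = $line =~ /^(\S+)\s*:\s*(.*)$/ or next;
    $seedin{$pf} = [ split ' ', $rest ];
}

# Step 1: drop the listed PF entries and remember their values.
my %remove;
for my $pf (@pfam_lines) {
    chomp $pf;
    my $values = delete $seedin{$pf} or next;
    $remove{$_} = 1 for @$values;
}

# Step 2: strip the remembered values from every remaining entry.
@$_ = grep { !$remove{$_} } @$_ for values %seedin;

print "$_ : @{ $seedin{$_} }\n" for sort keys %seedin;
# PF12345 : C00001
# PF98765 : C00000
```

Storing each line as an array ref rather than a string is a design choice here; it lets the second step reuse the lookup-hash-plus-grep idea without any further parsing.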

Replies are listed 'Best First'.
Re: Removal of values in array from array list
by ikegami (Patriarch) on Aug 30, 2012 at 19:12 UTC
    my %tocheck = map { $_ => 1 } @tocheck;
    for (values(%checkfrom)) {
        @$_ = grep !$tocheck{$_}, @$_;
    }

      When I do that I get the error message: Can't use string ("...") as an ARRAY ref while "strict refs" in use.

      Anyway, I updated the post to give more insight into the question.

        You were very vague about your input format. I assumed you meant:

        my %checkfrom = (
            Meat  => [qw( Fish Apple Pork Bacon )],
            Fruit => [qw( Apple Pie Orange Beef )],
        );
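
The error reported above suggests that the hash values are plain strings rather than array references. A minimal sketch of one way to bridge the two, assuming each value is a space-separated string (that input format is an assumption, not something the thread confirms for the first example):

```perl
use strict;
use warnings;

my @tocheck   = qw(Apple Corn Pie Fish);
# Assumption: values are space-separated strings, not array refs.
my %checkfrom = (
    Meat  => 'Fish Apple Pork Bacon',
    Fruit => 'Apple Pie Orange Beef',
);

# Turn each string into an array ref so that @$_ can dereference it.
$_ = [ split ' ', $_ ] for values %checkfrom;

# Lookup hash plus grep, as in the reply above.
my %tocheck = map { $_ => 1 } @tocheck;
@$_ = grep { !$tocheck{$_} } @$_ for values %checkfrom;

print "$_ => @{ $checkfrom{$_} }\n" for sort keys %checkfrom;
# Fruit => Orange Beef
# Meat => Pork Bacon
```

The conversion works in place because values() returns aliases to the hash values, so assigning to $_ replaces each string with its array ref.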
Re: Removal of values in array from array list
by Limbic~Region (Chancellor) on Aug 30, 2012 at 19:17 UTC
    jemswira,
    Is there anything faster than checking like this?

    Yes. Or rather, maybe. It is hard to tell from your pseudo notation if you have a hash of arrays, a hash of hashes or just a hash with weird key/value pairs. Assuming you have a hash of hashes then yes.

    my @tocheck = qw/Apple Corn Pie Fish/;
    my %checkfrom = (
        Meat => {
            Fish  => 1,
            Apple => 1,
            Pork  => 1,
            Bacon => 1,
        },
        Fruit => {
            Apple  => 1,
            Pie    => 1,
            Orange => 1,
            Beef   => 1,
        },
    );
    for my $thing (@tocheck) {
        for my $category (keys %checkfrom) {
            delete $checkfrom{$category}{$thing};
        }
    }
    If your pseudo notation implied something else, use real code and repost.
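
For the hash-of-hashes layout, the nested loops could also be written as a hash-slice delete, which removes all of the listed keys from one inner hash in a single statement. This is a sketch of an alternative spelling under the same hash-of-hashes assumption, not the code from the reply:

```perl
use strict;
use warnings;

my @tocheck   = qw(Apple Corn Pie Fish);
my %checkfrom = (
    Meat  => { Fish  => 1, Apple => 1, Pork   => 1, Bacon => 1 },
    Fruit => { Apple => 1, Pie   => 1, Orange => 1, Beef  => 1 },
);

# Delete every key named in @tocheck from each inner hash at once.
# Keys that are not present (such as Corn) are silently ignored.
delete @{$_}{@tocheck} for values %checkfrom;

for my $category (sort keys %checkfrom) {
    print "$category => @{[ sort keys %{ $checkfrom{$category} } ]}\n";
}
# Fruit => Beef Orange
# Meat => Bacon Pork
```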

    Cheers - L~R

      Well, it is a hash of arrays. It's just a small part of my program. The hash and array are both examples. It's actually protein numbers and stuff, but the example still holds.

      Reposted

Re: Removal of values in array from array list
by CountZero (Bishop) on Aug 30, 2012 at 19:26 UTC
    Assuming your values are one string of space-delimited words, rather than an array(ref).

    use Modern::Perl;
    use Data::Dump qw/dump/;
    use Regexp::Assemble;

    my @tocheck = qw/Apple Corn Pie Fish/;
    my %checkfrom = (
        Meat  => 'Fish Apple Pork Bacon',
        Fruit => 'Apple Pie Orange Beef'
    );

    my $ra = Regexp::Assemble->new;
    $ra->add( @tocheck );

    for (values %checkfrom) {
        s/$ra//g;
        $_ = join ' ', split;    # delete spurious spaces
    }

    say dump(\%checkfrom);
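
One thing to watch with a substitution like this: a pattern that is not anchored at word boundaries can also match inside longer words. The following is a minimal sketch using only core Perl (a plain alternation built with quotemeta instead of Regexp::Assemble). The extra word Fishcake is made up for illustration and is not in the original data.

```perl
use strict;
use warnings;

my @tocheck   = qw(Apple Corn Pie Fish);
my %checkfrom = (
    Meat  => 'Fish Apple Pork Bacon Fishcake',   # Fishcake added for illustration
    Fruit => 'Apple Pie Orange Beef',
);

# One alternation of the literal words, anchored at word boundaries.
my $alternation = join '|', map { quotemeta } @tocheck;
my $re = qr/\b(?:$alternation)\b/;

for (values %checkfrom) {
    s/$re//g;
    $_ = join ' ', split;    # collapse leftover spaces
}

print "$_ => $checkfrom{$_}\n" for sort keys %checkfrom;
# Fruit => Orange Beef
# Meat => Pork Bacon Fishcake
```

Without the \b anchors, the Fish alternative would also match the start of Fishcake and leave "cake" behind.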

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Removal of values in array from array list
by philiprbrenan (Monk) on Aug 30, 2012 at 20:15 UTC

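    There is a significant time difference, roughly a factor of ten in the runs below, between method 1 (a lookup hash plus grep over a hash of arrays) and method 2 (nested delete loops over a hash of hashes). This is presumably because the inner loop is written explicitly in Perl in method 2, whereas in method 1 the inner loop is implicit in grep and is actually implemented in C.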

    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Time::HiRes qw(time);
    use Data::Dump qw(dump pp);

    my %checkFrom;
    my @toCheck;

    sub init0() {
        @toCheck = ();
        push @toCheck, rand(1e6) for 1..1e4;
    }

    sub init1() {
        %checkFrom = ();
        for my $x (0..999) {
            for my $y (0..999) {
                $checkFrom{$x}[$y] = 1e3*$x+$y;
            }
        }
        init0();
    }

    sub init2() {
        %checkFrom = ();
        for my $x (0..999) {
            for my $y (0..999) {
                $checkFrom{$x}{$y} = 1e3*$x+$y;
            }
        }
        init0();
    }

    for (1..10) {
        if (1) {
            init1();
            my $s = time();
            {
                my %toCheck = map { $_ => 1 } @toCheck;
                @$_ = grep !$toCheck{$_}, @$_ for values %checkFrom;
            }
            say "1 took ", (time() - $s);
        }
        if (2) {
            init2();
            my $s = time();
            {
                for my $thing (@toCheck) {
                    for my $category (keys %checkFrom) {
                        delete $checkFrom{$category}{$thing};
                    }
                }
            }
            say "2 took ", (time() - $s);
        }
    }

    Produces

    1 took 1.06206107139587
    2 took 11.8416769504547
    1 took 1.10106301307678
    2 took 12.3792278766632
    1 took 1.07640194892883
    2 took 12.4490790367126
    1 took 1.2350709438324
    2 took 12.7647299766541
    1 took 1.27407312393188
    2 took 12.8677358627319
    1 took 1.17906808853149
    2 took 12.6504328250885
    1 took 1.17000198364258
    2 took 12.8158860206604
    1 took 1.2890739440918
    2 took 12.8577361106873
    1 took 1.32807612419128
    2 took 12.7377278804779
    1 took 1.24107122421265
    2 took 12.7136778831482
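
A variation on the timing above is sketched below, with two deliberate differences that are assumptions of this sketch rather than part of the original run. The values to check are integers, so some of them really occur in the data, and both structures hold the same values, so the results can be compared. The sizes ($KEYS, $PER_KEY, $CHECKS) are hypothetical and much smaller than in the original so that it finishes quickly. No timings are quoted because they depend on the machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical sizes, smaller than the original run.
my ($KEYS, $PER_KEY, $CHECKS) = (200, 200, 2_000);

# Integers, so a fraction of them really occur in the data.
my @to_check = map { int rand($KEYS * $PER_KEY) } 1 .. $CHECKS;

# The same values stored two ways: hash of arrays and hash of hashes.
my (%hoa, %hoh);
for my $x (0 .. $KEYS - 1) {
    for my $y (0 .. $PER_KEY - 1) {
        my $value = $PER_KEY * $x + $y;
        push @{ $hoa{$x} }, $value;
        $hoh{$x}{$value} = 1;
    }
}

# Method 1: lookup hash plus grep over each array.
my $start = time();
my %seen = map { $_ => 1 } @to_check;
@$_ = grep { !$seen{$_} } @$_ for values %hoa;
my $grep_time = time() - $start;

# Method 2: nested loops, one delete per (value, key) combination.
$start = time();
for my $thing (@to_check) {
    delete $hoh{$_}{$thing} for keys %hoh;
}
my $delete_time = time() - $start;

# Both structures should now hold the same number of surviving values.
my $left_hoa = 0;
$left_hoa += @$_ for values %hoa;
my $left_hoh = 0;
$left_hoh += keys %$_ for values %hoh;

printf "grep:   %.4f s, %d values left\n", $grep_time,   $left_hoa;
printf "delete: %.4f s, %d values left\n", $delete_time, $left_hoh;
```

Note that the two methods do different amounts of work by construction: method 1 tests each stored value once ($KEYS times $PER_KEY lookups), while method 2 attempts one delete for every combination of checked value and key ($CHECKS times $KEYS deletes).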