PerlMonks  

Need to get the intersect of hashes

by jbush82 (Novice)
on May 15, 2008 at 07:11 UTC (id://686670)

jbush82 has asked for the wisdom of the Perl Monks concerning the following question:

I've finally had some time to sit down and work on the MD5 scanner script I posted last week (http://www.perlmonks.org/?node_id=685695), and I'm going through it using the suggestions I was given so that I write it correctly.

I now have two hashes, one containing a list of known bad filenames and their associated MD5 values, and the other containing a list of all files on a given system and their paths (note that both hashes can have multiple values per key).

What I need to do now is get the intersection of the keys of the two hashes, so that I have a list of files on the system whose names match files that are known to be bad. My confusion is how to get the intersection correctly so that I can easily use the values of each key from both hashes (because I'll need the paths and the known-bad MD5 values to do the actual MD5 check).

Any suggestions or direction is appreciated.
#!C:\perl\bin\perl.exe -w
use strict;

# Each element of @known_bad is one line of knownbad.txt ("filename,MD5")
open(my $bad_fh, '<', 'knownbad.txt') or die "Unable to open knownbad.txt: $!";
my @known_bad = <$bad_fh>;
close($bad_fh);

# Parse the knownbad.txt lines into a hash of arrays: filename => [ MD5, ... ]
my %bad_file_md5;
foreach my $bad_data (@known_bad) {
    chomp($bad_data);
    my ($bad_file, $bad_md5) = split(/,/, $bad_data);
    push(@{ $bad_file_md5{$bad_file} }, $bad_md5);
}

# Parse a recursive directory listing into a hash of arrays:
# filename => [ directory, ... ]
my %system_file_data;
#open(my $files_fh, "psexec.exe -n 2 \\\\192.168.1.10 cmd.exe /C dir C:\\ /S /B |") or die;
open(my $files_fh, "cmd.exe /C dir C:\\ /S /B |") or die "Unable to run dir: $!";
while (<$files_fh>) {
    chomp;    # without this, the bare-filename fallback below keeps the newline
    my ($system_file_location, $system_file) =
        m/(.*)[\\\/](.+)/ ? ($1, $2) : (undef, $_);
    # print "$system_file is in the directory $system_file_location\n";
    push(@{ $system_file_data{$system_file} }, $system_file_location);
}
close($files_fh);

$VAR1 = {
    'arbies.dll'    => [ '388B8FBC36A8558587AFC90FB23A3B99' ],
    'psexec.exe'    => [ '78A2C9D79C21DDFCB7CED32F5EBEC618',
                         '388B8FBC36A8558587AFC90FB23A3B99' ],
    'notepad.exe'   => [ '388B8FBC36A8558587AFC90FB23A3B99' ],
    'angelfood.txt' => [ '388B8FBC36A8558587AFC90FB23A3B99' ]
};
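A minimal sketch of the intersection asked for above that keeps the values from both hashes side by side, so the paths and the known-bad MD5s for a filename are available together. It assumes hashes-of-arrays shaped like the ones the script builds (%bad_file_md5 and %system_file_data); the sample data, the %matches hash, and its md5s/dirs fields are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape the script builds:
# filename => [ known-bad MD5s ] and filename => [ directories ].
my %bad_file_md5 = (
    'psexec.exe'  => [ '78A2C9D79C21DDFCB7CED32F5EBEC618',
                       '388B8FBC36A8558587AFC90FB23A3B99' ],
    'notepad.exe' => [ '388B8FBC36A8558587AFC90FB23A3B99' ],
    'arbies.dll'  => [ '388B8FBC36A8558587AFC90FB23A3B99' ],
);
my %system_file_data = (
    'psexec.exe'  => [ 'C:\\tools', 'C:\\temp' ],
    'notepad.exe' => [ 'C:\\WINDOWS' ],
    'boot.ini'    => [ 'C:' ],
);

# Intersect the keys; for each common filename keep references to the
# value arrays from BOTH hashes so they can be used together later.
my %matches;
for my $name (grep { exists $system_file_data{$_} } keys %bad_file_md5) {
    $matches{$name} = {
        md5s => $bad_file_md5{$name},
        dirs => $system_file_data{$name},
    };
}

for my $name (sort keys %matches) {
    for my $dir (@{ $matches{$name}{dirs} }) {
        print "check $dir\\$name against: @{ $matches{$name}{md5s} }\n";
    }
}
```

Because the values are array references, no data is copied; %matches simply points at the existing arrays.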

Re: Need to get the intersect of hashes
by grizzley (Chaplain) on May 15, 2008 at 07:39 UTC
    Isn't it just a matter of taking every key from one hash and checking whether it exists in the other hash?
    @keys_existing_in_both = grep defined $hash2{$_}, keys %hash1;
      Yes, that does give me the intersection of the keys. What I need to do is take each key in the intersection array (@keys_existing_in_both in your example) and act on each value associated with that key. That's where I'm confused.

      For example, let's say the array in your example has the element psexec.exe. What I need to do is search the second hash (the one containing the system files) for psexec.exe and then run a system command on each value associated with the psexec.exe key. Then I take the result of that (the MD5 of the file) and compare it to the values stored under the psexec.exe key in the first hash (the known-bad data).
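A minimal sketch of the per-key check described above, under stated assumptions: the hashes have the shapes built by the original script, each directory and filename can be joined with File::Spec->catfile, and the known-bad MD5s are uppercase hex as in the Data::Dumper output. Digest::MD5 stands in for the external system command; file_md5_hex and find_bad_files are hypothetical helpers, and the demo writes a temporary file containing "hello", whose MD5 is 5D41402ABC4B2A76B9719D911017C592.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: MD5 of a file's contents as uppercase hex,
# or undef if the file cannot be opened.
sub file_md5_hex {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    binmode($fh);
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return uc $hex;
}

# Hypothetical helper: for every filename present in both hashes, hash each
# on-disk copy and return [ path, md5 ] pairs that match a known-bad MD5.
sub find_bad_files {
    my ($bad_file_md5, $system_file_data) = @_;
    my @hits;
    for my $name (grep { exists $system_file_data->{$_} } keys %$bad_file_md5) {
        my %is_bad;
        $is_bad{ uc $_ } = 1 for @{ $bad_file_md5->{$name} };
        for my $dir (@{ $system_file_data->{$name} }) {
            my $path = defined $dir ? File::Spec->catfile($dir, $name) : $name;
            my $md5  = file_md5_hex($path);
            push @hits, [ $path, $md5 ] if defined $md5 && $is_bad{$md5};
        }
    }
    return @hits;
}

# Demo: one temporary "psexec.exe" whose contents hash to a listed MD5.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'psexec.exe');
open(my $out, '>', $file) or die "Cannot write $file: $!";
binmode($out);
print $out 'hello';
close($out);

my %bad_file_md5 = (
    'psexec.exe'  => [ '78A2C9D79C21DDFCB7CED32F5EBEC618',
                       '5D41402ABC4B2A76B9719D911017C592' ],  # MD5 of "hello"
    'notepad.exe' => [ '388B8FBC36A8558587AFC90FB23A3B99' ],
);
my %system_file_data = ( 'psexec.exe' => [ $dir ] );

my @hits = find_bad_files(\%bad_file_md5, \%system_file_data);
print "BAD: $_->[0] ($_->[1])\n" for @hits;
```

Turning each key's MD5 list into a small %is_bad lookup hash avoids an inner loop over the known-bad values for every file.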
        for my $k (@keys_existing_in_both) {
            my $exec = $hash2{$k};
            # do something with $k
            my $result = md5($k);
            if ($result ne $hash2{$k}) {
                print "Hash sum mismatch for '$k'!\n";
            }
        }
        (BTW, in the general case it is exists $hash{$key} that checks whether a key exists in a hash, not defined $hash{$key}.)
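A small illustration of the exists/defined difference: the two only disagree when a key is present but its value is undef. In the hashes-of-arrays built by push in the original script every value is an array reference, so either test happens to work there. The %h hash is a hypothetical example.

```perl
use strict;
use warnings;

my %h = (present => 1, undef_value => undef);

# exists asks "is this key in the hash?"; defined asks "is the value defined?"
for my $key (qw(present undef_value missing)) {
    printf "%-12s exists=%d defined=%d\n",
        $key, (exists $h{$key} ? 1 : 0), (defined $h{$key} ? 1 : 0);
}
```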
        You don't need to search the hash. If you have a key, you just retrieve the value associated with it. Can you print both of your structures with Data::Dumper and append them to your question?
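One way to print both structures, sketched with hypothetical stand-in data: Data::Dumper->Dump takes a list of references plus a list of names, and a name prefixed with * makes a hash reference print as %name = ( ... ) instead of $VAR1. Sortkeys and Indent only make the output stable and compact.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable, alphabetical key order
$Data::Dumper::Indent   = 1;   # compact, fixed indentation

# Hypothetical stand-ins for the two hashes built in the original script.
my %bad_file_md5     = ('psexec.exe' => ['78A2C9D79C21DDFCB7CED32F5EBEC618']);
my %system_file_data = ('psexec.exe' => ['C:\\tools']);

# Name the dumped structures instead of getting $VAR1/$VAR2.
my $dump = Data::Dumper->Dump([\%bad_file_md5, \%system_file_data],
                              [qw(*bad_file_md5 *system_file_data)]);
print $dump;
```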
Re: Need to get the intersect of hashes
by pc88mxer (Vicar) on May 15, 2008 at 15:48 UTC
    It seems to me that a better way to go about this would be to use the MD5 signatures as the keys of your hash instead of the file names. The issue is that if a file is bad (by which I presume you mean it contains a virus), you would want to know about it regardless of what it was named.

    Your search would then go like this:

    my %bad_file;
    for each bad file:
        $bad_file{ md5 of bad file } = 1;
    for each system file:
        my $md5 = md5 of system file
        if ($bad_file{$md5}) { report this system file }
    There is the possibility of getting some false positives, but that's better than not reporting hits simply because the file names don't agree.
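The outline above as runnable Perl, a minimal sketch: the known-bad lines and the %system_files path-to-contents map are hypothetical in-memory stand-ins (for real files, Digest::MD5->new->addfile($fh)->hexdigest on a binmode handle gives the same digest), and uppercasing assumes knownbad.txt stores uppercase hex.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical knownbad.txt lines ("filename,MD5").
my @known_bad_lines = (
    'psexec.exe,5D41402ABC4B2A76B9719D911017C592',   # MD5 of "hello"
    'notepad.exe,388B8FBC36A8558587AFC90FB23A3B99',
);

# Lookup table keyed by MD5 instead of filename; the value records the
# name the MD5 was listed under, for reporting.
my %bad_file;
for my $line (@known_bad_lines) {
    my ($name, $md5) = split /,/, $line;
    $bad_file{ uc $md5 } = $name;
}

# Hypothetical system files: path => contents (stands in for reading each file).
my %system_files = (
    'C:\\temp\\renamed.exe' => 'hello',
    'C:\\temp\\readme.txt'  => 'harmless',
);

my @reports;
for my $path (sort keys %system_files) {
    my $md5 = uc md5_hex($system_files{$path});
    push @reports, "$path matches known-bad $bad_file{$md5}" if exists $bad_file{$md5};
}
print "$_\n" for @reports;
```

The renamed copy is still caught because only its contents are compared. Storing the listed name instead of 1 makes the report more useful, though if several names share one MD5 only the last one read is kept.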

Approved by moritz