One way to strip out the "-" characters is like this:
#!/usr/bin/perl
use strict;
use warnings;
foreach my $ssn (qw(123-45-6789
987654321))
{
    my $digits = $ssn;
    $digits =~ s/-//g;    # remove every "-" from the copy
    print "$ssn \t$digits\n";
}
__END__
prints:
123-45-6789 123456789
987654321 987654321
I am not sure of the best way to handle the "sometimes field 2, sometimes field 4" problem without seeing a few example lines from these databases. Don't post any real SSNs!
As mentioned before, your HUGE performance gain will come from reading each of the 2 files only once. Process file 2 first to build an in-memory structure, then process file 1 line by line against that structure.
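Here is a rough sketch of that two-pass idea. Since I haven't seen your real data, everything about the layout is an assumption: I pretend both files are whitespace-separated with the SSN in the first field, and I use in-memory strings ($file2_data, $file1_data) in place of the real files. The sub names (normalize_ssn, build_lookup, find_matches) are made up for illustration. Adjust the split and the field index to match your actual records.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two real files.
# No real SSNs here!
my $file2_data = "123-45-6789 Smith\n555-00-1111 Jones\n";
my $file1_data = "123456789 recordA\n222-33-4444 recordB\n555001111 recordC\n";

# Strip the "-" characters so both SSN formats compare equal.
sub normalize_ssn {
    my ($ssn) = @_;
    (my $digits = $ssn) =~ s/-//g;
    return $digits;
}

# Pass over file 2 exactly once: build a hash keyed by normalized SSN.
sub build_lookup {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ssn, $rest) = split ' ', $line, 2;   # assumed: SSN is field 1
        next unless defined $ssn;                 # skip blank lines
        $seen{ normalize_ssn($ssn) } = $rest;
    }
    return \%seen;
}

# Pass over file 1 exactly once: keep lines whose SSN is in the hash.
sub find_matches {
    my ($fh, $lookup) = @_;
    my @matches;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ssn) = split ' ', $line;             # assumed: SSN is field 1
        next unless defined $ssn;
        push @matches, $line if exists $lookup->{ normalize_ssn($ssn) };
    }
    return @matches;
}

# Open the strings as filehandles; with real files, pass the file names instead.
open my $fh2, '<', \$file2_data or die "Cannot open file 2 data: $!";
my $lookup = build_lookup($fh2);
close $fh2;

open my $fh1, '<', \$file1_data or die "Cannot open file 1 data: $!";
my @matches = find_matches($fh1, $lookup);
close $fh1;

print "$_\n" for @matches;
```

With the sample data above, this prints the recordA and recordC lines, because their SSNs appear in file 2 once the dashes are removed. The hash lookup is what avoids re-reading file 2 for every line of file 1.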