Re^3: Best way to search file

Feel free to reach out, but I doubt you will have any trouble with it once you’ve studied the previous example. (If you do, don’t waste your own time: ask.)

Also: when you load the data into your hash, don’t assume that your input file is free of errors. As you load the hash, I would recommend testing whether the key already exists() in the hash, and calling die() if it does; otherwise a duplicate key silently overwrites the earlier value. “Trust, but verify.”
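A minimal sketch of that check, assuming (for illustration only) a simple `key,value` line format. `load_hash` is a hypothetical name, and the demo reads from an in-memory filehandle so it runs without a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load "key,value" lines into a hash, refusing to silently overwrite
# a key that was already seen ("trust, but verify").
# The comma-separated layout is an assumption for illustration.
sub load_hash {
    my ($fh) = @_;
    my %hash;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /,/, $line;
        die "Duplicate key '$key' at line $.\n" if exists $hash{$key};
        $hash{$key} = $value;
    }
    return \%hash;
}

# Demo: an in-memory "file" with made-up values.
my $data = "111223333,A1\n444556666,B2\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $h = load_hash($fh);
print scalar(keys %$h), " keys loaded\n";
```

Dying on the first duplicate is deliberately strict; if bad lines are expected, you could warn() and skip instead.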

The data volumes that you indicate certainly seem to be appropriate for the use of a hash, and that’s the way I would pursue it.
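A minimal sketch of the hash approach, assuming (hypothetically) that file 2 holds `SSN,AOID` pairs and that each line of file 1 starts with an SSN. File 2 is read once into a hash; file 1 is then streamed once, with a single hash lookup per line. `build_lookup` and `match_records` are hypothetical names, and in-memory filehandles stand in for the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pass 1: read "file 2" once and build an in-memory lookup table.
# Hypothetical format: SSN,AOID per line.
sub build_lookup {
    my ($fh) = @_;
    my %aoid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ssn, $aoid) = split /,/, $line;
        $ssn =~ tr/-//d;    # normalize: drop hyphens
        $aoid_for{$ssn} = $aoid;
    }
    return \%aoid_for;
}

# Pass 2: stream "file 1" once; each line costs one hash lookup
# instead of a rescan of file 2. Hypothetical format: SSN is field 0.
sub match_records {
    my ($fh, $aoid_for) = @_;
    my @matches;
    while (my $line = <$fh>) {
        chomp $line;
        my ($ssn) = split /,/, $line;
        $ssn =~ tr/-//d;
        push @matches, "$ssn,$aoid_for->{$ssn}" if exists $aoid_for->{$ssn};
    }
    return @matches;
}

# Demo with made-up values.
my $file2 = "123-45-6789,A100\n987654321,A200\n";
my $file1 = "987-65-4321,Smith\n555443333,Jones\n";
open my $f2, '<', \$file2 or die "Cannot open file 2: $!";
open my $f1, '<', \$file1 or die "Cannot open file 1: $!";
my $lookup = build_lookup($f2);
print "$_\n" for match_records($f1, $lookup);
```

Only file 2 has to fit in memory; file 1 can be arbitrarily large because it is never held in full.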


Re^4: Best way to search file by insta.gator (Novice) on Apr 16, 2015 at 18:19 UTC
Thanks, Sundial. A couple of questions. I was able to get the hash created and working properly; now I need to take care of some details.

Depending on the type of file I am using, the SSN may or may not contain hyphens. How would you strip the hyphens while loading the hash? This is what I have now:

```perl
while (<$HRDATA>) {
    # A list slice needs parentheses around the call: (split ...)[4,2]
    my ($ssn, $aoid) = (split /","/)[4,2];
    $ssnhash{$ssn} = $aoid;
}
```

Basic, I am sure, but I am just learning.

Second, again depending on the file type, the SSN may be in field 2 or field 4 of file 2. One file type, the one with the SSN in field 2, has a header at the top. The only way I can see to tell programmatically which is which is to examine the first line of the file. Once I know that, I can tweak my code to load the SSN into the hash from the proper field. Does that make sense? Any thoughts on a better way? Thanks!!
Re^5: Best way to search file by Marshall (Canon) on Apr 16, 2015 at 21:02 UTC
One way to strip out the "-" characters is like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# foreach needs parentheses around qw() (a syntax error since Perl 5.18).
foreach my $ssn (qw(123-45-6789 987654321)) {
    my $digits = $ssn;
    $digits =~ s/-//g;    # delete every hyphen
    print "$ssn \t$digits\n";
}
__END__
prints:
123-45-6789 123456789
987654321 987654321
```

I am not sure of the best way to handle this "sometimes field 2, sometimes field 4" issue without seeing a few example lines from these files. Don't post any real SSNs!

As mentioned before, your HUGE performance gain will come from processing each of the two files only once. Process file 2 first to build an in-memory structure (the hash), then process file 1 line by line. Each file should be read only once.
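A minimal sketch that handles both questions while loading the hash, under loudly stated assumptions: every field is double-quoted with no embedded `","`; the header-less layout uses the posted slice `[4,2]` (SSN in field 4, AOID in field 2); and the AOID position for the file type with a header is unknown, so index 3 is a PLACEHOLDER. The layout test (does field 4 of the first line look like an SSN?) is a hypothetical heuristic, not something the thread confirms. `tr/-//d` deletes hyphens just like `s/-//g`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 0-based field indexes. "plain" matches the posted slice [4,2];
# the AOID index for "with_header" is a PLACEHOLDER (unknown).
my %LAYOUT = (
    plain       => { ssn => 4, aoid => 2 },
    with_header => { ssn => 2, aoid => 3 },
);

# SSN with or without hyphens: 123-45-6789 or 123456789.
my $SSN_RE = qr/^\d{3}-?\d{2}-?\d{4}$/;

# Split one line of "quoted","comma","separated" fields.
# Assumes every field is double-quoted and no field contains ",".
sub split_fields {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^"//;    # drop the opening quote of the first field
    $line =~ s/"$//;    # drop the closing quote of the last field
    return split /","/, $line, -1;
}

# Hypothetical heuristic: if field 4 of the first line looks like an SSN,
# the file has no header; otherwise treat the first line as a header.
sub detect_layout {
    my ($first_line) = @_;
    my @f = split_fields($first_line);
    my $is_data = defined $f[4] && $f[4] =~ $SSN_RE;
    return $is_data ? 'plain' : 'with_header';
}

sub load_ssn_hash {
    my ($fh) = @_;
    my (%ssnhash, $si, $ai);
    while (my $line = <$fh>) {
        unless (defined $si) {    # first line: pick the layout
            my $type = detect_layout($line);
            ($si, $ai) = @{ $LAYOUT{$type} }{qw(ssn aoid)};
            next if $type eq 'with_header';    # header line holds no data
        }
        my @f = split_fields($line);
        (my $ssn = $f[$si]) =~ tr/-//d;    # 123-45-6789 -> 123456789
        die "Duplicate SSN at line $.\n" if exists $ssnhash{$ssn};
        $ssnhash{$ssn} = $f[$ai];
    }
    return \%ssnhash;
}

# Demo with in-memory files, one per layout (made-up values).
my @samples = (
    qq{"x","y","A100","z","123-45-6789"\n"x","y","A200","z","987654321"\n},
    qq{"h0","h1","SSN","AOID"\n"a","b","111-22-3333","B1"\n},
);
for my $data (@samples) {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my $h = load_ssn_hash($fh);
    print "$_ => $h->{$_}\n" for sort keys %$h;
}
```

If fields can contain embedded commas or quotes, a real CSV parser such as the CPAN module Text::CSV is safer than splitting on `","`.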


	PerlMonks