Re: Memory Leak when using Lingua::EN::MatchNames

by allolex (Curate)
on May 04, 2004 at 21:41 UTC

in reply to Memory Leak when using Lingua::EN::MatchNames

I've tried to improve your code, and I have tested what I am posting here.

You have a couple of problems. The first is that your extra foreach loops create a number of unnecessary arrays. I've tried to optimize that by keeping the names in hashes and iterating over their keys instead of building temporary lookup arrays, and by moving the file reading into a subroutine. When you're done with a file, it is a good idea to close it so you don't leave dangling filehandles.

The other problem has already been pointed out. If you look at the Lingua::EN::MatchNames documentation, you can see that name_eq() expects fn1, ln1, fn2, ln2. I fixed that, kind of (see my comment in the code). The name_eq() function returns undef if there is no possible match, so I added handling for that as well. That accounts for the "uninitialized value" warnings.

#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::MatchNames;

my $termfile = shift;
my $userfile = shift || die "Usage: $0 TERMFILE USERFILE\n";

my %curlookup  = getlist($userfile);
my %termlookup = getlist($termfile);

# Note: nothing below writes to this handle yet; matches go to STDOUT.
open my $dfh, ">", "dup.$userfile" or die "Cannot open 'dup.$userfile': $!\n";
foreach my $termusername (keys %termlookup) {
    NameComp( $termlookup{$termusername} );
}
close $dfh;

# getlist takes a filename as an argument and returns a hash of
# line number => line for every line that contains a letter
sub getlist {
    my $filename = shift;
    my $counter;
    my %results;
    open my $fh, "<", $filename or die "Cannot open '$filename': $!\n";
    while ( <$fh> ) {
        chomp;
        ++$counter;
        next unless m/[A-Za-z]/;
        $results{$counter} = $_;
        print "Adding user $counter $_ from file '$filename' to hash\n";
    }
    close $fh;
    return %results;
}

sub NameComp {    # no parens, this is not a function prototype
    my $compname = shift;
    foreach my $curusername (keys %curlookup) {
        print "Comparing '$compname' to '$curlookup{$curusername}'\n";
        # This method is not good because it assumes a 'FN -SPACE- LN' format
        my @compname = split /\s+/, $compname;
        my @curname  = split /\s+/, $curlookup{$curusername};
        my $name_score = name_eq( $compname[0], $compname[1], $curname[0], $curname[1] );
        # name_eq() returns undef when there is no possible match
        if ( defined $name_score ) {
            if ( $name_score >= 80 ) {
                print "Found Match $curlookup{$curusername} with a score of $name_score.\n\n";
            }
        }
        else {
            print "\t\tNo possible match.\n\n";
        }
    }
}
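As the comment in NameComp says, splitting on whitespace and taking the first two fields only works for a strict "FN LN" layout. One way to loosen that, purely as a sketch: split_name below is a hypothetical helper of my own, not something Lingua::EN::MatchNames provides, and it assumes the last token is the surname unless the line is written as "Last, First". I don't know what your files actually look like, so adjust the rules to your data.

```perl
use strict;
use warnings;

# split_name is a hypothetical helper, not part of Lingua::EN::MatchNames.
# It returns (first, last) for "First Last", "First Middle Last" and
# "Last, First" input, and an empty list for anything it cannot split.
sub split_name {
    my ($raw) = @_;
    return unless defined $raw;

    # Trim the ends and collapse runs of whitespace.
    ( my $name = $raw ) =~ s/^\s+|\s+$//g;
    $name =~ s/\s+/ /g;
    return unless length $name;

    # "Last, First" form: swap the two halves.
    if ( $name =~ /^([^,]+),\s*(.+)$/ ) {
        my ( $last, $first ) = ( $1, $2 );
        $last =~ s/\s+$//;
        return ( $first, $last );
    }

    my @parts = split / /, $name;
    return if @parts < 2;    # a single token cannot be split

    my $last = pop @parts;   # assumption: the final token is the surname
    return ( join( ' ', @parts ), $last );
}

for my $raw ( 'John Smith', 'Smith, John', 'Mary Ann Jones', 'Cher' ) {
    my ( $first, $last ) = split_name($raw);
    if ( defined $first ) {
        print "'$raw' -> first='$first' last='$last'\n";
    }
    else {
        print "'$raw' -> cannot split, skipping\n";
    }
}
```

In NameComp you could then skip any line that split_name cannot handle, and pass the two (first, last) pairs to name_eq() in the fn1, ln1, fn2, ln2 order instead of indexing into the split arrays.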

Best of luck to you.

Damon Allen Davison
