
Re: Is it possible to find the matching words and the percentage of matching words between two texts?

by McDarren (Abbot)
on Dec 21, 2012 at 08:48 UTC


in reply to Is it possible to find the matching words and the percentage of matching words between two texts?

A simple approach would be to build two hashes from the strings, and then compare the hashes.

So you might do something like:
    my %foo;
    my $string = 'Poet Blake had a milky white cat. He used to call it Pussy.';
    for my $word (split /\s+/, $string) {
        $foo{$word}++;
    }
You do the same for the second string (building, say, %bar), and then to compare you simply iterate through one of the hashes and increment a counter for each word that is also present in the other hash. Something like so:
    my $cnt = 0;
    for my $word (keys %foo) {
        $cnt++ if exists $bar{$word};
    }

To find the number of distinct words in either string (each word is a hash key, so repeats are counted once), you simply count the number of keys in the hash, e.g.

my $word_count = scalar keys %foo;
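
If repeated words should count towards the total, the hash values can be summed instead of counting the keys. A minimal sketch of the difference, using a made-up sample string (not the one from the question):

```perl
use strict;
use warnings;

my $string = 'the cat saw the other cat';   # hypothetical sample text
my %foo;
$foo{$_}++ for split /\s+/, $string;

# Each distinct word is one hash key, so this counts repeats once.
my $distinct = scalar keys %foo;

# The hash values hold the per-word counts, so their sum includes repeats.
my $total = 0;
$total += $_ for values %foo;

print "distinct=$distinct total=$total\n";
```

Which of the two figures is the right denominator depends on what "percentage of matching words" is supposed to mean for your data.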

After that, it's just a simple calculation.
The obvious question is how your calculation should look if the two strings contain different numbers of words. But I'm sure you can decide that.
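
One possible way to handle strings of different lengths is to report the overlap against each string separately, and also against the combined set of distinct words. The sketch below is only an illustration under those assumptions: the helper names word_counts and percent and both sample strings are made up, and words are still split on whitespace only, so punctuation stays attached.

```perl
use strict;
use warnings;

# Hypothetical helper: build a word => count hash from a string.
sub word_counts {
    my ($text) = @_;
    my %seen;
    $seen{$_}++ for grep { length } split /\s+/, $text;
    return %seen;
}

# Hypothetical helper: percentage, guarding against division by zero.
sub percent {
    my ($part, $whole) = @_;
    return $whole ? 100 * $part / $whole : 0;
}

my %foo = word_counts('the quick brown fox');
my %bar = word_counts('the lazy brown dog sleeps');

# grep in scalar context returns the number of matching elements.
my $shared = grep { exists $bar{$_} } keys %foo;
my $n_foo  = keys %foo;                      # distinct words in first string
my $n_bar  = keys %bar;                      # distinct words in second string
my $union  = $n_foo + $n_bar - $shared;      # distinct words overall

printf "shared words: %d\n", $shared;
printf "of first string:       %.1f%%\n", percent($shared, $n_foo);
printf "of second string:      %.1f%%\n", percent($shared, $n_bar);
printf "of all distinct words: %.1f%%\n", percent($shared, $union);
```

The three percentages answer slightly different questions, so pick whichever matches what you actually want to measure.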

hope this helps,
Darren


Re^2: Is it possible to find the matching words and the percentage of matching words between two texts?
by supriyoch_2008 (Scribe) on Dec 21, 2012 at 10:34 UTC

    Hi McDarren,

    Thanks for your prompt reply. I shall try to solve my problem using the code you provided.

    Regards
