PerlMonks  

How do I change the code to get percent match and matched words between two texts?

by supriyoch_2008 (Scribe)
on Sep 28, 2012 at 08:54 UTC ( #996144=perlquestion )
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I am sorry for seeking the wisdom of the Perl Monks so frequently; I hope they won't mind. As part of my learning, I want to use a Perl script to find the percent match and the matched words between two sentences, i.e. "Poet Blake had a milky white cat." and "Poet Blake had a white cat." The two sentences differ in just one word, "milky". I have used the script match.pl and the module Text::Plagiarized, both given below. But something has gone wrong in the script, either in assigning the original and matching texts to variables at lines 9-10 or in some missing code at line 11. I am sorry for uploading such a long script and module. May I request the wisdom of the Perl Monks in my endeavour to find the desired results? Suggestions for related reading material are always welcome.

The perl script match.pl that I have used goes like:

#!/usr/bin/perl-l.-l../perllib
## PERL CODE TO DETECT PER CENT MATCHES BETWEEN TWO TEXTS(Original & matching):
## Test files:
## Original text: Poet Blake had a milky white cat.
## Matching text: Poet Blake had a white cat.
use warnings;
use strict;
## Input:
my $original_text="Poet Blake had a milky white cat.";
my $new="Poet Blake had a white cat."; # Line 10
## Line 11 ????????
### Code starts here:
my $text=Text::Plagiarized->new; # Line 13
$text->original($original_text);
foreach my $comparison (my @comparison_texts) {
    $text->comparison($comparison);
    $text->analyze;
    print $text->percent, $/; # percent of matching sentences
    if ($text->percent > my $some_threshold) { # Line 19
        [my $sentence,my $possible_match]
        print Dumper($text->matches);
    }
}
exit; # Line 24

The perl module Text::Plagiarized goes like:

package Text::Plagiarized;

$REVISION = '$Id: Plagiarized.pm,v 1.0 2003/07/13 19:15:57 ovid Exp $';
$VERSION = '0.01';

use 5.006;
use strict;
use warnings;
use String::Approx qw/amatch/;
use Text::Sentence qw/split_sentences/;

sub new {
    my $class = shift;
    my $self = bless {
        original   => {},
        comparison => {},
        matches    => [],
        threshold  => 80,
    } => $class;
}

sub original {
    my ($self, $text) = @_;
    local $_ = $text;
    $self->{original} = {
        text      => $text,
        sentences => [split_sentences($text)],
    };
    return $self;
}

sub comparison {
    my ($self, $text)=@_;
    local $_ = $text;
    $self->{comparison} = {
        text      => $text,
        sentences => [split_sentences($text)],
    };
    return $self;
}

my %percentage = map { $_ => 1 } 0 .. 100; # wow. This is a cheap hack

sub threshold {
    my $self = shift;
    if (@_) {
        my $num = shift;
        unless (exists $percentage{$num}) {
            require Carp;
            Carp::croak("threshold must be an integer between 0 and 100, inclusive");
        }
        $self->{threshold} = 100 - $num;
    }
    $self->{threshold};
}

sub analyze {
    my $self= shift;
    my @sentences;
    my $threshold= $self->threshold;
    foreach my $sentence1 (@{$self->{original}{sentences}}) {
        foreach my $sentence2 (@{$self->{comparison}{sentences}}) {
            my ($hash1, $hash2) = _hash($sentence1, $sentence2);
            if ($hash1 eq $hash2 || amatch($hash1, ["$threshold%"], $hash2)) {
                push @sentences => [$sentence1 => $sentence2];
                last;
            }
        }
    }
    $self->{matches}= \@sentences;
}

sub matches { shift->{matches} }

sub percent {
    my $self= shift;
    my $precision = shift || 0;
    my $matches= @{$self->matches};
    my $sentences= @{$self->{original}{sentences}};
    sprintf "%.${precision}f" => ($matches/$sentences)*100;
}

# starts to break down if we have more than 26 different words
# use Unicode characters?
# stop words?
# memoize this
sub _hash {
    my @string = map lc $_ =>@_;
    s/[^[:alnum:][:space:]]//g foreach @string;
    s/[[:space:]]+/ /g foreach @string;
    my %words;
    my $letter = 'a';
    s/(\S+)/
        unless (exists $words{$1}) {
            $words{$1}=$letter;
            $letter++;
        }
        $words{$1}/eg foreach @string;
    s/ //g foreach @string;
    return @string;
}

1;

The results of the command prompt window show errors as follows:

C:\Users\x>cd desktop
C:\Users\x\Desktop>match.pl
syntax error at C:\Users\x\Desktop\match.pl line 21, near "print"
Execution of C:\Users\x\Desktop\match.pl aborted due to compilation errors.
C:\Users\x\Desktop>

Re: How do I change the code to get percent match and matched words between two texts?
by Corion (Pope) on Sep 28, 2012 at 09:04 UTC

    Have you looked at line 21 (and 20) of your program?

    Please look at the two lines, and explain to us what the two lines should do. Also consider that by convention, most lines in Perl programs contain a single statement and should end with the statement separator, ";".

      Corion,

      Thanks for your comment. Line 20, i.e. my $sentence,my $possible_match, should find the matched words between the two sentences. I shall go through the texts suggested by Marto and try once again to fix it.

      Regards,

Re: How do I change the code to get percent match and matched words between two texts?
by marto (Chancellor) on Sep 28, 2012 at 09:07 UTC

    If you read your code you should see what the problem is. Perl gives you a hint where to look, and it's not as if it's a long script. The line prior to your print statement is where the problem lies: it isn't terminated properly, and it doesn't do what you think it does either. Still ignoring previous advice, I see.

      Marto,

      Thanks for your comment. I shall once again read the texts mentioned under previous advice and try to fix the problem.

      With regards,

Re: How do I change the code to get percent match and matched words between two texts?
by kcott (Abbot) on Sep 28, 2012 at 09:32 UTC

    G'day supriyoch_2008,

    You get this message from Perl:

    syntax error at C:\Users\DR-SUPRIYO\Desktop\match.pl line 21, near "print"

    Then you write:

    "But something has gone wrong in the script either in naming (assigning) the original and matching texts to variables at lines 9-10 or missing some code at line 11."

    I looked at the print on line 21. It's part of this statement spanning lines 20-21:

    [my $sentence,my $possible_match] print Dumper($text->matches);

    I haven't looked at any other parts of your code because Perl clearly states where the problem lies.

    -- Ken
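For readers following along: the bracketed text on line 20 is an anonymous array constructor, which is a complete expression on its own, so Perl expects an operator or a ";" after it and finds print instead. Judging from analyze in the module, matches returns a reference to an array of [original sentence, matching sentence] pairs, so the intent was presumably to unpack each pair into two variables. A minimal sketch of that, assuming this reading is right; the $matches data and the describe_matches helper are hypothetical stand-ins, not output or API of the module:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $text->matches: a reference to an array of
# [original sentence, matching sentence] pairs, as analyze() builds them.
my $matches = [
    [ 'Poet Blake had a milky white cat.', 'Poet Blake had a white cat.' ],
];

# describe_matches is an illustrative helper, not part of Text::Plagiarized.
# It unpacks each pair into two lexicals and formats them for printing.
sub describe_matches {
    my ($pairs) = @_;
    my @lines;
    for my $pair (@$pairs) {
        my ($sentence, $possible_match) = @$pair;
        push @lines, "original: $sentence\nmatch:    $possible_match\n";
    }
    return @lines;
}

print join '', describe_matches($matches);
```

Note also that the posted script never loads Text::Plagiarized or Data::Dumper, so it would still need "use Text::Plagiarized;" and "use Data::Dumper;" before the calls to new and Dumper can work.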

      kcott,

      Thanks for your comment. I shall look into the lines 20-21 once again. I am sorry that I didn't look at the lines 20-21 as indicated in cmd. I shall try once again to fix it.

      Regards,

Re: How do I change the code to get percent match and matched words between two texts?
by Mr. Muskrat (Abbot) on Sep 28, 2012 at 20:45 UTC

    Once you fix that compilation error, you'll want to fix line 15 as well.

         foreach my $comparison (my @comparison_texts) {

    You: "Hey, I want you to loop through the new (and empty) @comparison_texts array!"

    Perl: "Okay, that's a no-op. Enjoy your lack of results."
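A minimal sketch of that fix, plus one way to get at the original goal. Text::Plagiarized compares whole sentences, so it can report which sentences match but not which words. The helper below, word_match, is a hypothetical name and not part of the module; it uses only core Perl and compares the two example sentences word by word (case-insensitive, punctuation stripped). It defines percent match as the number of distinct words in the original that also occur in the comparison text, divided by the number of distinct words in the original. That definition is an assumption, and other definitions are just as reasonable.

```perl
use strict;
use warnings;

# word_match is a hypothetical helper, not part of Text::Plagiarized.
# Returns (percent, reference to the matched words in original order).
sub word_match {
    my ($original, $comparison) = @_;

    # Lowercase, then split on anything that is not a letter or digit.
    my @orig_words    = grep { length } split /[^[:alnum:]]+/, lc $original;
    my %in_comparison = map { $_ => 1 }
                        grep { length } split /[^[:alnum:]]+/, lc $comparison;

    my (%seen, @distinct, @matched);
    for my $word (@orig_words) {
        next if $seen{$word}++;              # count each distinct word once
        push @distinct, $word;
        push @matched, $word if $in_comparison{$word};
    }

    return (0, []) unless @distinct;         # avoid dividing by zero
    return (100 * @matched / @distinct, \@matched);
}

my $original_text    = "Poet Blake had a milky white cat.";
my @comparison_texts = ("Poet Blake had a white cat.");   # populated, not empty

foreach my $comparison (@comparison_texts) {
    my ($percent, $matched) = word_match($original_text, $comparison);
    printf "%.1f%% match; matched words: %s\n", $percent, join ' ', @$matched;
}
```

For the two example sentences, six of the seven distinct words in the original are shared, so this prints "85.7% match; matched words: poet blake had a white cat".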
