PerlMonks  


Hi Perl Monks,

I apologize for asking the Perl Monks for help so often; I hope you won't mind. As part of my learning, I am trying to write a Perl script that finds the percent match and the matched words between two sentences, "Poet Blake had a milky white cat." and "Poet Blake had a white cat." The two sentences differ by just one word, "milky". I have used the script match.pl and the module Text::Plagiarized, both given below, but something has gone wrong: either in assigning the original and matching texts to variables at lines 9-10, or in some code that is missing at line 11. I am sorry for posting such a long script and module. Could the monks help me get the desired results? Suggestions for related reading material are also welcome.

The Perl script match.pl that I used is:

    #!/usr/bin/perl -I. -I../perllib
    ## PERL CODE TO DETECT PER CENT MATCHES BETWEEN TWO TEXTS(Original & matching):
    ## Test files:
    ## Original text: Poet Blake had a milky white cat.
    ## Matching text: Poet Blake had a white cat.
    use warnings;
    use strict;
    ## Input:
    my $original_text="Poet Blake had a milky white cat.";
    my $new="Poet Blake had a white cat."; # Line 10
    ## Line 11 ????????
    ### Code starts here:
    my $text=Text::Plagiarized->new; # Line 13
    $text->original($original_text);
    foreach my $comparison (my @comparison_texts) {
        $text->comparison($comparison);
        $text->analyze;
        print $text->percent, $/; # percent of matching sentences
        if ($text->percent > my $some_threshold) { # Line 19
            [my $sentence,my $possible_match]
            print Dumper($text->matches);
        }
    }
    exit; # Line 24

The Perl module Text::Plagiarized is:

    package Text::Plagiarized;

    $REVISION = '$Id: Plagiarized.pm,v 1.0 2003/07/13 19:15:57 ovid Exp $';
    $VERSION  = '0.01';

    use 5.006;
    use strict;
    use warnings;
    use String::Approx qw/amatch/;
    use Text::Sentence qw/split_sentences/;

    sub new {
        my $class = shift;
        my $self = bless {
            original   => {},
            comparison => {},
            matches    => [],
            threshold  => 80,
        } => $class;
    }

    sub original {
        my ($self, $text) = @_;
        local $_ = $text;
        $self->{original} = {
            text      => $text,
            sentences => [split_sentences($text)],
        };
        return $self;
    }

    sub comparison {
        my ($self, $text) = @_;
        local $_ = $text;
        $self->{comparison} = {
            text      => $text,
            sentences => [split_sentences($text)],
        };
        return $self;
    }

    my %percentage = map { $_ => 1 } 0 .. 100; # wow. This is a cheap hack

    sub threshold {
        my $self = shift;
        if (@_) {
            my $num = shift;
            unless (exists $percentage{$num}) {
                require Carp;
                Carp::croak("threshold must be an integer between 0 and 100, inclusive");
            }
            $self->{threshold} = 100 - $num;
        }
        $self->{threshold};
    }

    sub analyze {
        my $self = shift;
        my @sentences;
        my $threshold = $self->threshold;
        foreach my $sentence1 (@{$self->{original}{sentences}}) {
            foreach my $sentence2 (@{$self->{comparison}{sentences}}) {
                my ($hash1, $hash2) = _hash($sentence1, $sentence2);
                if ($hash1 eq $hash2 || amatch($hash1, ["$threshold%"], $hash2)) {
                    push @sentences => [$sentence1 => $sentence2];
                    last;
                }
            }
        }
        $self->{matches} = \@sentences;
    }

    sub matches { shift->{matches} }

    sub percent {
        my $self = shift;
        my $precision = shift || 0;
        my $matches   = @{$self->matches};
        my $sentences = @{$self->{original}{sentences}};
        sprintf "%.${precision}f" => ($matches/$sentences)*100;
    }

    # starts to break down if we have more than 26 different words
    # use Unicode characters?
    # stop words?
    # memoize this
    sub _hash {
        my @string = map lc $_ => @_;
        s/[^[:alnum:][:space:]]//g foreach @string;
        s/[[:space:]]+/ /g foreach @string;
        my %words;
        my $letter = 'a';
        s/(\S+)/
            unless (exists $words{$1}) {
                $words{$1} = $letter;
                $letter++;
            }
            $words{$1}
        /eg foreach @string;
        s/ //g foreach @string;
        return @string;
    }

    1;
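To see what `analyze` actually hands to `amatch`, here is a stand-alone restatement of the `_hash` logic in core Perl. It is a rewrite for readability, not the module's own source, and the name `hash_words` is hypothetical. Each distinct word, shared across both sentences, becomes one letter, so the two sentences turn into short letter strings that differ only by the letter standing for "milky".

```perl
#!/usr/bin/perl
# Hypothetical stand-alone restatement of Text::Plagiarized::_hash:
# every distinct word (shared across all input strings) is replaced by
# one letter, so word sequences can be compared as short strings.
use strict;
use warnings;

sub hash_words {
    my @strings = map { lc $_ } @_;        # comparison is case-insensitive
    my %letter_for;                        # word => letter, shared by all strings
    my $next = 'a';                        # magic string increment: a, b, ..., z, aa
    for my $s (@strings) {                 # $s aliases each element of @strings
        $s =~ s/[^[:alnum:][:space:]]//g;  # drop punctuation
        my @words = split ' ', $s;         # split on runs of whitespace
        for my $w (@words) {
            $letter_for{$w} = $next++ unless exists $letter_for{$w};
        }
        $s = join '', map { $letter_for{$_} } @words;
    }
    return @strings;
}

my ($h1, $h2) = hash_words("Poet Blake had a milky white cat.",
                           "Poet Blake had a white cat.");
print "$h1\n$h2\n";    # prints "abcdefg" then "abcdfg"
```

Because the letters come from Perl's magic string increment, the 27th distinct word gets the two-character code `aa`, which is presumably what the module's comment about breaking down after 26 words refers to.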

Running the script in the command prompt window gives the following errors:

    C:\Users\x>cd desktop

    C:\Users\x\Desktop>match.pl
    syntax error at C:\Users\x\Desktop\match.pl line 21, near "print"
    Execution of C:\Users\x\Desktop\match.pl aborted due to compilation errors.

    C:\Users\x\Desktop>
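The reported error most likely comes from line 20, `[my $sentence,my $possible_match]`: it is a bare anonymous-array expression with no terminating semicolon, so perl reaches `print` on line 21 and gives up. Removing it is not enough on its own: `my @comparison_texts` declares a new, empty array, so the loop never runs and `$new` is never used; `my $some_threshold` is likewise undefined; and neither `Text::Plagiarized` nor `Data::Dumper` (which provides `Dumper`) is loaded with `use`. Text::Plagiarized also compares whole sentences, so with one sentence on each side `percent` can only return 0 or 100, and it never lists individual words. The following is a minimal word-level sketch in core Perl, not a use of that module; the names `words_of` and `word_match` are hypothetical. It counts how many of the original sentence's words occur in the matching sentence, ignoring case, punctuation and word order.

```perl
#!/usr/bin/perl
# Hypothetical word-level comparison (does not use Text::Plagiarized):
# reports the percentage of the original's words found in the new text,
# plus the matched and unmatched words.
use strict;
use warnings;

# Lower-case a sentence, strip punctuation and split it into words.
sub words_of {
    my ($text) = @_;
    (my $clean = lc $text) =~ s/[^[:alnum:][:space:]]//g;
    return split ' ', $clean;
}

# Returns (percent, \@matched, \@unmatched). Each word in the new text can
# be matched only once, so repeated words are not over-counted.
sub word_match {
    my ($original, $new) = @_;
    my %available;
    $available{$_}++ for words_of($new);
    my (@matched, @unmatched);
    for my $w (words_of($original)) {
        if ($available{$w}) {
            $available{$w}--;
            push @matched, $w;
        }
        else {
            push @unmatched, $w;
        }
    }
    my $total   = @matched + @unmatched;
    my $percent = $total ? 100 * @matched / $total : 0;
    return ($percent, \@matched, \@unmatched);
}

my $original_text = "Poet Blake had a milky white cat.";
my $new           = "Poet Blake had a white cat.";

my ($percent, $matched, $unmatched) = word_match($original_text, $new);
printf "Percent match: %.2f%%\n", $percent;
print "Matched words: @$matched\n";
print "Unmatched words: @$unmatched\n";
```

For the two sentences in the question this reports 85.71% (6 of the 7 original words), with "milky" as the only unmatched word. Since word order is ignored, a reordered sentence would still score 100%.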

In reply to How do I change the code to get percent match and matched words between two texts? by supriyoch_2008
