PerlMonks  

Help Improve/troubleshoot Simple Lexicon Perl Code

by clothespeg (Novice)
on Jul 10, 2019 at 03:42 UTC (#11102621=perlquestion)

clothespeg has asked for the wisdom of the Perl Monks concerning the following question:

Hello! The lexicon is a space-separated text file with two columns, $POS and $Lemma. I am trying to write Perl code that looks for instances of each $Lemma in a text, counts them, and shows how many times each POS was found. In other words, the output I am looking for should look like:

n 500
v 1200

I am a beginner and have written the following, but I am not sure what the problem is or how to get it to work.
open(S, "Lexicon.txt");
while($sa=<S>){
    ($POS, $Lemma) = split('\t',$sa);
    $emotive{$Lemma}=$POS
};
open(File, 'eng.txt');
while($text = <File>){
    #this is some preprocessing for the text
    $text =~ s/\s+/ /g;
    $text =~ s/,/ ,/g;
    $text =~ s// /g;
    $text =~ s/\?/ !/g;
    #$text =~ tr/A-Z/a=z/;
    @words = split(' ', $text);
    for $word(@words){
        if(exists $emotive{$word}){
            print $word
        }
    }
};
Thanks in advance :)

Replies are listed 'Best First'.
Re: Help Improve/troubleshoot Simple Lexicon Perl Code
by GrandFather (Sage) on Jul 10, 2019 at 06:11 UTC

    You can use internal data to make it easier for you and for us to test the code and compare results. Until you give us some data to work with there's not much we can do to help debug your problem. There are some issues, however:

    With changes implied by the comments above your code could look like:

    use strict;
    use warnings;

    my $lexicon = <<LEX;
    10 the
    5 quick
    LEX
    my $text = <<TEXT;
    the quick brown fox jumps over the lazy dog
    TEXT

    open my $lexIn, '<', \$lexicon;
    my %emotive;
    while (my $sa = <$lexIn>) {
        my ($POS, $Lemma) = split('\s',$sa);
        $emotive{$Lemma}=$POS
    }

    open my $inFile, '<', \$text;
    while (my $text = <$inFile>) {
        #this is some preprocessing for the text
        $text =~ s/\s+/ /g;
        $text =~ s/,/ ,/g;
        $text =~ s// /g;
        $text =~ s/\?/ !/g;
        #$text =~ tr/A-Z/a=z/;
        my @words = split(' ', $text);
        for my $word(@words) {
            if (exists $emotive{$word}) {
                print $word
            }
        }
    }

    Prints:

    thequickthe
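
The script above splits each lexicon line on '\s', where the original question split on '\t'. Assuming the lexicon really is space separated, as the question states, that difference alone could explain why nothing was printed: with no tab in the line, split('\t', ...) returns the whole line as $POS and leaves $Lemma undefined, so no word ever matches a key in %emotive. A minimal sketch of the difference follows; the sample line and the helper name parse_entry are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one lexicon line
# into (POS, lemma) using the given separator pattern.
sub parse_entry {
    my ($line, $separator) = @_;
    chomp $line;
    my ($pos, $lemma) = split $separator, $line;
    return ($pos, $lemma);
}

my $line = "n fox\n";    # made-up, space-separated lexicon line

# Try the separator from the question, then plain whitespace.
for my $case (['tab', qr/\t/], ['whitespace', qr/\s+/]) {
    my ($label, $separator) = @$case;
    my ($pos, $lemma) = parse_entry($line, $separator);
    printf "split on %s: POS=[%s] lemma=[%s]\n",
        $label, $pos, defined $lemma ? $lemma : 'undef';
}
```

If the real lexicon is tab separated after all, the tab split is the one that yields two fields, so it is worth printing a few parsed entries from the actual file before changing anything else.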

    Maybe you can tell us where you are having a problem because aside from the coding issues mentioned above there is a lot going on in that script that implies competence beyond your stated competence.

    Update: fix documentation link to open function.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Help Improve/troubleshoot Simple Lexicon Perl Code
by choroba (Bishop) on Jul 10, 2019 at 08:59 UTC
    I used the following lexicon:
    d the
    a quick
    a brown
    n fox
    v jumps
    p over
    d the
    a lazy
    n dog

    And the following input:

    the quick brown fox jumps over the lazy dog

    Running the following script with these two files as arguments

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    my ($lexicon, $input) = @ARGV;

    my %emotive;
    open my $lex, '<', $lexicon or die $!;
    while (<$lex>) {
        my ($pos, $lemma) = split;
        warn "Duplicate POS for $lemma.\n"
            if exists $emotive{$lemma} && $emotive{$lemma} ne $pos;
        $emotive{$lemma} = $pos;
    }

    my %seen;
    open my $in, '<', $input or die $!;
    while (<$in>) {
        $seen{$_}++ for map $emotive{$_}, split;
    }

    for my $pos (sort { $seen{$a} <=> $seen{$b} } keys %seen) {
        say "$pos\t$seen{$pos}";
    }

    the output is

    p 1
    v 1
    n 2
    d 2
    a 3
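
One thing to watch with real text: a word that is not in the lexicon makes $emotive{$_} undefined, so the map in the script above feeds undef into %seen as a key and, with warnings enabled, triggers "uninitialized value" warnings. In the sample data every word has an entry, so the issue does not show. A hedged variant, assuming such words should simply be skipped, could look like this; the sub name count_pos and the in-memory sample data are invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count POS tags for the
# words of $text that have a lexicon entry; other words are skipped.
sub count_pos {
    my ($emotive, $text) = @_;
    my %seen;
    $seen{ $emotive->{$_} }++
        for grep { exists $emotive->{$_} } split ' ', $text;
    return \%seen;
}

# Made-up lexicon that deliberately lacks "the" and "over".
my %emotive = (fox => 'n', dog => 'n', jumps => 'v');
my $seen = count_pos(\%emotive, "the fox jumps over the dog\n");

# Sort by count, then by tag, so ties print in a stable order.
for my $pos (sort { $seen->{$a} <=> $seen->{$b} || $a cmp $b } keys %$seen) {
    print "$pos\t$seen->{$pos}\n";
}
```

The extra `|| $a cmp $b` in the sort is a design choice for this sketch only: it keeps tags with equal counts in alphabetical order, where sorting on the count alone leaves their relative order up to hash ordering.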

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Help Improve/troubleshoot Simple Lexicon Perl Code (updated)
by AnomalousMonk (Chancellor) on Jul 10, 2019 at 12:04 UTC

Node Type: perlquestion [id://11102621]
Approved by Athanasius
Front-paged by haukex