PerlMonks  

Best way to print variables in regex

by Mordan (Initiate)
on Jan 20, 2014 at 22:54 UTC ( #1071391=perlquestion )

Mordan has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I am trying to print some of the expressions used in a regex. I asked a related question earlier and made a bit of a hash of it, so I am starting afresh.

I am using Lingua::EN::Tagger and want to be able to display the tags matched by the regex.

$NUM  = get_exp('cd');
$GER  = get_exp('vbg');
$ADJ  = get_exp('jj[rs]*');
$PART = get_exp('vbn');
$NN   = get_exp('nn[sp]*');
$NNP  = get_exp('nnp');
$PREP = get_exp('in');
$DET  = get_exp('det');
$PAREN= get_exp('[lr]rb');
$QUOT = get_exp('ppr');
$SEN  = get_exp('pp');
$WORD = get_exp('\p{IsWord}+');

I can display my input text with all the tags applied (code below), but what I want is a count of each tag, like this:

  • CD : 2
  • VBG: 5
  • etc

This code outputs the tagged text, but I can't get it to tabulate the tags. My attempts, such as print $tag, print $GER and so on, don't work.

Also, I heard that the tagger has problems accepting input from files rather than text embedded in the code. Has anyone else heard that?

#!/usr/bin/env perl
use Lingua::EN::Tagger qw(add_tags);

my $postagger = new Lingua::EN::Tagger;
my $text = "the quick brown fox jumped over the lazy dog";
my $tagged = $postagger->add_tags($text);
print $tagged, "\n";

Replies are listed 'Best First'.
Re: Best way to print variables in regex
by Kenosis (Priest) on Jan 20, 2014 at 23:22 UTC

    Perhaps the following will be helpful:

    use strict;
    use warnings;
    use Lingua::EN::Tagger qw(add_tags);

    my %tags;
    my $postagger = new Lingua::EN::Tagger;
    my $text = "the quick brown fox jumped over the lazy dog";
    my $tagged = $postagger->add_tags($text);
    print $tagged, "\n\n";

    $tags{ uc $1 }++ while $tagged =~ m!<([^/]+?)>!g;

    print "$_: $tags{$_}\n" for sort keys %tags;

    Output:

    <det>the</det> <jj>quick</jj> <jj>brown</jj> <nn>fox</nn> <vbd>jumped</vbd> <in>over</in> <det>the</det> <jj>lazy</jj> <nn>dog</nn>

    DET: 2
    IN: 1
    JJ: 3
    NN: 2
    VBD: 1
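
The counting step can be tried on its own, without the tagger installed. The following is a minimal sketch, not part of the original reply: it assumes the tagged string shown in the output above, and the helper name count_tags and the slightly stricter character class [^/>] are choices made for this illustration only.

```perl
use strict;
use warnings;

# Count opening tags in text that has already been tagged.
# Returns a hash reference: upper-cased tag name => number of occurrences.
# (count_tags is a hypothetical helper name, not part of Lingua::EN::Tagger.)
sub count_tags {
    my ($tagged) = @_;
    my %count;
    # Match "<name>" but not "</name>": the class [^/>] excludes both the
    # slash of a closing tag and the closing angle bracket.
    $count{ uc $1 }++ while $tagged =~ m{<([^/>]+)>}g;
    return \%count;
}

# Hardcoded sample copied from the tagged output above, so no tagger is needed.
my $tagged = '<det>the</det> <jj>quick</jj> <jj>brown</jj> <nn>fox</nn> '
           . '<vbd>jumped</vbd> <in>over</in> <det>the</det> <jj>lazy</jj> <nn>dog</nn>';

my $count = count_tags($tagged);

# Left-align the tag name in a four-character column, then print its count.
printf "%-4s: %d\n", $_, $count->{$_} for sort keys %$count;
```

Returning a hash reference keeps the counts reusable, for example for writing them out in a fixed column order later.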

      Thank you Kenosis, your method seems the most straightforward. Thanks everyone who answered here and on the other thread.

      Are there any recommendations on how best to put this into a spreadsheet? I want to run this on a few phrases, so I think it would be a good idea to put the counts in a spreadsheet in a consistent way rather than copy and paste from the terminal, so that DET values would always be in column 1, IN in column 2, and so on.

        One way would be to create a CSV file and then import that into your spreadsheet:
        my $filename = '/path/to/file.csv';
        open (my $fh, '>', $filename) or die "Could not open $filename, $!";
        my @headers = qw( DET IN JJ NN VBD );
        print $fh join(',',@headers) . "\n";
        # then, for each of your phrases
        print $fh join(',', map($tags{$_} || 0, @headers) ) . "\n";
        close $fh;
        However, if you intend to do the tagging at different times you will need a way to update the data. You could use Spreadsheet::WriteExcel, but it has a learning curve and is probably overkill. Alternatively, you can keep your spreadsheet data as a CSV file and append to that file, or use Tie::Array::CSV to append:
        use Tie::Array::CSV;
        my $filename = '/path/to/file.csv';
        tie my @file, 'Tie::Array::CSV', $filename; # (this bit has been fixed - see comment below)
        # for each of your phrases (assumes %tags and @headers as in the snippet above)
        my @row = map { $tags{$_} || 0 } @headers;
        push(@file,\@row);
        untie @file;
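
To illustrate the fixed column order across several phrases, here is a minimal sketch that builds the CSV lines in memory and prints them. It is an illustration under assumptions: the two tagged phrases are hand-written stand-ins for add_tags() output, and csv_row is a hypothetical helper name. A plain join is enough here only because every field is a tag name or an integer; fields that may contain commas or quotes would need a proper CSV writer.

```perl
use strict;
use warnings;

# Fixed column order; a tag that does not occur in a phrase is written as 0.
my @headers = qw( DET IN JJ NN VBD );

# Hypothetical, hand-written tagged phrases standing in for add_tags() output.
my @tagged_phrases = (
    '<det>the</det> <jj>quick</jj> <nn>fox</nn>',
    '<nn>dog</nn> <vbd>jumped</vbd> <in>over</in> <det>the</det> <nn>fence</nn>',
);

# Build one CSV line of tag counts for a single tagged phrase.
# (csv_row is a hypothetical helper name used only in this sketch.)
sub csv_row {
    my ($tagged, @columns) = @_;
    my %tags;
    # Count opening tags only; closing tags start with "</" and do not match.
    $tags{ uc $1 }++ while $tagged =~ m{<([^/>]+)>}g;
    return join ',', map { $tags{$_} || 0 } @columns;
}

# Header line first, then one line per phrase, always in @headers order.
my @lines = ( join(',', @headers), map { csv_row($_, @headers) } @tagged_phrases );
print "$_\n" for @lines;
```

To keep the data in a file instead, print the same lines to a filehandle; opening the file with '>>' rather than '>' appends rows from later runs instead of overwriting them (in that case, write the header line only once).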
Re: Best way to print variables in regex
by jethro (Monsignor) on Jan 20, 2014 at 23:24 UTC

    Where does get_exp come from? It isn't mentioned in the tagger's documentation.

    Just looked at the documentation and it seems there are methods that return hashes with occurrence frequencies ready to use. Especially get_nouns and get_proper_nouns seem to offer just what you want:

    "get_proper_nouns TAGGED_TEXT

    Given a POS-tagged text, this method returns a hash of all proper nouns and their occurrence frequencies...."
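
A minimal sketch of that approach follows. It assumes Lingua::EN::Tagger is installed and that get_nouns, called in list context on tagged text, yields noun/frequency pairs as the quoted documentation describes; the guard around require is only there so the sketch still runs where the module is missing.

```perl
use strict;
use warnings;

# Load the tagger only if it is installed, so the sketch degrades gracefully.
my $have_tagger = eval { require Lingua::EN::Tagger; 1 } ? 1 : 0;
my %nouns;

if ($have_tagger) {
    my $postagger = Lingua::EN::Tagger->new;
    my $tagged    = $postagger->add_tags("the quick brown fox jumped over the lazy dog");

    # Assumption: in list context get_nouns yields noun => frequency pairs.
    %nouns = $postagger->get_nouns($tagged);
    print "$_: $nouns{$_}\n" for sort keys %nouns;
}
else {
    print "Lingua::EN::Tagger is not installed\n";
}
```

Note that these methods count words (nouns and proper nouns), not tag names, so they cover only part of the per-tag tabulation asked for in the question.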

Re: Best way to print variables in regex
by tangent (Vicar) on Jan 21, 2014 at 00:00 UTC
    I have posted a reply to your other question which may help here as well (update: ignore this - Kenosis's solution above is far more elegant).
Re: Best way to print variables in regex
by Anonymous Monk on Jan 21, 2014 at 02:48 UTC

Node Type: perlquestion [id://1071391]
Approved by johngg