PerlMonks  

Re: Re: Re: Parse::RecDescent Grammar Fun

by hsmyers (Canon)
on Jul 24, 2002 at 22:51 UTC (#185058)


in reply to Re: Re: Parse::RecDescent Grammar Fun
in thread Parse::RecDescent Grammar Fun

This may not be all that robust, but it is better than square one...

#!/usr/bin/perl -w
use strict;
use Parse::RecDescent;

my $grammar = join( '', <DATA> );
my $parser = Parse::RecDescent->new( $grammar ) or die "Error: Bad grammar\n";

my $text =<< "SUB_STDIN";
A nicely [[spaced]] link.
A poorly[[spaced]]link.
Another poorly..[[spaced]]..link.
A harder problem [spaced] link.
Yet harder still..]]spaced[[..link.
The real problem is [[spaced]] followed by ]] or [[ link.
SUB_STDIN

my $results = $parser->startrule( $text ) or die "Error: Bad text\n";

__DATA__
startrule: <skip:''> bit(s)

bit: eol | word | space | token | punct

eol: /\n[ \t]*/ {print "<newline>\n" }

space: /[ \t]+/ {print "< >" }

word: /[\w\']+/ {print "<word: $item[1]>" }

punct: /[^\w\s\[\]]+/ {print "<punct: $item[1]>" }
     | /(?<!\[)\[(?!\[)/ {print "<punct: $item[1]>" }
     | /(?<!\])\](?!\])/ {print "<punct: $item[1]>" }
     | /(?<!\[\[)\]\]/ {print "<punct: $item[1]>" }
     | /(?<!\]\])\[\[/ {print "<punct: $item[1]>" }

token: link

link: /\[\[(.+?)\]\]/ {print "<link: $item[1]>" }

Note the new test cases for brackets treated as 'punct'.

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."


Re: Re: Re: Re: Parse::RecDescent Grammar Fun
by ichimunki (Priest) on Jul 25, 2002 at 00:52 UTC
    Hahaha! We're going to beat this grammar into submission yet. :)

    Unfortunately we can't brute-force it: think of the labor and testing involved in adding new tags. I think the best I can do here is to collect punct as single-character chunks (storing them in a temp var) and then, when I get to a token, insert that temp var back into the tree. I'd post code, but rather than printing discrete, sensible morphemes, I really just need the morphemes (productions in P::RD-speak) concatenated into a string. For that purpose, whether it emits punct one character at a time or in chunks won't matter.

    Either way, this Parse::RecDescent module is the best thing since HTML::TokeParser[::Simple], imho.
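
The concatenation idea described above can be sketched roughly as follows. This is not ichimunki's code, only a minimal illustration under some assumptions: each production returns its matched text instead of printing it, punct is matched one character at a time, and the top rule joins the pieces into one string. The helper name concat_parse and the "<link:...>" output format are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical grammar: every rule *returns* text rather than printing it.
# A rule with no action returns the text matched by its last item.
my $grammar = <<'GRAMMAR';
startrule: <skip:''> bit(s)
    { $return = join '', @{ $item[-1] } }   # $item[-1] is the array ref from bit(s)

bit: link | word | space | punct

# Strip the surrounding [[ ]] and wrap the link text (made-up output format).
link: /\[\[.+?\]\]/
    { "<link:" . substr( $item[1], 2, -2 ) . ">" }

word:  /[\w']+/

space: /\s+/

# One character at a time; stray brackets fall through to here.
punct: /[^\w\s]/
GRAMMAR

my $parser = Parse::RecDescent->new($grammar)
    or die "Error: Bad grammar\n";

# Hypothetical helper: parse $text and return the concatenated productions.
sub concat_parse {
    my ($text) = @_;
    my $result = $parser->startrule($text);
    defined $result or die "Error: Bad text\n";
    return $result;
}

print concat_parse("A poorly[[spaced]]link. Unbalanced ]] and [[ stay put.\n");
```

Because the pieces are joined back together in order, it makes no difference to the final string whether punct arrives as single characters or as longer runs; only the link production changes what it matched.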
