Re: Re: Re: Parse::RecDescent Grammar Fun

by hsmyers (Canon)
on Jul 24, 2002 at 22:51 UTC ( #185058=note )

in reply to Re: Re: Parse::RecDescent Grammar Fun
in thread Parse::RecDescent Grammar Fun

This may not be all that robust, but it is better than square one...
#!/usr/bin/perl -w
use strict;
use Parse::RecDescent;

my $grammar = join( '', <DATA> );
my $parser = Parse::RecDescent->new( $grammar ) or die "Error: Bad grammar\n";

my $text =<< "SUB_STDIN";
A nicely [[spaced]] link.
A poorly[[spaced]]link.
Another poorly..[[spaced]]
A harder problem [spaced] link.
Yet harder still..]]spaced[[
The real problem is [[spaced]] followed by ]] or [[ link.
SUB_STDIN

my $results = $parser->startrule( $text ) or die "Error: Bad text\n";

__DATA__
startrule: <skip:''> bit(s)
bit: eol | word | space | token | punct
eol: /\n[ \t]*/ {print "<newline>\n" }
space: /[ \t]+/ {print "< >" }
word: /[\w\']+/ {print "<word: $item[1]>" }
punct: /[^\w\s\[\]]+/ {print "<punct: $item[1]>" }
     | /(?<!\[)\[(?!\[)/ {print "<punct: $item[1]>" }
     | /(?<!\])\](?!\])/ {print "<punct: $item[1]>" }
     | /(?<!\[\[)\]\]/ {print "<punct: $item[1]>" }
     | /(?<!\]\])\[\[/ {print "<punct: $item[1]>" }
token: link
link: /\[\[(.+?)\]\]/ {print "<link: $item[1]>" }
Note the new test cases, which treat stray brackets as 'punct'.
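
To make the bracket alternatives in punct a little more concrete, here is a minimal sketch in plain Perl (no Parse::RecDescent involved) that tries the two "lone bracket" patterns from the grammar on a few strings. The sub name lone_brackets is made up for illustration, and this only shows how the lookarounds behave in an ordinary regex match; it does not claim anything about how the module itself positions the text when it tries each alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two single-bracket patterns, copied from the punct rule above.
my $lone_open  = qr/(?<!\[)\[(?!\[)/;   # a [ with no [ on either side
my $lone_close = qr/(?<!\])\](?!\])/;   # a ] with no ] on either side

# Hypothetical helper: count the brackets that stand alone in a string.
sub lone_brackets {
    my ($s) = @_;
    my @hits = $s =~ /$lone_open|$lone_close/g;
    return scalar @hits;
}

for my $s ( '[spaced]', '[[spaced]]', 'a ] b' ) {
    printf "%-12s %d\n", $s, lone_brackets($s);
}
# [spaced]     2
# [[spaced]]   0
# a ] b        1
```

The doubled brackets count as zero here because each one has a neighbour of the same kind, which is what leaves them free for the link rule or for the two doubled-bracket alternatives.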


"Never try to teach a pig to sing...it wastes your time and it annoys the pig."

Replies are listed 'Best First'.
Re: Re: Re: Re: Parse::RecDescent Grammar Fun
by ichimunki (Priest) on Jul 25, 2002 at 00:52 UTC
    Hahaha! We're going to beat this grammar into submission yet. :)

    Unfortunately we can't brute force it; think of the labor and testing involved in adding new tags. I think the best I can do here is to collect punct as single-character chunks (storing them in a temp var), then, when I get to a token, insert that temp var back into the tree. I'd post code, but rather than printing discrete sensible morphemes, I really just need the morphemes (productions in P::RD-speak) concatenated into a string. For that purpose, whether it emits punct one character at a time or in chunks won't matter.

    Either way, this Parse::RecDescent module is the best thing since HTML::TokeParser[::Simple], imho.
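
The idea in the reply above (fall back to single-character punct and concatenate everything into one string) can be roughed out as follows. This is only a sketch under assumptions: it uses a plain \G-anchored regex loop instead of a Parse::RecDescent grammar so that it runs with core Perl alone, the sub name render is hypothetical, and the `<link:...>` marker is an invented output format rather than anything the poster specified. The output string itself stands in for the temp var.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: scan left to right and build one output string.
sub render {
    my ($text) = @_;
    my $out = '';
    pos($text) = 0;
    while ( pos($text) < length($text) ) {
        if ( $text =~ /\G\[\[(.+?)\]\]/gc ) {
            $out .= "<link:$1>";          # a complete [[...]] token
        }
        elsif ( $text =~ /\G([^\[\]]+)/gc ) {
            $out .= $1;                   # a run of ordinary text
        }
        else {
            $text =~ /\G(.)/gcs;          # fallback: one stray bracket
            $out .= $1;
        }
    }
    return $out;
}

print render("A poorly[[spaced]]link."), "\n";
# A poorly<link:spaced>link.
print render("a [b] ]] [[ c"), "\n";
# a [b] ]] [[ c
```

Because stray brackets are consumed one character at a time, no separate pattern is needed for each combination of single and doubled brackets, and adding a new kind of token means adding one branch ahead of the fallback.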
