Search & Replace Parsing?

by Guildenstern (Deacon)
on Apr 18, 2001 at 22:04 UTC (#73607=perlquestion)

Guildenstern has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently working on a small personal project that involves converting HTML and plain text to a semi-proprietary markup language. I have several tables of HTML tags that must be converted, and am trying to devise the most efficient way to perform this task without completely reinventing the wheel.

So far, I've looked at Parse::RecDescent and Template, but these don't seem to have the "search & replace" functionality that I need. RecDescent has no problem finding specific tags, but I have not been able to find out how to perform a replacement that affects the text passed to the parser. It's simple enough to write a grammar rule that matches an <html> tag, but I don't know what to do in the action portion of the rule. If I act on @item, the changes aren't reflected in the original text.

Template Toolkit also seems a bit limited for what I'm doing. It obviously has search and replace functionality, but I can't find any way to specify particular search and replace pairs.

I know that I can throw all of my search and replace pairs into a great big hash and loop, but this seems to be a far cry from the best answer.
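For reference, the hash approach does not have to mean one pass over the text per pair. The following is a minimal sketch in core Perl that folds every key into a single alternation and substitutes in one pass. The pairs in `%replace`, the bracketed target markup, and the name `replace_all` are all invented for illustration; the real tables would come from the tag lists.

```perl
use strict;
use warnings;

# Hypothetical search/replace pairs (assumption: literal strings, not patterns).
my %replace = (
    '<b>'   => '[bold]',
    '</b>'  => '[/bold]',
    '<br>'  => '[break]',
    '&amp;' => '&',
);

# Build one alternation from all keys. Longest keys go first so that a
# longer key wins over any shorter key that happens to be its prefix.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) }
    keys %replace;
my $pattern = qr/($alternation)/;

# Replace every key in a single left-to-right pass. Replacement text is
# not rescanned, so one pair cannot accidentally feed another.
sub replace_all {
    my ($text) = @_;
    $text =~ s/$pattern/$replace{$1}/g;
    return $text;
}

print replace_all('<b>A &amp; B</b><br>'), "\n";
```

This only handles exact literal matches, so tags with attributes or different capitalization would need a real tokenizer.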


Is there something in Parse::RecDescent or Template Toolkit that I'm missing? Is there some other module available that will let me do search and replace in the manner that I need? I did quite a bit of research on CPAN, but it's entirely possible that I missed something. And before anybody suggests it: no, there are no converters or modules available for the markup language I'm using.

Guildenstern
Negaterd character class uber alles!

Replies are listed 'Best First'.
Re (tilly) 1: Search & Replace Parsing?
by tilly (Archbishop) on Apr 18, 2001 at 23:01 UTC
    Stop thinking of it as search and replace inline.

    Think of it as scan and output transformed text.

    For a picture of my thinking on this last summer, see Why I like functional programming. (Where I solve a problem almost exactly like what it sounds like you want.) My thinking has changed somewhat since then, but I have not taken time to implement my new ideas yet...
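    To make "scan and output transformed text" concrete, here is a minimal sketch in core Perl. It is not the code from the node linked above. The tables `%open`, `%close` and `%entity`, the bracketed target markup, and the name `convert` are assumptions made up for the example. The input is never modified; the scanner walks it once and appends the transformed pieces to a new string.

```perl
use strict;
use warnings;

# Hypothetical conversion tables (assumptions for illustration only).
my %open   = (b => '[bold]',  i => '[ital]',  p => '[para]');
my %close  = (b => '[/bold]', i => '[/ital]', p => '[/para]');
my %entity = (amp => '&', lt => '<', gt => '>', quot => '"');

sub convert {
    my ($html) = @_;
    my $out = '';
    pos($html) = 0;
    while (pos($html) < length($html)) {
        if ($html =~ /\G<(\/?)(\w+)[^>]*>/gc) {
            # A tag: emit its replacement, or nothing if it is unknown.
            my ($slash, $name) = ($1, lc $2);
            my $table = $slash ? \%close : \%open;
            $out .= $table->{$name} // '';
        }
        elsif ($html =~ /\G&(\w+);/gc) {
            # A named entity: decode it if known, else pass it through.
            $out .= $entity{$1} // "&$1;";
        }
        elsif ($html =~ /\G([^<&]+)/gc) {
            # A run of ordinary text: copy it unchanged.
            $out .= $1;
        }
        else {
            # A stray '<' or '&': copy one character and move on.
            $html =~ /\G(.)/gcs;
            $out .= $1;
        }
    }
    return $out;
}

print convert('<p>Fish &amp; <b>chips</b></p>'), "\n";
```

    Because tags and escape sequences are handled in the same loop, the whole conversion happens in one pass instead of a little bit this way and a little bit that way.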

Re: Search & Replace Parsing?
by iguane (Beadle) on Apr 18, 2001 at 22:08 UTC
    Why not look at HTML::Parser on CPAN?
    If I remember correctly, it parses the HTML code, and afterwards you just apply a regular expression.
      It looks like HTML::Parser will do some of what I need, but many of the documents I will be converting have HTML escape sequences that it appears HTML::Parser doesn't support easily. I could cobble together a hybrid system, but I would like to perform the entire search and replace action at once, instead of a little bit this way, and a little bit that way.

      Guildenstern
      Negaterd character class uber alles!
(Boo) Re: Search & Replace Parsing?
by boo_radley (Parson) on Apr 18, 2001 at 23:37 UTC
    Use Toolkit to go from HTML to XML.
    Use A Really Great Stylesheet to go from XML to your PML.
    Depending on the complexity of your proprietary markup, it could be that simple.
Re: Search & Replace Parsing?
by princepawn (Parson) on Apr 18, 2001 at 22:36 UTC
    So far, I've looked at Parse::RecDescent and Template, but these don't seem to have the "search & replace" functionality that I need. RecDescent has no problem finding specific tags, but I have not been able to find out how to perform a replacement that affects the text passed to the parser. It's simple enough to write a grammar rule that matches an <html> tag, but I don't know what to do in the action portion of the rule. If I act on @item, the changes aren't reflected in the original text.
    Why don't you build an abstract tree representing your parse, and then traverse it once parsing is finished to generate your new syntax?
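    A rough sketch of that two-step idea in core Perl follows. It uses a tiny hand-rolled scanner instead of Parse::RecDescent, and the node layout, the `%markup` table, and the names `parse_tree` and `render` are assumptions invented for the example. The first step only builds the tree; the second step walks it and generates the new syntax.

```perl
use strict;
use warnings;

# Step 1: parse into a tree. A node is either a plain string (text) or
# a hash { tag => NAME, kids => [ ...child nodes... ] }.
sub parse_tree {
    my ($html) = @_;
    my $root  = { tag => 'root', kids => [] };
    my @stack = ($root);
    # Scanning stops at the first stray '<' that is not part of a tag.
    while ($html =~ /\G(?:<(\/?)(\w+)[^>]*>|([^<]+))/gc) {
        my ($slash, $name, $text) = ($1, $2, $3);
        if (defined $text) {
            push @{ $stack[-1]{kids} }, $text;
        }
        elsif ($slash) {
            # Closing tag: go back up one level. This sketch does not
            # check that the name matches the tag being closed.
            pop @stack if @stack > 1;
        }
        else {
            my $node = { tag => lc($name), kids => [] };
            push @{ $stack[-1]{kids} }, $node;
            push @stack, $node;
        }
    }
    return $root;
}

# Hypothetical target markup: tag name => [opening, closing].
my %markup = (
    b => ['[bold]', '[/bold]'],
    p => ['[para]', '[/para]'],
);

# Step 2: traverse the finished tree and emit the new syntax.
sub render {
    my ($node) = @_;
    return $node unless ref $node;    # plain text leaf
    my $inner = join '', map { render($_) } @{ $node->{kids} };
    my ($open, $close) = @{ $markup{ $node->{tag} } || ['', ''] };
    return $open . $inner . $close;
}

print render(parse_tree('<p>Hello <b>world</b></p>')), "\n";
```

    With this split, the actions in a grammar would only need to return nodes, so nothing has to modify the text handed to the parser.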

    Also, I like HTML::TokeParser much more than HTML::Parser. They do the same thing but I find it easier to think the way the former tool does.
