
Re^2: Parsing SGML-ish Data Files

by coolmichael (Deacon)
on Aug 15, 2012 at 19:27 UTC

in reply to Re: Parsing SGML-ish Data Files
in thread Parsing SGML-ish Data Files

Well, the end goal is converting to XML. Regular expressions alone aren't going to work very well for the conversion, because the tags aren't required to nest properly the way they are in XML/HTML/SGML. For example, <a><b></a></b> is considered valid.

I do think the speed problem is in the tokenizer. I am doing the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.
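
To make the regex idea concrete, here is a minimal sketch of a tokenizer that consumes one whole token per match instead of one character per step. It is written in Python purely for illustration; the tag syntax, the TOKEN_RE pattern and the tokenize function are all assumptions, since the real data format isn't shown in this thread. In Perl the same approach would usually be written with \G and the /gc modifiers.

```python
import re

# Hypothetical token grammar: the real file format is not shown in the
# thread, so this tag syntax is an assumption made for illustration.
TOKEN_RE = re.compile(
    r"""
      <(?P<close>/)?(?P<name>[A-Za-z][\w.-]*)(?P<attrs>[^<>]*)>  # open or close tag
    | (?P<text>[^<]+)                                             # run of plain text
    | (?P<stray><)                                                # '<' that starts no tag
    """,
    re.VERBOSE,
)


def tokenize(buf):
    """Yield (kind, value) pairs, one per regex match.

    The three alternatives together cover every possible character,
    so finditer never skips any input between matches.
    """
    for m in TOKEN_RE.finditer(buf):
        if m.group("name"):
            kind = "close" if m.group("close") else "open"
            yield kind, m.group("name")
        elif m.group("text") is not None:
            yield "text", m.group("text")
        else:
            # A lone '<' is passed through as ordinary text.
            yield "text", "<"


if __name__ == "__main__":
    for token in tokenize("<a><b>x</a></b>"):
        print(token)
```

Because this tokenizer only emits open, close and text tokens and never checks nesting, overlapping input such as <a><b></a></b> tokenizes without complaint. Deciding how to turn that token stream into well-formed XML is left to a later stage.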

Re^3: Parsing SGML-ish Data Files
by GrandFather (Saint) on Aug 16, 2012 at 03:29 UTC

    If you can show us enough of the actual structure of the data and describe the constraints on tags, attributes, etc., we should be able to at least sketch a regex-based solution or offer other alternatives for you.

    True laziness is hard work
