Re^2: Parsing SGML-ish Data Files

by coolmichael (Deacon)
on Aug 15, 2012 at 19:27 UTC

in reply to Re: Parsing SGML-ish Data Files
in thread Parsing SGML-ish Data Files

Well, the end goal is converting to XML. Regular expressions alone aren't going to work very well for the conversion, because the tags aren't required to nest properly the way they are in XML/HTML/SGML. For example, <a><b></a></b> is considered valid.

I do think the speed problem is in the tokenizer. I am doing the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.

Re^3: Parsing SGML-ish Data Files
by GrandFather (Saint) on Aug 16, 2012 at 03:29 UTC

    If you can show us enough of the actual structure of the data and describe the constraints on tags, attributes, etc., we should be able to at least sketch a regex-based solution or offer other alternatives for you.
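
In the meantime, here is a rough sketch of what a regex-driven tokenizer could look like, so the regex engine does the scanning instead of a character-by-character loop. It is written in Python purely for illustration; the same alternation idea should carry over to a Perl m/\G.../gc loop. Everything about the token shapes is an assumption, since the real format hasn't been shown: tags are taken to be <name> and </name> with no attributes, and the names TOKEN_RE and tokenize are made up for this example.

```python
import re

# ASSUMPTION: <name> opens a tag, </name> closes it, and everything else is
# text. The real tag and attribute rules are unknown, so adjust the pattern.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)|(<)")


def tokenize(data):
    """Yield (kind, value) tuples for each tag or run of text in data."""
    for m in TOKEN_RE.finditer(data):
        slash, name, text, stray = m.groups()
        if name is not None:
            # First alternative matched: an opening or closing tag.
            yield ("close" if slash else "open", name)
        elif text is not None:
            # Second alternative: a whole run of non-'<' characters at once.
            yield ("text", text)
        else:
            # Third alternative: a lone '<' that does not start a tag.
            yield ("text", stray)


# The tokenizer does not care about nesting, so overlapping tags are fine.
tokens = list(tokenize("<a><b>hi</a></b>"))
```

Because the tokenizer only emits a flat stream of open, close, and text tokens, the improperly nested tags are not a problem at this stage; deciding how to map the overlaps onto well-formed XML would be a separate step on top of the token stream.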

