in reply to
Re: Parsing SGML-ish Data Files
in thread Parsing SGML-ish Data Files
Well, the end goal is converting to XML. Regular expressions for the conversion aren't going to work very well, as the tags aren't properly nested as they are in XML/HTML/SGML. For example <a><b></a></b> is considered valid.
I do think the speed problem is in the tokenizer. I am doing to the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.