
Re^2: Parsing SGML-ish Data Files

by coolmichael (Deacon)
on Aug 15, 2012 at 19:27 UTC

in reply to Re: Parsing SGML-ish Data Files
in thread Parsing SGML-ish Data Files

Well, the end goal is converting to XML. Regular expressions alone won't work very well for the conversion, because the tags aren't required to be properly nested the way they are in XML/HTML/SGML. For example, <a><b></a></b> is considered valid.
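One common way to turn overlapping tags into well-nested XML is to keep a stack of open elements. When an end tag arrives for an element that is not innermost, close everything opened after it, close the element itself, then reopen the inner elements. The following is a minimal sketch under stated assumptions: tags look like <name> and </name> with no attributes, stray end tags are dropped, and elements still open at end of input are closed. The function name overlap_to_xml is hypothetical, and the actual format's rules may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that XML treats specially in text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical converter: assumes <name> / </name> tags with no attributes.
sub overlap_to_xml {
    my ($input) = @_;
    my @stack;    # names of currently open elements, innermost last
    my $out = '';

    # split with a capturing group keeps each tag as its own list item
    for my $piece (split /(<\/?\w+>)/, $input) {
        next if $piece eq '';
        if ($piece =~ /^<(\w+)>$/) {
            push @stack, $1;
            $out .= "<$1>";
        }
        elsif ($piece =~ /^<\/(\w+)>$/) {
            my $name = $1;
            # index of the innermost open element with this name
            my ($idx) = grep { $stack[$_] eq $name } reverse 0 .. $#stack;
            next unless defined $idx;    # stray end tag: dropped (assumption)

            # elements opened after the target must be closed and reopened
            my @reopen = @stack[ $idx + 1 .. $#stack ];
            $out .= "</$_>" for reverse @stack[ $idx .. $#stack ];
            splice @stack, $idx;
            for my $inner (@reopen) {
                push @stack, $inner;
                $out .= "<$inner>";
            }
        }
        else {
            $out .= xml_escape($piece);
        }
    }
    # close anything still open at end of input (assumption)
    $out .= "</$_>" for reverse @stack;
    return $out;
}

print overlap_to_xml('<a><b></a></b>'), "\n";    # prints <a><b></b></a><b></b>
```

Splitting and reopening is only one policy; depending on what the overlapping markup means, closing the inner element early or emitting milestone elements may be a better fit.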

I do think the speed problem is in the tokenizer. I am doing the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.
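A standard Perl idiom for regex-driven tokenizing anchors each match with \G and uses the /gc flags, so every match starts where the previous one ended and pos() is kept when an alternative fails. That lets the loop consume a whole tag or text run per step instead of one character. This is a minimal sketch assuming tags look like <name ...> or </name> and that everything else is text; the tokenize function and its [type, name, attrs] token layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tokenizer: assumes <name attrs> and </name> tags, text elsewhere.
sub tokenize {
    my ($buf) = @_;
    my @tokens;
    pos($buf) = 0;
    while (pos($buf) < length $buf) {
        if ($buf =~ /\G<\/(\w+)\s*>/gc) {
            push @tokens, [ end => $1 ];
        }
        elsif ($buf =~ /\G<(\w+)([^>]*)>/gc) {
            # raw attribute text is kept unparsed in the third slot
            push @tokens, [ start => $1, $2 ];
        }
        elsif ($buf =~ /\G([^<]+)/gc) {
            # a whole run of text is taken in a single match
            push @tokens, [ text => $1 ];
        }
        elsif ($buf =~ /\G(<)/gc) {
            # a lone '<' that does not begin a tag is treated as text
            push @tokens, [ text => $1 ];
        }
    }
    return \@tokens;
}

for my $tok (@{ tokenize('<a>hi<b></a></b>') }) {
    print join('|', @$tok), "\n";
}
```

Every branch consumes at least one character, so the loop always makes progress; the last branch guarantees that even malformed input such as a bare '<' cannot stall it.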

Re^3: Parsing SGML-ish Data Files
by GrandFather (Sage) on Aug 16, 2012 at 03:29 UTC

    If you can show us enough of the actual structure of the data and describe the constraints on tags, attributes, etc., we should be able to at least sketch a regex-based solution or offer other alternatives for you.

    True laziness is hard work
