Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^2: Parsing SGML-ish Data Files

by coolmichael (Deacon)
on Aug 15, 2012 at 19:27 UTC ( #987622=note: print w/ replies, xml ) Need Help??


in reply to Re: Parsing SGML-ish Data Files
in thread Parsing SGML-ish Data Files

Well, the end goal is converting to XML. Regular expressions for the conversion aren't going to work very well, as the tags aren't properly nested as they are in XML/HTML/SGML. For example <a><b></a></b> is considered valid.

I do think the speed problem is in the tokenizer. I am doing to the scan one character at a time (from a buffer in memory, at least). I'm not sure how I could do that with regular expressions, but it's a good idea to look into.


Comment on Re^2: Parsing SGML-ish Data Files
Download Code
Re^3: Parsing SGML-ish Data Files
by GrandFather (Cardinal) on Aug 16, 2012 at 03:29 UTC

    If you can show us enough of the actual structure of the data and describe the constraints on tags, attributes etc, we should be able to at least sketch a regex based solution or offer other alternatives for you.

    True laziness is hard work

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://987622]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (8)
As of 2014-12-28 05:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (178 votes), past polls