http://www.perlmonks.org?node_id=494645


in reply to substr(ingifying) htmlized text

Is it reasonable to assume that your question is really this: can we fix an HTML file that is not well-formed?

I am not sure that is possible. There are too many things to worry about.

Take this, for example:

<HTML> <HEAD> </BODY> </HEAD> </HTML>
If closing tags are matched purely by position, </BODY> will close <HEAD> here. You could scan backwards and forwards and pick whichever repair gives valid HTML, but I can surely come up with two errors that together make the program think the document is valid. Unless you check the tag names (keywords), this is going to be hard to do. And even if you do check the names, when someone leaves a tag out, where will you put it back?
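
To make the point concrete, here is a rough sketch in Python (not from the original thread) of a stack-based tag scan. The function name check_tags, its compare_names flag, and the second sample string are all made up for illustration, and the regex is a crude stand-in for a real HTML parser: it ignores void elements such as <BR>, comments, and attribute values that contain ">".

```python
import re

# Crude tag matcher: captures an optional "/" and the tag name.
# An assumption for illustration only; real HTML needs a real parser.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def check_tags(html, compare_names=True):
    """Return a list of problems found by a simple stack-based scan."""
    stack = []
    problems = []
    for m in TAG.finditer(html):
        closing = m.group(1) == "/"
        name = m.group(2).upper()
        if not closing:
            stack.append(name)
            continue
        if not stack:
            problems.append(f"</{name}> has no open tag to close")
            continue
        opened = stack.pop()
        # With compare_names=False any closing tag closes any open tag.
        if compare_names and opened != name:
            problems.append(f"</{name}> closes <{opened}>")
    # Whatever is left on the stack was opened but never closed.
    problems.extend(f"<{name}> is never closed" for name in reversed(stack))
    return problems

sample = "<HTML> <HEAD> </BODY> </HEAD> </HTML>"
print(check_tags(sample))                       # name-aware scan
print(check_tags(sample, compare_names=False))  # position-only scan

# Hypothetical variant with a second error (a stray <BODY>) that
# balances the counts, so the position-only scan reports nothing.
masked = "<HTML> <HEAD> </BODY> </HEAD> <BODY> </HTML>"
print(check_tags(masked, compare_names=False))
```

On the sample above, the name-aware scan reports that </BODY> closes <HEAD>, while the position-only scan only notices one closing tag too many. On the second string the position-only scan finds nothing wrong at all. Note that even the name-aware scan only reports mismatches. It does not say where a missing tag should go, which is the harder question.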

Sorry, this is not much help on the code front; I am just listing the issues. -SK