PerlMonks  

Re: RegEx for incorrectly closed HTML attribute?

by Abigail-II (Bishop)
on Nov 29, 2002 at 18:53 UTC ([id://216575])


in reply to Re^2: RegEx for incorrectly closed HTML attribute?
in thread RegEx for incorrectly closed HTML attribute?

Two examples that will fail the regex:
<A HREF = link>FOO</A>
<A HREF = "link"><!-- </a>-->FOO</A>

Abigail

Re^4: RegEx for incorrectly closed HTML attribute?
by LAI (Hermit) on Nov 29, 2002 at 19:07 UTC
    I know. The first is an example of illegal HTML (at least, illegal as of XHTML 1.0), and the second is an example of nesting, as I mentioned. In the application Cody Pendant is (writing|maintaining), I would personally accept both as exceptions: neither will screw up more than the poster's own message. As I understood it, the biggest problem with leaving links open-ended or otherwise screwing up the HTML was that the rest of the page would be screwed up as well. These two will get rendered as
    <A HREF = link>FOO</A>
    and
    <!-- -->FOO</A>
    respectively (assuming Cody Pendant swaps special characters such as < and > for entities).
    LAI
    :eof
      The point is to detect wrong or illegal HTML, so assuming the given text validates is silly. If it validated, the whole exercise would be futile. Also, the first example is valid HTML, and has always been valid HTML. In the second example, no nesting is going on: there's just one A element.

      Abigail

        As I understood the problem, the goal was not necessarily to detect wrong or illegal HTML, but to make sure the output was valid so that posts further down the page were not screwed up. I never suggested that the input be assumed valid; in fact, my suggested solution detects valid anchors and renders everything else as text (with entities). My suggestion is not complete, but I feel it at least lends itself to preventing user mistakes or ignorance from affecting other posts.
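        LAI's actual regex isn't shown in this thread, so the following is only a minimal Python sketch of the approach described above: keep anchors that match a strict whitelist and escape everything else as text. The ANCHOR pattern and the render_post function are hypothetical, and the whitelist itself (a double-quoted href written without spaces, no markup in the link text) is an assumption chosen for illustration.

```python
import html
import re

# Hypothetical whitelist: only <a href="...">text</a>, with a double-quoted
# href and no markup inside the link text, counts as a "valid anchor".
ANCHOR = re.compile(r'<a\s+href="([^"<>]*)">([^<>]*)</a>', re.IGNORECASE)


def render_post(text: str) -> str:
    """Keep whitelisted anchors; escape everything else so it shows as text."""
    out = []
    pos = 0
    for m in ANCHOR.finditer(text):
        # Text between anchors is user input: escape it.
        out.append(html.escape(text[pos:m.start()]))
        href, label = m.group(1), m.group(2)
        # Re-emit the anchor in a normalised form so nothing can leak out of it.
        out.append('<a href="%s">%s</a>' % (html.escape(href), html.escape(label)))
        pos = m.end()
    # Whatever follows the last anchor (e.g. an unclosed tag) is escaped too.
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

        With this particular whitelist, both of Abigail's examples are escaped in full, because neither writes href=" without spaces, and an unclosed anchor such as <a href="x">unclosed becomes harmless text. A looser pattern would accept more valid HTML at the cost of more cases to get wrong.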

        Oh, and when I mentioned nesting, I meant that my regex would treat the comment inside the anchor element like nesting. I know that what you wrote is a legal comment inside a single A element, but since a user has no reason to put comments in a BBS post, I felt that mangling them was an acceptable loss.
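        To see why a comment inside the link text behaves like nesting, here is a hypothetical non-greedy anchor pattern in Python (an illustration only, not the regex discussed in the thread). Because it stops at the first closing tag it finds, the </a> inside the comment ends the match.

```python
import re

# Hypothetical permissive pattern: non-greedy link text up to the first
# closing tag. It has no notion of HTML comments.
NAIVE = re.compile(r'<a\s+href\s*=\s*"([^"]*)">(.*?)</a>',
                   re.IGNORECASE | re.DOTALL)

snippet = '<A HREF = "link"><!-- </a>-->FOO</A>'
m = NAIVE.search(snippet)
link_text = m.group(2)        # '<!-- ': the match stopped inside the comment
leftover = snippet[m.end():]  # '-->FOO</A>': left over, to be escaped as text
print(repr(link_text))
print(repr(leftover))
```

        The leftover -->FOO</A> is what ends up escaped as plain text, which is the kind of mangling accepted above as an acceptable loss.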


        LAI
        :eof

        The first is definitely not valid:
        HTML 4.01
        XHTML 1.0

        However, it will still display correctly in browsers. A better breaking example might be: <a href = li'nk>FOO</a>
