
let's have valid html

by particle (Vicar)
on Nov 16, 2004 at 19:17 UTC ( #408213=pmdevtopic )
particle has raised the following topic:

i haven't had much time lately to devote to perlmonks, and i've been not only impressed with the recent pace and scope of changes, but inspired by it. i'm happy to see the many new css classes, and happy to see that the css is valid as per the w3c css validator.

i'd like to see the same for the html markup, which doesn't validate to any x?html variant. the markup is defined as html 4.0 transitional, but there are some xhtml-style tags used throughout. i'd like to apply patches to help make this site validate, but before i start making changes, i'd like to confirm the target doctype. personally, i'd prefer aiming for xhtml, but if the decision is to stay with html 4.0 transitional, i'll aim for that.

anyone?

~Particle *accelerates*

Re: let's have valid html
by hossman (Prior) on Nov 16, 2004 at 22:26 UTC

    I would concur with the goal of xhtml ... it's time to step into the 21st century. The question i would ask is: xhtml1 (which has been stable, but is already 2 years old) or xhtml2 (which is a lot more current, but not officially finalized)?

    ?

      What are the major differences?

      - apotheon
      CopyWrite Chad Perrin

        IIRC, xhtml 2 removes everything that was marked deprecated in HTML 4.01, which is a huge change, both inasmuch as it changes a lot, and inasmuch as it seemed, prior to that, to support everything that had ever been in common use.


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

        I'm no expert, but theorbtwo's assessment sounds about right to me. All in all there are some pretty weighty changes.

        For example, check out this little gem from the xhtml FAQ...

        Is <img> being replaced by <object> in XHTML2?

        No. <img> is being replaced in XHTML2, but by something else ...

        What XHTML2 does is say that all images are equivalent to some piece of content; it does this by allowing you to put a src attribute on any element at all. What this says is: if the image is available, and the browser can process it, use it, otherwise use the content of the element. For instance:...

        Take a look at the elements in xhtml2 to get a sense of how much has changed.

      There's a long road ahead till browsers actually support XHTML 2.0; even decently compliant engines such as Gecko aren't nearly there yet.

      XHTML 1.1 (which only comes in a strict flavor) is what you should aim for at the moment, if you are serious about compliance. It's problematic for us though, for reasons I state elsewhere in the thread.

      Makeshifts last the longest.

Re: let's have valid html
by TedPride (Priest) on Nov 17, 2004 at 10:56 UTC
    stability > newness
Re: let's have valid html
by theorbtwo (Prior) on Nov 17, 2004 at 11:03 UTC

    I've been slowly putting in changes that make us more xhtml 1.0 transitional valid -- even though the doctype is stated as html 4. Note that the HTML gets much more compliant if you add ;htmlnest=1. The major places we're non-compliant are:

    • In shownote, indentation is done with nested <ul>s. There is no <li> for the content, though, which is illegal. I'm not sure about the compatibility of any of the easy workarounds -- could use a li type=none, haven't done any research into how compatible that is.
    • HREF attributes are sometimes capitalized by code in Everything/HTML.pm. Last I heard, tye was working on this.
    • I don't remember what the third one was.
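For illustration, the first item can be sketched as follows. This is a minimal sketch in Python rather than the site's Perl, and it is not the actual shownote code: `render_thread` and the dictionary layout are hypothetical. It only shows the shape of valid output, where every `<ul>` contains nothing but `<li>` children and each note's content sits inside its own `<li>`.

```python
from html import escape


def render_thread(note):
    """Render one note and its replies; the note's content goes inside an <li>.

    `note` is a hypothetical structure: {"text": str, "replies": [note, ...]}.
    """
    parts = ["<li>", escape(note["text"])]
    replies = note.get("replies", [])
    if replies:
        # The nested list lives inside the parent's <li>, never directly in a <ul>.
        parts.append("<ul>")
        for reply in replies:
            parts.append(render_thread(reply))
        parts.append("</ul>")
    parts.append("</li>")
    return "".join(parts)


thread = {
    "text": "root",
    "replies": [
        {"text": "reply 1", "replies": [{"text": "a & b"}]},
        {"text": "reply 2"},
    ],
}
markup = "<ul>" + render_thread(thread) + "</ul>"
print(markup)
```

Whether the list markers can then be hidden on older browsers is exactly the compatibility question raised in the first item; the sketch does not address it.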



      thanks for the response.

      In shownote, indentation is done with nested <ul>s. There is no <li> for the content, though, which is illegal. I'm not sure about the compatibility of any of the easy workarounds -- could use a li type=none, haven't done any research into how compatible that is.

      or you could use style sheets instead of nested uls.

      wasn't aware of the htmlnest option, i'll check it out. also wasn't aware of the reliance on everything being patched. as diotalevi pointed out below, we probably won't get posts in xhtml format--that's a fair point. but i think xhtml 1.0 transitional is still pretty forgiving--note the transitional in the moniker. i don't expect the site will validate on every page, but newest nodes (and other pages where it's possible) should be valid something, in my opinion.

      ~Particle *accelerates*

        The compatibility of the site is of the utmost importance to me. Because of this, using style sheets instead of nested ULs isn't going to cut it, because there are still a decent number of browsers that won't render it according to our intent. It's even worse than putting in the li tags -- putting in the li tags will make an extra dot appear on browsers that don't allow us to hide it, but using spans/divs with custom style will lose the indent totally.

        ...and Newest Nodes very nearly validates if you give the right options. The problems are:

        • Everything/HTML.pm using HREF instead of href (one occurrence).
        • Problems caused by an out-of-date CGI.pm (13 occurrences). (Not using XML empty-element syntax where appropriate, not quoting attributes, and writing POST instead of post.)
        • merlyn putting in literal character 149s into node titles, which are not escaped, and aren't valid characters in, well, anything. (Three occurrences as I write this.)

        Patches to solve these would be nice, but in these particular cases, they'd require godly intervention (with the possible exception of the third one, which possibly can be solved purely through a patch of handlelinks settings).
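To make the three problems concrete, here is a rough sketch of the kind of cleanup involved. It is written in Python for illustration only and is not a patch against Everything/HTML.pm, CGI.pm, or handlelinks; `normalize`, `escape_c1`, and the element and attribute sets are hypothetical. It leans on the fact that Python's `html.parser.HTMLParser` reports tag and attribute names in lower case, and it assumes that "character 149" is meant as the Windows-1252 bullet (U+2022).

```python
from html import escape
from html.parser import HTMLParser

# Illustrative subsets only, not complete lists.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}
LOWERCASE_VALUES = {"method"}  # enumerated attributes such as method="post"


class XhtmlNormalizer(HTMLParser):
    """Re-emit tags with lower-case names, quoted attributes, and ' />' on empty elements.

    Comments, doctypes, and processing instructions are dropped in this sketch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, closing):
        rendered = []
        for name, value in attrs:
            if value is None:  # boolean attribute such as "checked"
                value = name
            if name in LOWERCASE_VALUES:
                value = value.lower()
            rendered.append(' {}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<{}{}{}>".format(tag, "".join(rendered), closing))

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, " /" if tag in VOID_ELEMENTS else "")

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def normalize(fragment):
    parser = XhtmlNormalizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


def escape_c1(text):
    """Replace characters 128-159 with numeric references, read as Windows-1252."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                decoded = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                continue  # byte has no Windows-1252 meaning; drop it
            out.append("&#{};".format(ord(decoded)))
        else:
            out.append(ch)
    return "".join(out)


print(normalize("<A HREF=/index.pl>home</A><FORM METHOD=POST><INPUT TYPE=text NAME=q><BR></FORM>"))
print(escape_c1(chr(149) + " Re: foo"))
```

The first call prints `<a href="/index.pl">home</a><form method="post"><input type="text" name="q" /><br /></form>` and the second prints `&#8226; Re: foo`.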



Re: let's have valid html
by diotalevi (Canon) on Nov 17, 2004 at 19:29 UTC
    PM will never be xhtml compliant because its users don't write their nodes in xhtml. If you declare that a page is xhtml then browsers tend to hold you to it and will throw actual parsing errors. I'm just noting this so that when we finish getting PM's code into W3C compliance we don't also make the mistake of telling the browser to expect anything but highly suspect markup.

      No, they don't. It depends on the MIME type. The spec says application/xhtml+xml pages MUST be parsed with draconian error handling, but pages with text/html need not. Note that text/html is deprecated for XHTML 1.0 but invalid for XHTML 1.1.

      In other words, if we use XHTML 1.0 Transitional served with MIME type text/html we shouldn't run into problems.
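As a rough illustration of what "draconian error handling" means in practice, the sketch below (Python, standard library only; `is_well_formed` and `pick_mime_type` are hypothetical names, not anything the site uses) runs a fragment through a real XML parser. Anything that fails is exactly the markup that would break a page served as application/xhtml+xml, so the sketch falls back to text/html for it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup without a fatal error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def pick_mime_type(markup):
    # Only well-formed markup is safe to hand to a draconian XML parser.
    if is_well_formed(markup):
        return "application/xhtml+xml"
    return "text/html"


print(pick_mime_type("<div><p>one</p><p>two</p></div>"))
print(pick_mime_type("<div><p>one<p>two</div>"))  # unclosed <p> tags
```

The second fragment, with its unclosed `<p>` tags, is typical of user-written nodes, which is why serving text/html is the forgiving choice here.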

      Of course, we're already scrubbing users' HTML, so I don't see why it would be too difficult to clean it using, e.g., the tagsoup algorithm.

      Makeshifts last the longest.

        I prefer the current PM algorithm for normalizing user HTML over tagsoup's. We could add more knowledge about allowed parent tags but such would be used to escape tags that aren't in the proper parent rather than to close and reopen tags.

        And I feel quite strongly that the priority of goals should weigh practical matters much, much higher than technical milestones like strict compliance with a standard.

        For example, <p> tags will probably never be forced to be strictly nested because there is no practical way to accomplish this given the current state of users and browsers.

        The disadvantage that this prevents using a (compliant) XML parser on some filtered user HTML w/o first filtering <p> tags is a practical disadvantage but of less importance than the practical advantage of allowing people to easily enter their own HTML mark-up that displays well for most of our audience.

        The disadvantage of <p> tags not strictly complying with any particular standard, by contrast, is not a practical matter. Strict compliance can lead to practical benefits but compliance itself is not a practical benefit. So strict compliance can be desirable for many reasons such as setting a good example, geeky pride, pedantic intolerance, etc., but such concerns don't even cast a shadow IMO compared to even relatively minor practical advantages if such conflict with each other.

        - tye        
