PerlMonks  

Re^3: Understanding this particular Regex.

by ww (Bishop)
on May 05, 2013 at 19:27 UTC (node #1032146)


in reply to Re^2: Understanding this particular Regex.
in thread Understanding this particular Regex.

Not "nonsense."

Last time I chased this kind of question through the specs themselves, the validator fell short of fully enforcing the W3C HTML 4.01 Transitional spec, and even farther short of the Strict spec.

The validator, for example, blesses your code ("validates" it) without error (albeit with warnings) despite the lack of <head>...</head>, <body>...</body> and <html>...</html> tags... and that's using the Transitional spec, which allows no such thing.

If you try it with Strict, in upload mode, and add:

<table width = 17%>

you'll see even the validator lets fly:

If this error occurred in a script section of your document, you should probably read this FAQ entry.

Error Line 9, Column 18: an attribute value must be a literal unless it contains only name characters

<table width = 17%>

You have used a character that is not considered a "name character" in an attribute value. Which characters are considered "name characters" varies between the different document types, but a good rule of thumb is that unless the value contains only lower or upper case letters in the range a-z you must put quotation marks around the value. In fact, unless you have extreme file size requirements it is a very very good idea to always put quote marks around your attribute values. It is never wrong to do so, and very often it is absolutely necessary.

Your regex and the accompanying statement are correct, as far as they go, but they apply most closely to webmonkeys (yeah, been there; done that) writing for NS or IE4-style browsers. Today, however, you'll find widths (for example, where they're used) expressed as ems, ens (no problem as long as you don't introduce spaces) or as percentages... as in the example above. The "%" sign is an example of a showstopper.
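To make the point concrete, here is a minimal Python sketch (not the validator's actual algorithm) of the check the error message describes. It scans a single start tag, skips values that are already quoted, and flags any unquoted value containing a character outside the HTML 4.01 name-character set (letters, digits, hyphen, period, underscore, colon). The regexes and the function name `unquoted_value_errors` are illustrative assumptions, and the scanner is deliberately simplistic; it is not a full SGML tag parser.

```python
import re

# Characters allowed in an unquoted attribute value in HTML 4.01:
# letters, digits, hyphen, period, underscore and colon.
NAME_CHARS = re.compile(r"[A-Za-z0-9._:-]+")

# Simplified attribute scanner (assumption: one start tag per call):
# a name, then optionally "=" with surrounding spaces and a double-quoted,
# single-quoted or unquoted value. Quoted values are consumed whole,
# so their contents are never rescanned.
ATTR = re.compile(r"""([A-Za-z][-A-Za-z0-9_:.]*)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")

def unquoted_value_errors(tag):
    """Return (attribute, value, column) for each unquoted value that needs
    quotes; column is 1-based and points at the first offending character."""
    errors = []
    for m in ATTR.finditer(tag):
        name, value = m.group(1), m.group(2)
        if value is None or value[0] in "\"'":
            continue  # no value (e.g. the element name), or already quoted
        if NAME_CHARS.fullmatch(value):
            continue  # only name characters: quotes are optional
        offset = next(i for i, ch in enumerate(value)
                      if not NAME_CHARS.fullmatch(ch))
        errors.append((name, value, m.start(2) + offset + 1))
    return errors

print(unquoted_value_errors("<table width = 17%>"))    # [('width', '17%', 18)]
print(unquoted_value_errors('<table width = "17%">'))  # []
print(unquoted_value_errors("<hr class = size-1 >"))   # []
```

For `<table width = 17%>` the sketch reports column 18, counting from the `<`, which is the position of the `%` sign; quoting the value, or using a bare number such as `347`, leaves nothing to report.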


A little knowledge is a dangerous thing; categorical statements based on incomplete knowledge are apt to be even more so.


Re^4: Understanding this particular Regex.
by tobyink (Abbot) on May 05, 2013 at 20:54 UTC

    "the validator came up short of fully satisfying the w3c 4.01 transitional spec and even farther short of the strict spec"

    It is true that there are conformance requirements which the validator is unable to check. However, my example exploits none of these. I haven't tricked the validator; it's simply a valid HTML 4.01 Transitional document.

    It would be valid HTML 4.01 Strict, except that the <hr size> attribute is presentational and Strict doesn't contain most of the presentational attributes.

    If you prefer an example that passes Strict:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <title>Foo</title>
    <hr class = size-1 >

    "The validator, for example, blesses your code ("validates") without error (albeit, with warnings) despite the lack of <head>...</head>, <body>...</body> and <html>...</html> tags... and that's using the transitional spec which allows no such things."

    The <html>, <head> and <body> start and end tags are all optional in every version of HTML that has ever been published by the W3C. (They are of course required in XHTML, but that's not what we're talking about.)

    For example, see the HTML 4.01 spec's definition of the HTML element, which says, "Start tag: optional, End tag: optional". You'll find the same under the definitions of HEAD, BODY and also TBODY. Many elements have optional end tags, but IIRC those are the only four with optional start tags.

    "If you try it with strict, upload mode, and add: <table width = 17%> you'll see even the validator lets fly"

    Indeed. As I said, attribute values do not need to be quoted if they conform to the regexp /^[A-Za-z0-9_:-]+$/. The percent sign character is disallowed by that regexp, so that attribute value needs quoting.
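The same test, pointed the other way, is what a markup generator needs. Below is a minimal Python sketch built around a hypothetical helper `attr()`: it emits the value bare if it matches the regex above (translated from Perl), and otherwise quotes it, escaping with the standard library's `html.escape`. The HTML 4.01 spec (section 3.2.2) also allows periods in unquoted values; the regex above omits them, which only means a few values get quoted unnecessarily, and quoting is always safe.

```python
import html
import re

# The regex from the post, translated from Perl /^[A-Za-z0-9_:-]+$/.
# fullmatch() anchors both ends, like ^...$ (minus Perl's trailing-newline
# allowance). HTML 4.01 would also accept "." bare; leaving it out is harmless.
BARE_OK = re.compile(r"[A-Za-z0-9_:-]+")

def attr(name, value):
    """Format name=value, quoting the value only when the regex rejects it."""
    if BARE_OK.fullmatch(value):
        return f"{name}={value}"
    # quote=True escapes &, <, > and both quote characters.
    return f'{name}="{html.escape(value, quote=True)}"'

print(attr("width", "347"))       # width=347
print(attr("width", "17%"))       # width="17%"
print(attr("class", "size-1"))    # class=size-1
print(attr("title", 'say "hi"'))  # title="say &quot;hi&quot;"
```

An empty value fails the `+` in the regex, so it comes out as `alt=""`, which is the only legal way to write it.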

    "Your regex and the accompanying statement are correct, as far as they go, but are most closely applicable to webmonkeys (yeah, been there; done that.) writing for NS or IE4 style browsers."

    You think modern browsers don't support HTML 4.01? In most cases they support it better than those early browsers you mention did; and in most cases they support HTML better than they support full-blown XHTML.

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
      Generally, ++ tobyink, but we still disagree on more points than I'm inclined to write well-documented rebuttals for.

      But your rhetorical question, "You think modern browsers don't support HTML 4.01?", is the opposite of my intent. Of course they do... but when the cited browsers were "the latest and greatest", we saw an awful lot of utterly non-compliant markup, because devs were pushing out code that satisfied one particular browser (only). Think, too, of how commonly we used to see "<table width = 347...>" with only an implicit "px", i.e., code relying, mistakenly, on calculations that were sometimes inconsistent from one browser to the next.

        <table width = 347> is valid HTML. <table width="347px"> is invalid.

        The px unit is part of CSS; not HTML. In HTML, all sizes are expressed as either percentages, or a number which is implicitly in pixels. (Except <font size> where the number has its own special brand of craziness.)
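A minimal Python sketch of that rule, assuming a hypothetical `parse_html_length()` helper and integer values only: a bare number is read as pixels, a number followed by `%` as a percentage, and anything carrying a CSS unit such as `px` or `em` is rejected. It does not cover the separate `<font size>` rules mentioned above.

```python
import re

# HTML 4.01 lengths as described above: "nn" means pixels, "nn%" means a
# percentage. Restricting values to integer digits is an assumption here.
LENGTH = re.compile(r"([0-9]+)(%?)")

def parse_html_length(value):
    """Return ("px", n) or ("%", n), or None if value is not an HTML length."""
    m = LENGTH.fullmatch(value)
    if m is None:
        return None  # e.g. "347px" or "2em": CSS units are not HTML lengths
    number = int(m.group(1))
    return ("%", number) if m.group(2) == "%" else ("px", number)

print(parse_html_length("347"))    # ('px', 347)
print(parse_html_length("17%"))    # ('%', 17)
print(parse_html_length("347px"))  # None
```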

        I agree that there's a lot of invalid HTML out there, and certain older browsers encouraged it, but the OP's example is valid (albeit unidiomatic) HTML.

