PerlMonks  

Re: Understanding this particular Regex.

by ww (Bishop)
on May 05, 2013 at 10:47 UTC (#1032119)


in reply to Understanding this particular Regex.

"<HR +SIZE *=  *[0-9]+ *> " is readily understood as a regex only if one also assumes the use of alternate delimiters; for example

if ( $somevar =~ m<HR +SIZE *= *[0-9]+ *> ) {
    # do something....
}
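
A minimal sketch of that alternate-delimiter reading. The helper name `is_hr_size_tag` and the sample strings are made up for illustration and do not come from the original question. With `m<...>` the angle brackets act as delimiters, so the pattern actually applied is `HR +SIZE *= *[0-9]+ *`, and the brackets themselves are never matched:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the post with m<...> delimiters.
# The angle brackets are delimiters here, not part of the pattern.
sub is_hr_size_tag {
    my ($text) = @_;
    return $text =~ m<HR +SIZE *= *[0-9]+ *> ? 1 : 0;
}

# Made-up sample inputs, including one without any angle brackets.
for my $sample ( '<HR SIZE=3>', '<HR   SIZE = 12 >', 'HR SIZE=3',
                 '<HR SIZE=big>', '<hr size=3>' ) {
    printf "%-20s %s\n", $sample,
        is_hr_size_tag($sample) ? 'match' : 'no match';
}
```

The bracket-free sample matches too, and the lower-case one does not, because the pattern is unanchored and case-sensitive as written.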

One can argue that that's implicit in your post. I suspect NetWallah would so argue and certainly offered an accurate response based on that interpretation.

OTOH, one can certainly also argue that the OP is almost mischievously ambiguous (or simply wrong). For example, the spaces allocated -- if they exist -- and the lack of quotes around the size value would make the target tag for your so-called regex non-conformant with HTML 4.01 or any more recent spec... a fact that's not conclusive but which might lead those familiar with the W3C specs to scratch their heads and wonder what you're talking about, before moving on to other questions.

My point? Careful attention to precision and specificity can be (and usually is) very helpful to those who try to help those who are SOPW.

If you didn't program your executable by toggling in binary, it wasn't really programming!


Re^2: Understanding this particular Regex.
by tobyink (Abbot) on May 05, 2013 at 13:26 UTC

    Nonsense. Go along to the W3C HTML validator, select "Direct Input" and paste in the following markup:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <title>Foo</title> <hr size = 1 >

    ... and you'll find it's fully conformant HTML 4.01. Spaces around the equals sign are rarely used, but valid. And attribute values conforming to the following regexp do not need to be quoted:

    /^[A-Za-z0-9_:-]+$/

    And HTML5 is even more permissive.

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
      Not "nonsense."

      Last time I chased this kinda' question thru the specs themselves, the validator came up short of fully satisfying the w3c 4.01 transitional spec and even farther short of the strict spec.

      The validator, for example, blesses your code ("validates") without error (albeit, with warnings) despite the lack of <head>...</head>, <body>...</body> and <html>...</html> tags... and that's using the transitional spec which allows no such things.

      If you try it with strict, upload mode, and add:

      <table width = 17%>

      you'll see even the validator lets fly:

      If this error occurred in a script section of your document, you should probably read this FAQ entry.

      Error Line 9, Column 18: an attribute value must be a literal unless it contains only name characters

      <table width = 17%>

      You have used a character that is not considered a "name character" in an attribute value. Which characters are considered "name characters" varies between the different document types, but a good rule of thumb is that unless the value contains only lower or upper case letters in the range a-z you must put quotation marks around the value. In fact, unless you have extreme file size requirements it is a very very good idea to always put quote marks around your attribute values. It is never wrong to do so, and very often it is absolutely necessary.

      Your regex and the accompanying statement are correct, as far as they go, but are most closely applicable to webmonkeys (yeah, been there; done that.) writing for NS or IE4 style browsers. Today, however, you'll find widths (for example and where used) expressed as ems, ens (no problem as long as you don't introduce spaces) or as percentages... as in the example above. The "%" sign is an example of a warstopper.


      A little knowledge is a dangerous thing; categorical statements based on incomplete knowledge are apt to be even more so.

        "the validator came up short of fully satisfying the w3c 4.01 transitional spec and even farther short of the strict spec"

        It is true that there are conformance requirements which the validator is unable to check. However, my example exploits none of these. I haven't tricked the validator; it's simply a valid HTML 4.01 Transitional document.

        It would be valid HTML 4.01 Strict, except that the <hr size> attribute is presentational and Strict doesn't contain most of the presentational attributes.

        If you prefer an example that passes Strict:

        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> <title>Foo</title> <hr class = size-1 >

        "The validator, for example, blesses your code ("validates") without error (albeit, with warnings) despite the lack of <head>...</head>, <body>...</body> and <html>...</html> tags... and that's using the transitional spec which allows no such things."

        The <html>, <head> and <body> start and end tags are all optional in every version of HTML that has ever been published by the W3C. (They are of course required in XHTML, but that's not what we're talking about.)

        For example, see The HTML element, which says, "Start tag: optional, End tag: optional". You'll find the same under the definitions for HEAD, BODY and also TBODY. Many elements have optional end tags, but IIRC those are the only four with optional start tags.

        "If you try it with strict, upload mode, and add: <table width = 17%> you'll see even the validator lets fly"

        Indeed. As I said, attribute values do not need to be quoted if they conform to the regexp /^[A-Za-z0-9_:-]+$/. The percent sign character is disallowed by that regexp, so that attribute value needs quoting.
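
A small sketch of that check, assuming a hypothetical helper `may_be_unquoted` and using the attribute values that appear in this thread plus one made-up value. It applies the regexp exactly as posted. The HTML 4.01 text on unquoted values also lists the period as a permitted character, which this regexp does not cover, so treat it as the rule stated here, not a full statement of the spec:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $value matches the regexp quoted in the post,
# i.e. by that rule the value could be written without quotes.
sub may_be_unquoted {
    my ($value) = @_;
    return $value =~ /^[A-Za-z0-9_:-]+$/ ? 1 : 0;
}

# '1', 'size-1' and '17%' come from the thread; 'two words' is made up.
for my $value ( '1', 'size-1', '17%', 'two words' ) {
    printf "%-10s %s\n", $value,
        may_be_unquoted($value) ? 'quotes optional' : 'needs quotes';
}
```

By this rule `1` and `size-1` pass, while `17%` fails because of the percent sign.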

        "Your regex and the accompanying statement are correct, as far as they go, but are most closely applicable to webmonkeys (yeah, been there; done that.) writing for NS or IE4 style browsers."

        You think modern browsers don't support HTML 4.01? In most cases they support it better than those early browsers you mention did; and in most cases they support HTML better than they support full-blown XHTML.

        package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
