Perl Monks hypocrisy

by Wassercrats
on Jul 16, 2003 at 07:16 UTC ( #274711=monkdiscuss )

Maybe hypocisy is too strong a word, but I had to think of something these things have in common. I didn't feel each was worthy of its own post.

One big reason for people liking Perl is that it's a quick, compact language. Why then is this the only one of a gazillion message boards (that I know of) that requires the use of tags for something as simple as a line break? Can't anyone who maintains this board figure out how to add an auto-line-break feature?

Two other things I hear from a lot of Perl programmers are how important it is to use strict, and how you shouldn't parse HTML with regular expressions. Well, thanks to my regex HTML parser, I discovered that someone wasn't strict enough in debugging their HTML. In every section, the source code of the "Offer your reply" links is [Offer your reply&#093 (note the missing semicolon). Though the w3 validator didn't catch that one, it did find 283 other errors in http://perlmonks.com/index.pl.

I guess you could also infer from this post that I pay no mind to my reputation here.

Not that I don't like perlmonks...

Re: Perl Monks hypocrisy
by Juerd (Abbot) on Jul 16, 2003 at 07:35 UTC

    Can't anyone who maintains this board figure out how to add an auto-line-break feature?

    Just like most people, you use <p> tags. So you know why line-breaks are bad. Textareas may or may not wrap text. When a textarea does not wrap text, the user is likely to hit the return key in places where a <br> is not wanted.

    (note the missing semicolon)

    Quoting the HTML 4.0 specification:

    Note. In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.
    As you can see, the semicolon is recommended, not required.

    Well, thanks to my regex HTML parser, I discovered ...

    Your parser being regex based is not relevant here. Regular expressions CAN be used to parse HTML. But a set of regexes that parses any HTML document correctly is much less efficient than something based on HTML::Parser. But it is very unlikely that your parser handles every feature that HTML offers.

    I guess you could also infer from this post that I pay no mind to my reputation here.

    In other words: you're a troll. Please troll elsewhere lest more people feed you.

    it did find 283 other errors in http://perlmonks.com/index.pl.

    I'm sure your patches are more than welcome. But for now: it works, so let's not break it while trying to fix a problem that isn't there in the first place.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Textareas may or may not wrap text. When a textarea does not wrap text, the user is likely to hit the return key in places where a break is not wanted.

      That was really my main beef, and I've had it for a long time. Now that I think of it, this board has a lot more posts of code than others, and code on other boards has wrapping problems, but requiring everything else to be tagged still seems unnecessary.

      That specification you quoted is interesting. I've suggested things elsewhere that would add less structure to HTML. In this case, I'd go for consistency. I think I'll change my script to support the absence of those semicolons.
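      Supporting the missing semicolon could look something like the sketch below. It is only an illustration under assumptions: decode_numeric_refs is a made-up helper name, it handles decimal references only, and without a semicolon it simply takes every digit that follows, so a reference directly followed by another digit would be read wrongly.

```perl
use strict;
use warnings;

# Hypothetical helper (not code from the script discussed here):
# turn decimal character references into characters, with or
# without the trailing semicolon.
sub decode_numeric_refs {
    my ($text) = @_;
    # ";?" makes the semicolon optional; "\d+" is greedy, so all
    # digits after "&#" are taken as part of the reference.
    $text =~ s/&#(\d+);?/chr($1)/ge;
    return $text;
}

print decode_numeric_refs('[Offer your reply&#093'), "\n";
print decode_numeric_refs('&#91;Offer your reply&#93;'), "\n";
```

      Both calls print [Offer your reply], since &#093 and &#93; both name character 93.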

      I don't intend to be a full time troll, and I don't take food from strangers.

      But a set of regexes that parses any HTML document correctly is much less efficient than something based on HTML::Parser.

      As I recall, we tried a module based on HTML::Parser but had to drop it because it was way too slow (10-times slower, IIRC). PM uses a single regex to split the HTML into tokens and another regex to deal with filtering attributes in those tokens.
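      To give an idea of what "a single regex to split the HTML into tokens" can mean, here is a minimal sketch. It is not the regex PM actually uses; the pattern, the tokenize name, and the sample input are assumptions made up for illustration, and the pattern ignores comments and ">" inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: split a fragment into tags and text runs.
# The capturing group makes split() return the tags as well as the
# text between them.
sub tokenize {
    my ($html) = @_;
    return grep { length } split /(<[^>]*>)/, $html;
}

my @tokens = tokenize('<p>Hello <b>world</b></p>');
print join( ' | ', @tokens ), "\n";
```

      A second pass over the tag tokens could then filter attributes, which is the division of labour described above.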

      There are two main reasons that I'd advise someone to not "parse HTML with (a) regex(es)". Performance is not one of them.

      The main point is that you probably shouldn't use something like /<td>(.*?)</td>/ because there is no way to make that ignore HTML comments that contain similar HTML. The other is that doing such can look easy but end up being very hard so it is often less work in the long-run to use a decent module from the start, even though that often looks like a more difficult approach.
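      The comment problem is easy to reproduce. In this sketch the input string is invented for the demonstration; the pattern is the one quoted above, written with m{} so the slash needs no escaping.

```perl
use strict;
use warnings;

# Made-up input: one live cell and one cell that has been commented out.
my $html = '<tr><!-- <td>old price</td> --><td>new price</td></tr>';

# The naive pattern has no notion of comments, so it matches inside them.
my @cells = $html =~ m{<td>(.*?)</td>}g;

print join( ', ', @cells ), "\n";
```

      This prints "old price, new price", although a browser would show only the second cell.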

      Update: The "HTML" that we parse is stuff typed in by our users "by hand". So our HTML parser (the regex) intentionally deals with certain border cases in specific ways. No, it does not strictly follow any one of the many HTML standards we have to choose from.

                      - tye

        As I recall, we tried a module based on HTML::Parser but had to drop it because it was way too slow (10-times slower, IIRC).

        The speed has everything to do with the complexity of your parser. If you don't need to follow specifics, and don't need to implement the usual browser quirks, a single regex is often a lot more efficient. It's up to the end user to benchmark it. Unfortunately, most novices don't know how to write the regex, don't know how to write an HTML::Parser-based script, and don't know how to benchmark.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Note. In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.
      As you can see, the semicolon is recommended, not required.

      Sorry, I have to side with Wassercrats on this one. Just because you can, sometimes, doesn't mean you should. It is easy to get that semi-colon in there. I've known a number of browsers over the years that never rendered correctly an entity lacking a semi-colon. Either they let it go through textually, or ate the remaining characters up to the end of the line.

      Even Mozilla had this problem up until a year or so ago. If you can count on a semi-colon being required you simplify the parsing greatly. Just because SGML says it's recommended that does not make a good basis for choosing to do so. SGML has all sorts of markup minimisation short cuts available, because at the time people were paid to key stuff in, paid by the keystroke and there were no fancy GUI editors around. And plus it's just more comfortable to be able to omit needless stuff.

      This made the job of writing an SGML parser a Herculean undertaking. James Clark is about the only person who really pulled it off.

      A much more reasonable comparison would be to consider XML. There, the trailing semi-colon is mandatory. This is because Tim Bray and the team that created XML wanted something that was easy to parse. Easier than full SGML in any case, and in comparison to that they succeeded admirably.

      I realise that the problem is difficult for Perlmonks. It would be feasible to make sure that any HTML generated directly by Everything is well-formed, but this does not take into account what passes for HTML typed in by the site's population.

      Argh, just thinking about &, &amp, &amp; and R&D and what Everything makes of them makes my brain hurt :)

      _____________________________________________
      Come to YAPC::Europe 2003 in Paris, 23-25 July 2003.

Re: Perl Monks hypocrisy
by PodMaster (Abbot) on Jul 16, 2003 at 07:39 UTC
    One big reason for people liking Perl is that it's a quick, compact language. Why then is this the only one of a gazillion message boards (that I know of) that requires the use of tags for something as simple as a line break? Can't anyone who maintains this board figure out how to add an auto-line-break feature?
    Same reason vroom's XP is whatever it is. Perlmonks formatting works the way it does, and is in no way a reflection on perl.
    ... Well, thanks to my regex HTML parser ...
    1. So what?
    JavaJunkies runs on perl, and some people were baffled that a site about java wasn't done in java. One has nothing to do with the other.

    2. What couldn't you do with an existing HTML parser, such that you had to roll your own?
    Reusing proven tools improves productivity. I fail to see why the perlmonks should help people debug regexes for html parsing any more than they should help someone roll their own CGI.pm. It's just a waste of time.

    update: You want autobreaking, make like a good perlmonk and suggest the feature effectively. I for one would not like it one bit, cause I've been formatting my posts by hand for 2-3 years now, and I ain't gonna change any time soon.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

Re: Perl Monks hypocrisy
by crenz (Priest) on Jul 16, 2003 at 10:36 UTC

    On a factual level: Yes, the people who maintain this board have enough brains to figure out an auto-line-break (or "insert <p> tags") feature. That's what I infer from my dealings with them. So my guess is that not having this feature is a deliberate choice, rather than dumbness. A lot of message boards resort to inventing their own tags to allow for formatting. For a site like this one that caters to programmers, I find it an excellent choice to just give them all the expressiveness they want by letting them use what they already know: HTML.

    Regarding the validity of the HTML generated -- PM has grown over the years, and little bugs can be found here and there. You can help the site maintainers by giving appropriate feedback.

     

    On a personal level: Looking at your post, I get the impression that you are trying to pretend that you want PM to improve, but actually you want to show that you know things better than those who are too dumb to figure out auto-line-break features. Or maybe you are retaliating against someone who told you not to parse HTML using regexes. Well, hopefully I'm wrong.

    At PM, we tend to not take ourselves or Perl too seriously. That helps in maintaining good relationships and actually getting work done.

      Just a side note

      For a site like this one that caters to programmers, I find it an excellent choice to just give them all the expressiveness they want by letting them use what they already know: HTML.

      Sometimes I feel a little out of place. Many monks are doing web-based stuff, but little-ole me has never (I repeat, never) done anything with a web except browse it, or sweep it away.

      However, I did go and learn some rudimentary HTML, and javascript, just so I could be like the cool kids.

      Once I got used to the <p> tags, I was ok with it. Don't know how to indent quotes from other people though (like the quote above). Help anyone?

      Anyways, back to the main topic (??) Problem with wrap-around text is sometimes I insert a <CR> and sometimes not (my mail messages usually look stupid). At least this way I have to think about what I am doing.

      UPDATE: Added the <blockquote>. Thanks diotalevi.

        <blockquote>Don't know how to indent quotes from other people though (like the quote above). Help anyone?</blockquote>

        Like that.

      For a site like this one that caters to programmers, I find it an excellent choice to just give them all the expressiveness they want by letting them use what they already know: HTML.
      What's HTML got to do with programmers? If I expect one markup language Perl programmers would know, it would be POD, not HTML. Besides, the language accepted on perlmonks isn't HTML. It has markup that HTML doesn't have (for links for instance), lots of HTML markup isn't allowed (a span element is allowed, but not a style attribute, which makes span not very useful, and there are other elements that aren't allowed), and some elements get a totally different meaning (code).

      I don't think anyone knows the language being used on Perlmonks prior to arriving here.

      Abigail

Re: Perl Monks hypocrisy
by katgirl (Hermit) on Jul 16, 2003 at 13:02 UTC
    I was always told to humor idiots, but in this case I'll make an exception.

    Though the w3 validator didn't catch that one, it did find 283 other errors in http://perlmonks.com/index.pl.
    In my copy of the bible, and presumably in yours too, there's a bit about taking planks out of your own eye before looking for specks in other people's. So if you're going to get picky about PM's HTML...

    Plank!

    :)

      HTML is still better than plain text
Re: Perl Monks hypocrisy
by cfreak (Chaplain) on Jul 16, 2003 at 13:41 UTC

    I would like an auto-break feature as well. Maybe something like on /. where you can choose HTML formatted or plain text (the plain text autobreaks but still allows links and things like bold and italic)

    However I'm not sure ranting about it is the right way to get such a feature implemented. I mean if you want to complain about hypocrisy, complain about people who flame the newbies rather than trying to point them in the right direction, that's hypocrisy since someone at some point probably helped them. A technical issue is not hypocrisy, in fact I'm willing to bet that most people find it to be a feature, and it probably also saves the server on processing power.

    As for the HTML parsing with regexes: It is a fact that a regex cannot catch all possible valid HTML, however a true parser can and there are parsers available. Just like every other suggestion for a module on this site, suggesting an HTML parser is done to save the user time from re-inventing a wheel. This is consistent: use the right tool for the job.

    Lobster Aliens Are attacking the world!

      It was good to see that "Re: Convert Text to HTML Checkbox (POD)" link. I'd been wondering if there was a "if you don't know how to format your messages, you don't belong here" attitude.

      As for flaming at newbies, I just gave someone a -- yesterday (yes, even I vote occasionally) for being too hard on someone for asking a question without doing more research, but I don't know if the questioner was a newbie.

      I agree about not reinventing the wheel in many situations, and in general, I don't really have a complaint about people reinforcing that with regard to modules. I pointed out that I used a regex HTML parser because it was kind of consistent with the tone of my post and it was a bit ironic. But I don't think it's a good idea to avoid parsing HTML with regexes at all costs. In some cases, using a module is overkill, and a regex HTML parser isn't always difficult to write (I'm not talking about anything close to a browser though). Also, if I relied on some less popular modules, I'd fear their distribution being discontinued, and me having to figure out how they work so I could maintain them.

      I wish I knew how to look up the record for the lowest and highest reputation.

        I wish I knew how to look up the record for the lowest and highest reputation.

        That would be Best Nodes and Worst Nodes, but as you will see they aren't necessarily the best, nor the worst. They just happen to be the nodes that got the most ++ or -- votes. While I think Camel Code is a damn fine node, I hardly think it's the best the site has to offer, and Professional Employees and Works for Hire, while an important node in the context of community here, is IMO nothing close to tilly's best node. (I'm pretty sure he would agree too.) As some of the others said, you should take things a little less seriously, and you should certainly not take XP or reputation all that seriously.


        ---
        demerphq

        <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
        how do you vote with a

        negative experience?

Re: Perl Monks hypocrisy
by talexb (Canon) on Jul 16, 2003 at 15:21 UTC
      I guess you could also infer from this post that I pay no mind to my reputation here.

    Again with the XP bit. This is sooo tiring. If the 283 errors on the front page are bugging you, sign up to become a PM developer and spend your time improving the universe, rather than whining and complaining. And your argument would have carried more water if your own site passed validation (as previously noted).

    This site works very well. I find it's a fantastic resource. If you think it's stupid, badly formatted and poorly programmed, you are, of course, entitled to your own opinion, no matter how stupid. Does this mean you're on your way then?

    Have a cheery day.

    --t. alex
    Life is short: get busy!

      I don't know about "very well," but I'd give this site a C or D for functionality and an A or B overall. I agree that it's a fantastic resource, and even if I were to get booted, I'd probably come back in the future under a different name just to repay the Perl community for all the help they have given me. (I wonder how many monks will take that the wrong way).

      The W3 validator is often wrong and not always helpful in finding problems, but some people like to create completely valid HTML anyway. I'm not one of those people, but Perl programmers who promote use strict seem more likely to be.

          I don't know about "very well," but I'd give this site a C or D for functionality and an A or B overall.

        Let me put it another way -- this site works well enough for me to visit it just about every working day. I'm happy to do my part, occasionally replying when I have an answer, doing a little moderating and so forth. Sure, there are some quirks, but that's OK -- I'm quirky myself, so that fits.

        The validator is useful when it points out major errors, and I try to use border="1" instead of border=1, for example, but if a page isn't perfect, that's not the end of the world.

        --t. alex
        Life is short: get busy!

        even if I were to get booted,

        You won't get booted. As far as I know nobody has been "booted" from the site. It's not that kind of place. We'll reap your nodes, silence you occasionally, -- you, criticise you, bitch, moan and wheedle. But boot you out? No, I don't think so. See, the whole point is to learn, and people who get expelled from school don't learn much that is useful to them or to anybody else.

        The W3 validator is often wrong

        I hope you realize how strong a statement this is. Without backing it up with some solid evidence, I would have thought that whatever credibility you have is blown.

        but Perl programmers who promote use strict seem more likely to be.

        You seem to think that using strict is just some form of pedantic behaviour. That its only use is to satisfy some artificial concept of correctness. It's not. Strict is there to keep newbies from asking the same set of stupid questions over and over. It's to protect you from yourself when you are tired, stressed, overworked, distracted, forgetful, and dyslexic, jittery from too much coffee and the like. It's to catch your typos and your silly oversights. Its friend use warnings is there to catch other less critical matters but for the same set of reasons. We don't advise people to use strict because we think they should, we advise them to do so because we know that no matter how good you are, you can make mistakes that the simple mechanisms of strictures and warnings will catch.
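        A small demonstration of the typo case, as a sketch only: the snippet and its variable names are invented, and string eval is used just so both variants can run in one script. The exact wording of the error depends on the perl version.

```perl
use strict;
use warnings;

# Made-up snippet with a typo: $totl instead of $total.
my $snippet = 'my $total = 1; $totl = 2; $total';

# Without strictures the typo silently creates a new global variable.
my $lax = eval "no strict; no warnings; $snippet";

# With strictures the same code refuses to compile.
my $strict = eval "use strict; $snippet";
my $error  = $@;

print "lax result: $lax\n";
print "strict error: $error";
```

        The lax version runs and returns 1 with the assignment quietly lost to $totl, while the strict version stops with a complaint that names $totl.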

        Comparing writing super-correct HTML (a language designed to be extremely fault tolerant) to using strictures, is like comparing good home decoration to not smoking in bed. The opposite of the first means a messy apartment, the opposite of the second means that eventually you will end up dead.


        ---
        demerphq

        <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
Re: Perl Monks hypocrisy
by Aristotle (Chancellor) on Jul 16, 2003 at 15:29 UTC
    Even if PM's HTML was flawless, not every poster's would be.

    Makeshifts last the longest.

Re: Perl Monks hypocrisy
by chromatic (Archbishop) on Jul 16, 2003 at 16:26 UTC
    Can't anyone who maintains this board figure out how to add an auto-line-break feature?

    No, we're all too stupid.

    Seriously. Round-tripping is hard. Do you convert everything into a canonical form in the database and render it into HTML for display and then back to the poster's preferred form for editing? How do you deal with quirks and mistakes? Is it important to guarantee that the post is preserved as originally typed? There's also backwards compatibility to deal with, some 270,000 nodes that are stored as HTML fragments already.

    If you have an easy solution, I'm all ears. I've been doing this long enough that I don't believe in many easy solutions, though.

      For what it's worth, I think tye's post sets out a plausible road map for dealing with this issue:

      1. The new functionality remains inactive if the text of a post contains anything that looks like tagged markup. (This could be a conservative test, such as /\<\w+/.)
      2. The auto-markup process should be paragraph oriented rather than line oriented, except that indented lines should get code treatment. (Similar to POD.)
      3. Rather than round-trip conversions, the function should be applied in one direction, at the time the message is being edited. On the preview page, if the post doesn't seem to contain any tags, show an auto-markup version on the bottom of the page, and give the user an "Auto-Markup" button that applies the markup and shows the preview again.

      Am I sweeping too many details under the carpet, or might this only require a handful of lines of new code?

      # In site init code somewhere
      use HTML::FromText;
      # Build the option hash: each listed feature is switched on.
      my %text2html_options = map { $_ => 1 } qw(paras blockcode urls email);
      ...
      # In preview form command handler
      # (string comparison with eq, not assignment)
      if ( $op eq "auto-format" ) {
          $doctext = text2html( $doctext, %text2html_options );
      }
      ...
      # At bottom of preview page: only offer auto-markup for tag-free posts
      if ( $doctext !~ /\<\w+/ ) {
          my $markup = text2html( $doctext, %text2html_options );
          if ( $doctext ne $markup ) {
              # qq{} avoids clashing with the double quotes in the HTML
              print qq{<hr>If the automatic formatting below looks correct, you can apply it with
      <input type="submit" name="op" value="auto-format" />.
      <p>$markup};
          }
      }
      ...
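      For anyone who wants to see steps 1 and 2 without the module, here is a rough core-Perl sketch. Everything in it is an assumption for illustration: auto_markup is a made-up name, it is not Everything or PerlMonks code, and it wraps indented paragraphs in <pre> where the site would more likely use its own code tags.

```perl
use strict;
use warnings;

# Hypothetical helper: paragraph-oriented auto-markup for tag-free posts.
sub auto_markup {
    my ($text) = @_;

    # Step 1: do nothing if the post already seems to contain tags.
    return $text if $text =~ /<\w+/;

    my @out;
    for my $para ( split /\n\s*\n/, $text ) {
        next unless $para =~ /\S/;

        # Escape the characters that are special in HTML.
        ( my $safe = $para ) =~ s/&/&amp;/g;
        $safe =~ s/</&lt;/g;
        $safe =~ s/>/&gt;/g;

        # Step 2: indented paragraphs get code treatment, as in POD;
        # everything else becomes an ordinary paragraph.
        push @out, $para =~ /^[ \t]+\S/
            ? "<pre>$safe</pre>"
            : "<p>$safe</p>";
    }
    return join "\n", @out;
}

print auto_markup("First paragraph.\n\n    my \$x = 1;\n\nR&D rules."), "\n";
```

      Step 3 then amounts to calling this once from the preview handler, so nothing ever has to be converted back.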
Re: Perl Monks hypocrisy
by dws (Chancellor) on Jul 16, 2003 at 16:58 UTC
    Maybe hypocrisy is too strong a word, ...

    When you find yourself questioning your choice of title, especially in your opening sentence, change the title. Otherwise, it looks like you're asking "did I really mean to do that?" after throwing a turd. Either use greater care when choosing your words, or mean what you say.

      Well, his actual words were:

      Maybe hypocisy is too strong a word

      So maybe Wassercrats should write himself a spellcheck to go along with the "regex HTML parser" :)

        So maybe Wassercrats should write himself a spellcheck to go along with the "regex HTML parser" :)

        And write the spellchecker as a (very long) regular expression :-)

         

        perl -le 'print+unpack"N",pack"B32","00000000000000000000001001110010"'

Re: Perl Monks hypocrisy
by chunlou (Curate) on Jul 16, 2003 at 18:47 UTC

    If auto line break were to be implemented, the board would probably have to offer the input text box in two flavors: Basic (where you have the auto line break) and Advanced (where you do your own formatting).

    It is the same reason some people use Frontpage, Dreamweaver, etc. to mark up a website; some use simply Notepad or something plain.

    In reality, auto line break could be one of those problems that seem trivial to a person but are not necessarily trivial to a computer. If you mean "line break" as literally "\n" wherever it appears, that's easy (but it will break a lot of other HTML, such as a table, unless someone enters the HTML table all on one line).

    If you mean "line break" as "paragraph", that actually could be very hard. It might look "obvious" to the human eye what a paragraph is, but it's very hard for, say, an HTML parser to distinguish because it can only read data, not content.
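    The literal case is quick to try. This is just a sketch with an invented post body, showing what a blind newline-to-<br> substitution does to a hand-written table:

```perl
use strict;
use warnings;

# Made-up post body: a small table typed on several lines.
my $post = "<table>\n<tr><td>a</td></tr>\n</table>";

# Naive auto line break: every newline becomes a <br>.
( my $naive = $post ) =~ s/\n/<br>\n/g;

print $naive, "\n";
```

    The <br> elements end up directly inside <table>, between the rows, where they don't belong.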

    As to the second "hypocrisy" or inconsistency, that's a good catch. But since the site has been up and running for a long while, not sure those errors or warning messages matter.

    Eventually, the content of the site that people see is more important than the code behind the site that people don't see.

      Eventually, the content of the site that people see is more important than the code behind the site that people don't see.
      What is more important? Is it the content? Is it the code? Is it the looks?
      Most ppl here say it's not the looks, but when they complain about people who care about the looks, they do too.
      'Not to care about the surface, that's superficial!' (Oscar Wilde)
Re: Perl Monks hypocrisy
by Jenda (Abbot) on Jul 16, 2003 at 21:39 UTC

    I guess the closest to your auto-line-break that would have any chance of being implemented here would be to be able to write the nodes in POD. But the performance problem chromatic talks about is very valid. This would mean that the nodes written in POD would have to be converted to HTML each time they are to be displayed, or they'd be converted to HTML when submitted and be presented as HTML to the user if he ever tries to modify them.

    Of course the PerlMonks system could store both versions of such nodes, but I think that would be just a waste of space.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

    Edit by castaway: Closed small tag in signature

Re: Perl Monks hypocrisy
by vnpandey (Scribe) on Jul 18, 2003 at 22:51 UTC
    Maybe hypocisy is too strong a word..

    Brother :) your query,comment starts with something which you are yourself not sure of.. and that is reflected along most of the query,comment. as ex..

    ....it did find 283 other errors ...

    only you know about these 283 errors... better to remove them than to whine about them.. but if they do exist in first place at all..

    I will just like to request you to once think before posting.. hope you do heed to it... still retaining your frankness..

    pandey

Node Type: monkdiscuss [id://274711]
Approved by Tanalis
Front-paged by hsmyers