Your challenge is to 'golf' some Perl code (produce code that requires the fewest [key] strokes -- fewest characters) that mostly just does s/--/-/g, but with some simple restrictions. I was surprised that I implemented this simple task over a dozen times before I finally got it right. I golfed mine down to 80 characters, so I wanted to see what y'all can come up with. Getting a correct solution may be a bigger challenge than golfing the solution.


A 'de facto HTML comment' is started by "<!--" and ended by "-->" and can contain anything between those two delimiters except, of course, "-->". This is such a nice, simple, easy-to-parse definition that it has advantages over a standard HTML comment.
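To make the definition concrete, here is a minimal sketch (not a golf entry, and not a solution) that just lists the de facto comments in a string. The sub name `defacto_comments` is made up for this illustration, and it assumes a comment's contents begin right after the "<!--".

```perl
use strict;
use warnings;

# Hypothetical helper: return every de facto HTML comment in a string.
# A comment starts at "<!--" and runs to the nearest following "-->".
# The /s flag lets "." match newlines, so comments may span lines.
sub defacto_comments {
    my ($text) = @_;
    return $text =~ /(<!--.*?-->)/gs;
}

# Prints the two comments, one per line:
#   <!-- x -- y -->
#   <!---->
print "$_\n" for defacto_comments('a <!-- x -- y --> b <!----> c');
```

The non-greedy `.*?` is what encodes "anything except -->": the match stops at the first terminator it can reach.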

Some (notorious but still very popular) browsers only handle de facto HTML comments. Many browsers only handle standard HTML comments (see footnote 1).

Your task is to golf some code that will adjust de facto HTML comments so that they are also standard HTML comments. I'll let those who are curious about the details of standard HTML comments visit Google. The only detail we need to worry about for the golf is that "--" inside of a de facto HTML comment is the problem.

Although "<!-- foo -- -- bar -->" is a valid HTML comment according to both the standard and de facto definitions, I'll make the task much easier by just requiring that all occurrences of "--" be replaced inside of the de facto comments. But we want to change as few pixels as possible so we'll transform the above comment to something like "<!-- foo -¬ -¬ bar -->".

If you can code a solution that changes even fewer characters but still makes sure each de facto comment ends up also being a standard comment, then you'll get bonus points (in the tradition of Whose Line Is It Anyway).

I chose "¬" (the "not" symbol, "\xAC", &#xAC;=¬) because it looks a lot like "-" in most fonts and is still in Latin-1. The soft hyphen (&#xAD;=&shy;) looks even closer to "-" but shouldn't be displayed at all in most cases, so I rejected it. The en dash is "–", &#x2013;, &ndash;, and is "\x96" in Windows-1252 (Microsoft's extension to Latin-1 which is nearly the de facto interpretation of "Latin-1") and it also looks even more like "-". But some browsers are still standards-compliant enough that they won't display that. How does your browser display it ()?
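If you want to check the byte values mentioned above for yourself, a small sketch using the core Encode module shows how the same byte maps to different characters under strict Latin-1 and under Windows-1252. The sub name `code_point` is hypothetical, chosen only for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: the Unicode code point that a single byte
# decodes to under the named encoding.
sub code_point {
    my ( $encoding, $byte ) = @_;
    return ord( decode( $encoding, $byte ) );
}

# The not sign and the soft hyphen are ordinary Latin-1 characters.
printf "not sign:    U+%04X\n", code_point( 'iso-8859-1', "\xAC" );
printf "soft hyphen: U+%04X\n", code_point( 'iso-8859-1', "\xAD" );

# Byte 0x96 is the en dash only under Windows-1252;
# under strict Latin-1 it is a C1 control character.
printf "cp1252 0x96: U+%04X\n", code_point( 'cp1252',     "\x96" );
printf "latin1 0x96: U+%04X\n", code_point( 'iso-8859-1', "\x96" );
```

The last two lines are the reason "\x96" is risky: a browser that honors a Latin-1 declaration literally has no printable character to show for it.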

The rules

  1. Insert as few characters as possible into the following code:
    #!/usr/bin/perl -w
    use strict;
    $| = 1;
    $/ = '';
    for( <DATA> ) {
    #2345678 1 2345678 2 2345678 3 2345678...
        # Replace this line with your code
        ;
        print;
    }
    Some sample input is shown later.
  2. Your code must make it so that, for each "<!--" that starts a de facto HTML comment, the next occurrence of "--" after it is the first two characters of the "-->" that ends the comment. Bonus points for instead making each comment valid according to the HTML standards.
  3. Your code should change as few characters as possible.
    • So it should not change any characters outside of de facto HTML comments. (If there is a "<!--" that is never followed by a "-->" then your code can either treat the rest of the string as being inside a comment or outside, whatever makes your code shorter.)
    • Rerunning your code on output from your code should make no changes.
    • Your code must only change "-" to "¬". So running tr/\xAC/-/ on the input and output should give the same results.
    Points deducted for changing too many characters but even more points deducted for not producing comments that fit both definitions.
  4. You can assume the input and output are 8-bit Latin-1. Or you can assume utf-8 strings if you prefer. Other encodings might be legal though I can't think of any advantage.
  5. You get penalized for causing global side effects. This means that using "$a" instead of "my $x" isn't going to be a net win here. You can use global variables for their intended purposes but you'll get a small penalty if you change them and don't change them back (either to their previous value or to their standard default value).
  6. You get penalized for causing warnings.
  7. Please hide your solutions like spoilers (such as using a table or similar to set identical foreground and background colors and/or using READMORE tags and putting "spoilers" in your node title).
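
Rules 2 and 3 can be checked mechanically without giving away how to solve them. The following is only a rough sketch of such a checker, not tye's test code: the sub name `check_fix` and its messages are invented here, it assumes a comment's contents begin right after the "<!--" (so an opener cannot share dashes with its terminator), and it does not test idempotence, global side effects, or warnings.

```perl
use strict;
use warnings;

# Hypothetical checker: returns a list of complaints about $out as a
# fix for $in. An empty list means no violation of rules 2 and 3 was found.
sub check_fix {
    my ( $in, $out ) = @_;
    my @bad;
    return ('length changed') if length $in != length $out;

    # Mark positions strictly inside a de facto comment of $in.
    # An unterminated "<!--" is treated as a comment to end of string.
    my @inside = (0) x length $in;
    my $p = 0;
    while ( ( my $s = index( $in, '<!--', $p ) ) >= 0 ) {
        my $e = index( $in, '-->', $s + 4 );
        $e = length $in if $e < 0;
        @inside[ $s + 4 .. $e - 1 ] = (1) x ( $e - $s - 4 );
        $p = $e + 3;
    }

    # Rule 3: only "-" may change, only to "\xAC", only inside a comment.
    for my $i ( 0 .. length($in) - 1 ) {
        my ( $old, $new ) = ( substr( $in, $i, 1 ), substr( $out, $i, 1 ) );
        next if $old eq $new;
        push @bad, "changed char at $i outside a comment" unless $inside[$i];
        push @bad, "illegal change at $i" unless $old eq '-' && $new eq "\xAC";
    }

    # Rule 2: after each opener, the next "--" must begin the closing "-->".
    $p = 0;
    while ( ( my $s = index( $out, '<!--', $p ) ) >= 0 ) {
        my $e = index( $out, '-->', $s + 4 );
        last if $e < 0;    # unterminated: either treatment is allowed
        my $d = index( $out, '--', $s + 4 );
        push @bad, "stray -- at $d" if $d < $e;
        $p = $e + 3;
    }
    return @bad;
}

# The example from the challenge text passes; an unchanged comment does not.
print "ok\n" unless check_fix( '<!-- foo -- -- bar -->', "<!-- foo -\xAC -\xAC bar -->" );
print "$_\n" for check_fix( '<!-- a -- b -->', '<!-- a -- b -->' );    # stray -- at 7
```

Run it against a candidate by capturing each record before and after your golfed line, then passing both strings to the checker.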

Later I'll post my solution and some test code that covers some of the rules. For now, I don't want to hint at techniques to try.

Here is some test data (but don't assume this is the only data you need to handle):

__END__ ---<!-- -->---> <--!-- <!-- -- --> --> <!---->--<!----->-<!------>---<!-------> <!---><!----> <!--->--<!----> <!--->---<!----> <!--->----<!----> -<!-->--<!-->--<!-->---<!--> <!--><!-->-<!-->--<!-->--<!-->---<!-->-- <!-- - - --> <!--- ---> <!---- ---->

1 Some browsers don't manage to get either definition right. I have a copy of Opera that appears to require < and > to be balanced inside of HTML comments. Opera impresses me both with its nice features and how it manages to have bugs that are just so, well, stupid. (:

- tye        

In reply to Golf: Fix de facto HTML comments by tye
