Your challenge is to 'golf' some Perl code (produce code that requires the fewest [key] strokes -- fewest characters) that mostly just does s/--/¬-/g, but with some simple restrictions. I was surprised that I implemented this simple task over a dozen times before I finally got it right. I golfed mine down to 80 characters, so I wanted to see what y'all can come up with. Getting a correct solution may be a bigger challenge than golfing the solution.
Background
A 'de facto HTML comment' is started by "<!--" and ended by "-->" and can contain anything between those two delimiters except, of course, "-->". This is such a nice, simple, easy-to-parse definition that it has advantages over a standard HTML comment.
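Under that definition, one non-greedy match finds each comment. The Perl sketch below only illustrates the definition and is not a golf entry. The helper name `de_facto_bodies` is made up. It also assumes the closing "-->" must start after the opening "<!--", so "<!-->" does not close itself. That reading is an assumption, and it is one of the edge cases the test data probes.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return the body of every de facto HTML comment in $text.
# A comment starts at "<!--" and ends at the first "-->" that begins after it
# (assumption: the "-->" may not overlap the "<!--", so "<!-->" is not closed).
sub de_facto_bodies {
    my ($text) = @_;
    my @bodies;
    # /s lets "." match newlines; the non-greedy ".*?" stops at the first "-->".
    while ( $text =~ /<!--(.*?)-->/gs ) {
        push @bodies, $1;
    }
    return @bodies;
}

my @b = de_facto_bodies("a <!-- x -- y --> b <!-- z --> c");
print join( "|", @b ), "\n";    # prints " x -- y | z "
```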
Some (notorious but still very popular) browsers only handle de facto HTML comments. Many browsers only handle standard HTML comments.[1]
Your task is to golf some code that will adjust de facto HTML comments so that they are also standard HTML comments. I'll let those who are curious about the details of standard HTML comments visit Google. The only detail we need to worry about for the golf is that "--" inside of a de facto HTML comment is the problem.
Although "<!-- foo -- -- bar -->" is a valid HTML comment according to both the standard and de facto definitions, I'll make the task much easier by just requiring that all occurrences of "--" be replaced inside of the de facto comments. But we want to change as few pixels as possible so we'll transform the above comment to something like "<!-- foo ¬- -¬ bar -->".
If you can code a solution that changes even fewer characters but still makes sure each de facto comment ends up also being a standard comment, then you'll get bonus points (in the tradition of Whose Line Is It Anyway).
I chose "¬" (the "not" symbol, "\xAC", &#172; = &not;) because it looks a lot like "-" in most fonts and is still in Latin-1. The soft hyphen (&#173; = &shy;) looks even closer to "-" but shouldn't be displayed at all in most cases, so I rejected it. The en dash ("–", &#8211; = &ndash;) looks even more like "-", and it is "\x96" in Windows-1252 (Microsoft's extension to Latin-1, which is nearly the de facto interpretation of "Latin-1"). But some browsers are still standards-compliant enough that they won't display &#150;. How does your browser display &#150;?
The rules
- Insert as few characters as possible into the following code (some sample input is shown later):

      #!/usr/bin/perl -w
      use strict;
      $| = 1;
      $/ = '';
      for( <DATA> ) {
          #2345678 1 2345678 2 2345678 3 2345678...
          # Replace this line with your code
          ;
          print;
      }

- Your code must make it so that, for each "<!--" that starts a de facto HTML comment, the next occurrence of "--" after it is the first two characters of "-->" (which ends the comment). Bonus points for instead making each comment valid according to the HTML standards.
- Your code should change as few characters as possible.
- So it should not change any characters outside of de facto HTML comments. (If there is a "<!--" that is never followed by a "-->" then your code can either treat the rest of the string as being inside a comment or outside, whatever makes your code shorter.)
- Rerunning your code on output from your code should make no changes.
- Your code must only change "-" to "¬". So running tr/\xAC/-/ on the input and output should give the same results.
- You can assume the input and output are 8-bit Latin-1. Or you can assume utf-8 strings if you prefer. Other encodings might be allowed, though I can't think of any advantage to using one.
- You get penalized for causing global side effects. This means that using "$a" instead of "my $x" isn't going to be a net win here. You can use global variables for their intended purposes, but you'll get a small penalty if you change them and don't change them back (either to their previous value or to their standard default value).
- You get penalized for causing warnings.
- Please hide your solutions like spoilers (such as using a table or similar to set identical foreground and background colors and/or using READMORE tags and putting "spoilers" in your node title).
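A checker can make several of these rules concrete without giving away a solution. The Perl sketch below is hypothetical, and the name `check_rules` is invented here. It takes the original string and the modified one. It assumes 8-bit Latin-1, so replacing "-" with "\xAC" never shifts character positions. It tests three rules: the tr/\xAC/-/ rule, the "no changes outside comments" rule, and the "next -- must start the -->" rule. It does not test the rerun rule, because that needs the fixing code itself. It also accepts any treatment of a "<!--" that is never closed.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical checker (not a golf entry). Assumes 8-bit Latin-1 strings, so
# each "-" -> "\xAC" change keeps every character at the same position.
sub check_rules {
    my ( $in, $out ) = @_;

    # Only "-" may become "\xAC": mapping "\xAC" back to "-" must make them equal.
    ( my $i = $in )  =~ tr/\xAC/-/;
    ( my $o = $out ) =~ tr/\xAC/-/;
    return 0 if $i ne $o;

    my $pos = 0;    # start of text not yet checked
    while ( ( my $start = index( $in, '<!--', $pos ) ) >= 0 ) {
        # Text between comments must be untouched.
        return 0
          if substr( $in, $pos, $start - $pos ) ne substr( $out, $pos, $start - $pos );
        # The opener itself must survive.
        return 0 if substr( $out, $start, 4 ) ne '<!--';

        # The comment ends at the first "-->" after the opener (no overlap).
        my $end = index( $in, '-->', $start + 4 );
        return 1 if $end < 0;    # never closed: accept either treatment

        # In the output, the next "--" after the opener must begin that "-->".
        return 0 if index( $out, '--', $start + 4 ) != $end;
        $pos = $end + 3;
    }

    # Text after the last comment must be untouched.
    return substr( $in, $pos ) eq substr( $out, $pos ) ? 1 : 0;
}

# The example comment from the post, changed the way the post suggests.
print check_rules( "<!-- foo -- -- bar -->", "<!-- foo \xAC- -\xAC bar -->" )
  ? "ok\n"
  : "not ok\n";    # prints "ok"
```

Because the checker compares against the input's comment boundaries, it also rejects output where a change makes a comment end later than it did in the input.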
Later I'll post my solution and some test code that covers some of the rules. For now, I don't want to hint at techniques to try.
Here is some test data (but don't assume this is the only data you need to handle):
    __END__
    ---<!-- -->---> <--!-- <!-- -- --> --> <!---->--<!----->-<!------>---<!-------> <!---><!----> <!--->--<!----> <!--->---<!----> <!--->----<!----> -<!-->--<!-->--<!-->---<!--> <!--><!-->-<!-->--<!-->--<!-->---<!-->-- <!-- - - --> <!--- ---> <!---- ---->
[1] Some browsers don't manage to get either definition right. I have a copy of Opera that appears to require < and > to be balanced inside of HTML comments. Opera impresses me both with its nice features and how it manages to have bugs that are just so, well, stupid. (:
- tye
Re: Golf: Fix de facto HTML comments (spoiler)
by blokhead (Monsignor) on Jul 18, 2004 at 19:26 UTC
Re: Golf: Fix de facto HTML comments
by dws (Chancellor) on Jul 18, 2004 at 22:11 UTC
by Chady (Priest) on Jul 19, 2004 at 07:33 UTC
by eyepopslikeamosquito (Archbishop) on Jul 19, 2004 at 09:59 UTC
Re: Golf: Fix de facto HTML comments
by bageler (Hermit) on Jul 19, 2004 at 02:12 UTC
Re: Golf: Fix de facto HTML comments
by ysth (Canon) on Jul 18, 2004 at 22:00 UTC
Re: Golf: Fix de facto HTML comments
by BrowserUk (Patriarch) on Jul 19, 2004 at 07:14 UTC
Re: Golf: Fix de facto HTML comments
by Chady (Priest) on Jul 19, 2004 at 07:47 UTC
Re: Golf: Fix de facto HTML comments
by Wassercrats (Initiate) on Jul 18, 2004 at 23:40 UTC
by Solo (Deacon) on Jul 19, 2004 at 00:24 UTC