in reply to Re: Non-English posts on Perlmonks
in thread Non-English posts on Perlmonks

so I wouldn't worry about it too much.

Neither would I. In fact, since I don't run this site, I wouldn't worry about it at all :)

That said, it's all a matter of cost vs. benefit. The probability of any damage actually coming from copyright infringement is next to none; in the worst case you get a cease and desist letter (and even that has a very low probability).

As for offensive content and falling trees, you never know who might walk along after the tree has fallen. It would reflect poorly on Perlmonks, but the odds are slim and definitely wouldn't outweigh the benefits.

it's far from trivial to generate random strings that look like a language but isn't.

I've never done it, so I can't say for certain, but in theory it would appear to be remarkably easy. A simple example could be to just rot13 an English post, add/subtract a specific amount of padding from each word, and vary the punctuation slightly. More complex examples could be done in under an hour as well. Granted, I doubt it would ever cause a problem, but it certainly doesn't appear difficult.
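
For illustration only, a minimal Python sketch of the rot13-plus-padding idea might look like this. The function name `disguise`, the `pad` and `seed` parameters, and the choice to append random lowercase letters as padding are all assumptions of the sketch; the punctuation variation is left out to keep it short.

```python
import codecs
import random
import string


def disguise(text, pad=2, seed=0):
    """Rot13 every word and append `pad` random letters to it.

    Trailing punctuation is kept in place after the padding.
    All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    words = []
    for word in text.split():
        core = word.rstrip(string.punctuation)   # the word without trailing punctuation
        tail = word[len(core):]                  # the trailing punctuation, if any
        rotated = codecs.encode(core, "rot_13")  # standard-library rot13 codec
        filler = "".join(rng.choice(string.ascii_lowercase) for _ in range(pad))
        words.append(rotated + filler + tail)
    return " ".join(words)


print(disguise("rot13 is blatantly obvious, or is it?"))
```

With `pad=0` this is plain rot13, which is exactly the case a reader would spot quickly; the padding only hides the familiar word lengths.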

Re: Re: Re: Non-English posts on Perlmonks
by The Mad Hatter (Priest) on Jul 13, 2003 at 02:57 UTC
    I don't know about you, but to me (and others I know), rot13 is blatantly obvious, and I doubt padding/punctuation would change that. I don't think making up a plausible language within an hour is possible.
      rot13 is blatantly obvious

      Okay, rot14 then ;-P

      Really though, languages follow many simple patterns. What you could do would be to take a couple hundred posts from Perlmonks (or anywhere else) and analyze them for average word length and order. So taking your post I could use the notation 'A' for an alphabetical character and 'P' for punctuation (you could obviously get more specific here) and you get:
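
      A rough Python sketch of that mapping, under my own assumptions: the function name `shape` is made up, digits are counted as 'A' along with letters, apostrophes and slashes count as punctuation, and whitespace is passed through unchanged.

```python
import string


def shape(text):
    """Replace letters and digits with 'A', punctuation with 'P'; keep whitespace."""
    out = []
    for ch in text:
        if ch.isalnum():
            out.append("A")
        elif ch in string.punctuation:
            out.append("P")
        else:
            out.append(ch)  # spaces and newlines are kept so word boundaries survive
    return "".join(out)


print(shape("I don't know about you, but to me"))
# prints: A AAAPA AAAA AAAAA AAAP AAA AA AA
```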


      You then average out the structure of the words and create general rules like 'a one-letter word is seldom followed by another one-letter word' and 'this type of punctuation occurs every X letters.' You use these rules to define an acceptable range of variation and then use some random generator to pick numbers within that range. You then account for certain letters occurring more often than others and assign them accordingly. Dead simple.
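
      Here is one way the steps above might be sketched in Python. It is only a sketch under stated assumptions: `fake_text` and its parameters are hypothetical names, word lengths and letter frequencies are sampled directly from the analyzed text rather than averaged, the only structural rule enforced is the one-letter-word rule, and punctuation is ignored entirely.

```python
import random
from collections import Counter


def fake_text(sample, n_words=20, seed=0):
    """Generate nonsense words that mimic the sample's word lengths and letter frequencies."""
    rng = random.Random(seed)
    # Keep only the alphabetic part of each word, lowercased.
    words = ["".join(c for c in w if c.isalpha()).lower() for w in sample.split()]
    words = [w for w in words if w]
    if not words:
        raise ValueError("sample contains no alphabetic words")

    lengths = [len(w) for w in words]   # observed word lengths (duplicates act as weights)
    letters = Counter("".join(words))   # observed letter frequencies
    alphabet = list(letters)
    weights = [letters[c] for c in alphabet]
    has_longer = any(n > 1 for n in lengths)

    out = []
    prev_len = 0
    for _ in range(n_words):
        n = rng.choice(lengths)
        # Example rule: a one-letter word is not followed by another one-letter word.
        while n == 1 and prev_len == 1 and has_longer:
            n = rng.choice(lengths)
        out.append("".join(rng.choices(alphabet, weights=weights, k=n)))
        prev_len = n
    return " ".join(out)


sample = ("I don't know about you, but to me (and others I know), "
          "rot13 is blatantly obvious")
print(fake_text(sample, n_words=12))
```

      The output has plausible word lengths and letter counts but no vowel/consonant structure, so a stricter version would also need rules about which letters may follow which.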

      Again though, it's hardly worth worrying about, but it is a neat (SIMPLE) academic exercise.