PerlMonks  

Re: Re: Non-English posts on Perlmonks

by Anonymous Monk
on Jul 13, 2003 at 02:32 UTC (#273714)


in reply to Re: Non-English posts on Perlmonks
in thread Non-English posts on Perlmonks

so I wouldn't worry about it too much.

Neither would I. In fact, since I don't run this site, I wouldn't worry about it at all :)

That said, it's all a matter of cost vs. benefit. The probability of any damage actually coming from copyright infringement is next to none; in the worst case you get a cease and desist letter (and even that is very unlikely).

As for offensive content and falling trees, you never know who might walk along after the tree has fallen. It would reflect poorly on Perlmonks, but the odds are slim and definitely wouldn't outweigh the benefits.

it's far from trivial to generate random strings that look like a language but isn't.

I've never done it, so I can't say for certain, but in theory it would appear to be remarkably easy. A simple example could be to just rot13 an English post, add or remove a few letters of padding on each word, and vary the punctuation slightly. More complex examples could be done in under an hour as well. Granted, I doubt it would ever cause a problem, but it certainly doesn't appear difficult.
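
A minimal sketch of that simple case, in Python, might look like the following. The function name `disguise`, the `max_pad` limit, and the choice of punctuation marks are all made up for illustration; only the three steps (rot13, pad or trim each word, vary the punctuation) come from the description above.

```python
import codecs
import random
import string

def disguise(text, seed=0, max_pad=2):
    """rot13 each word, pad or trim it by up to max_pad letters, vary punctuation."""
    rng = random.Random(seed)  # seeded so the same input gives the same output
    out = []
    for word in text.split():
        core = word.strip(string.punctuation)
        if not core:
            continue  # token was pure punctuation; drop it
        core = codecs.encode(core, "rot13")
        delta = rng.randint(-max_pad, max_pad)
        if delta > 0:
            # pad with random lowercase letters
            core += "".join(rng.choice(string.ascii_lowercase) for _ in range(delta))
        elif delta < 0 and len(core) > -delta:
            # trim, but never down to an empty word
            core = core[:delta]
        # vary the punctuation slightly: usually none, sometimes a comma or period
        core += rng.choice(["", "", "", ",", "."])
        out.append(core)
    return " ".join(out)

print(disguise("so I wouldn't worry about it too much."))
```

Whether the result would pass for a real language at a glance is exactly the point under dispute, so treat this as something to experiment with rather than a demonstration either way.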

Replies are listed 'Best First'.
Re: Re: Re: Non-English posts on Perlmonks
by The Mad Hatter (Priest) on Jul 13, 2003 at 02:57 UTC
    I don't know about you, but to me (and others I know), rot13 is blatantly obvious, and I doubt padding/punctuation would change that. I don't think making up a plausible language within an hour is possible.
      rot13 is blatantly obvious

      Okay, rot14 then ;-P

      Really though, languages follow many simple patterns. What you could do is take a couple hundred posts from Perlmonks (or anywhere else) and analyze them for average word length and order. So taking your post, I could use the notation 'A' for an alphabetical character, 'N' for a digit, and 'P' for punctuation (you could obviously get more specific here), and you get:

      A AAAPA AAAA AAAAA AAAP AAA AA AA PAAA AAAAAA A AAAAPP AAANN AA AAAAAAAAA AAAAAAAP ...
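
The notation above can be produced mechanically. Here is a small Python sketch; the function name `shape` is invented for the example, and treating digits as 'N' is inferred from the "AAANN" that stands for "rot13" in the pattern.

```python
def shape(text):
    """Map letters to 'A', digits to 'N', and any other non-space character to 'P'."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(" ")
        elif ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("N")
        else:
            out.append("P")
    return "".join(out)

print(shape("I don't know about you, but to me (and others I know), rot13 is"))
# -> A AAAPA AAAA AAAAA AAAP AAA AA AA PAAA AAAAAA A AAAAPP AAANN AA
```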

      You then average out the structure of the words and create general rules like 'a one-letter word is seldom followed by another one-letter word' and 'this type of punctuation occurs every X letters.' You use these rules to define an acceptable range of variation, then use a random number generator to pick values within that range. Finally, you account for certain letters occurring more often than others and assign letters accordingly. Dead simple.
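
A rough Python sketch of those steps follows. Everything named here (`learn`, `weighted`, `babble`) is hypothetical, the sample text must be non-empty, and the one-letter-word rule is simplified from "seldom" to "never" to keep the code short. It samples word lengths, letters, and trailing punctuation in proportion to how often they occur in the sample, which is one plausible reading of the approach, not the only one.

```python
import random
import string
from collections import Counter

def learn(sample):
    """Count word lengths, letters, and trailing punctuation in the sample text."""
    lengths, letters, endings = Counter(), Counter(), Counter()
    for token in sample.split():
        core = "".join(ch for ch in token.lower() if ch.isalpha())
        if not core:
            continue
        lengths[len(core)] += 1
        letters.update(core)
        last = token[-1]
        endings[last if last in string.punctuation else ""] += 1
    return lengths, letters, endings

def weighted(rng, counter):
    """Pick one key from a Counter, weighted by its count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]

def babble(sample, n_words=12, seed=0):
    """Generate n_words of fake text that mimics the sample's statistics."""
    rng = random.Random(seed)
    lengths, letters, endings = learn(sample)
    words, prev_len = [], None
    for _ in range(n_words):
        n = weighted(rng, lengths)
        # Simplified rule: never follow a one-letter word with another one.
        if n == 1 and prev_len == 1 and len(lengths) > 1:
            longer = Counter({k: v for k, v in lengths.items() if k != 1})
            n = weighted(rng, longer)
        word = "".join(weighted(rng, letters) for _ in range(n))
        words.append(word + weighted(rng, endings))
        prev_len = n
    return " ".join(words)

post = "I don't know about you, but to me (and others I know), rot13 is blatantly obvious"
print(babble(post, n_words=10, seed=1))
```

Picking letters independently like this ignores which letters tend to follow which, so the output will likely look less word-like than real text; conditioning each letter on the previous one would be the obvious next refinement.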

      Again though, it's hardly worth worrying about, but it is a neat (SIMPLE) academic exercise.
