Re^3: Unable to connect
by bliako (Abbot) on Mar 22, 2025 at 20:38 UTC
I hate to say it, but if this is the case then human verification is needed when abnormal page visiting patterns (APVP) are detected. Can we check if there is any APVP in the logs around the short time interval mentioned by choroba and others?
I would consider the following a normal visiting pattern: open 5-10 posts from Newest Nodes (I send them to different tabs) in a short burst (e.g. when first landing on Newest Nodes or RAT), then inactivity (voting/commenting does not count) while reading or doing other things. I don't think even someone who has not logged in for a year would open hundreds of posts in one short burst to read them all in a ... few days. Perhaps we could be allowed to read only 10 posts/day from the distant past. Of course this entails a cookie for anyone visiting, not only for those logged in. Or keeping track of what each IP (not user) does, and how often. Thinking out loud.
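To make that concrete, here is a minimal sketch of the per-IP bookkeeping I have in mind; the window length and hit threshold are numbers I made up purely for illustration:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-IP sliding-window counter: flag an IP as "abnormal"
# when it opens more than $MAX_HITS node pages within $WINDOW seconds.
# Both thresholds are invented for illustration only.
my $WINDOW   = 300;   # seconds
my $MAX_HITS = 50;    # page opens allowed inside the window

my %hits;             # ip => [ epoch timestamps ]

sub record_hit {
    my ($ip, $now) = @_;
    my $t = $hits{$ip} //= [];
    push @$t, $now;
    # drop timestamps that have fallen out of the window
    shift @$t while @$t && $t->[0] < $now - $WINDOW;
    return @$t > $MAX_HITS;   # true => abnormal page visiting pattern
}

# Example: a burst of 60 hits from one address trips the check
for my $i (1 .. 60) {
    my $flagged = record_hit('192.0.2.7', time);
    print "192.0.2.7 flagged as APVP\n" if $flagged && $i == 60;
}

A real version would keep %hits in shared storage and tune the numbers against actual traffic.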
Can we check if there is any APVP in the logs...
Good idea. So I just did some log file messin', and found that there have been a huge number of hits on the site from the address range 66.249.64.* to 66.249.79.*.
All of the hits are from Anonymous Monk, and all are submitting queries to Super Search — which, of course, is quite resource intensive.
A lot of the queries are somewhat Perl- or PerlMonks-related, but many are not. In fact, it looks a lot like someone trying to use Super Search like Google.
Worth noting: No monks are logging in from this address range.
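For anyone who wants to redo this kind of check, something along these lines should do. The log path and the Apache "combined" format are assumptions about the setup, and the Super Search match is deliberately crude; adjust both to the real server:

#!/usr/bin/perl
use strict;
use warnings;

# Count hits on Super Search coming from 66.249.64.* .. 66.249.79.*.
my $log = '/var/log/httpd/access_log';
open my $fh, '<', $log or die "open $log: $!";

my %per_ip;
while (my $line = <$fh>) {
    my ($ip, $request) = $line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "([^"]*)"/
        or next;
    my ($third) = $ip =~ /^66\.249\.(\d+)\./ or next;
    next unless $third >= 64 && $third <= 79;
    # crude match on the Super Search URL (assumption: the request
    # line mentions it; adjust to the site's real URL scheme)
    $per_ip{$ip}++ if $request =~ /Super.?Search/i;
}

printf "%-16s %6d\n", $_, $per_ip{$_}
    for sort { $per_ip{$b} <=> $per_ip{$a} } keys %per_ip;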
Today's latest and greatest software contains tomorrow's zero-day exploits.
Can I also suggest a quality-of-service monitor, running say once or twice per minute from outside the site, or internally but through a proxy? I think Discipulus and/or choroba, and possibly other monks, developed something like this. Once the quality drops, record the IPs from the logs and add them to the ban-list. It remains to be decided what kind of ban that will be, given all the points you made earlier.
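Something as simple as this could serve as the external probe; the URL, interval, and threshold here are placeholder values:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(time sleep);

# Fetch the front page every 30 seconds and warn when it is slow
# or failing. On a DEGRADED event one would grab the current logs
# and feed the ban-list.
my $ua        = LWP::UserAgent->new(timeout => 20);
my $url       = 'https://www.perlmonks.org/';
my $threshold = 5;    # seconds before we call the service degraded

while (1) {
    my $t0   = time;
    my $resp = $ua->get($url);
    my $dt   = time - $t0;
    if (!$resp->is_success || $dt > $threshold) {
        printf "%s DEGRADED status=%s time=%.1fs\n",
            scalar localtime, $resp->status_line, $dt;
    }
    sleep 30;
}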
Apropos captchas, perhaps PM can, once more, innovate by using Perl-programming-related puzzles suggested by monks, so that AI bots don't leave empty-handed but improve a little. After all, the use of Perl by AI would be much more influential for Perl than use by humans, who seem to have, in their proverbial ignorance, abandoned ship :(. I can see these captchas soon being replaced by whole PM SOPW questions that the AI must answer in order to gain temporary access to Super Search. Put those bots to real work.
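Half in jest, a sketch of what such a puzzle gate might look like; the two puzzles, and the whole mechanism, are invented:

#!/usr/bin/perl
use strict;
use warnings;

# Show a tiny snippet, admit the visitor only if they can predict
# its output. Real puzzles and answers would be contributed by monks.
my @puzzles = (
    { code => q{print join ",", map { $_ * 2 } 1 .. 3;}, answer => '2,4,6' },
    { code => q{print scalar reverse "lrep";},           answer => 'perl'  },
);

my $p = $puzzles[ rand @puzzles ];
print "What does this print?\n$p->{code}\nYour answer: ";
chomp( my $guess = <STDIN> );
print $guess eq $p->{answer} ? "Welcome, monk.\n" : "Begone, bot.\n";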
All of the hits are from Anonymous Monk, and all are submitting queries to Super Search — which, of course, is quite resource intensive.
Maybe there should be a rate limit for Super Search and other expensive functions, applied only to Anonymous Monk, so that regular monks can use Super Search as usual? A sane limit might be something like two or three searches per second per IP address, and no more than perhaps 20 per minute per IP address. That would still allow a few Anonymous Monks behind a masquerading gateway, but would block robots.
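A toy version of such a limiter, using the numbers above (3/second and 20/minute); in a real deployment the %state hash would have to live in shared storage rather than in one process:

#!/usr/bin/perl
use strict;
use warnings;

# Fixed-window per-IP rate limiter for an expensive endpoint.
my %state;    # ip => { sec => [start, count], min => [start, count] }

sub allow {
    my ($ip, $now) = @_;
    my $s = $state{$ip} //= { sec => [$now, 0], min => [$now, 0] };
    for my $w ([ $s->{sec}, 1, 3 ], [ $s->{min}, 60, 20 ]) {
        my ($win, $len, $max) = @$w;
        @$win = ($now, 0) if $now - $win->[0] >= $len;   # new window
        return 0 if ++$win->[1] > $max;                  # over the limit
    }
    return 1;
}

# Example: the 4th request inside one second is rejected
print allow('198.51.100.9', 1000) ? "ok\n" : "blocked\n" for 1 .. 4;

A fixed window is the simplest choice; a sliding window or token bucket would smooth out bursts at the window edges.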
Or, a little bit evil: Require a login for using Super Search. Yes, it's at least annoying for people not wanting to create yet another account. On the other hand, a Google search with site:perlmonks.org works quite well, so Anonymous Monks could use that.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
That also means that the energy consumption of AI is much higher than usually considered, as it also includes most of the energy needed to run all the servers they suck dry.
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^3: Unable to connect
by cavac (Prior) on Apr 01, 2025 at 14:47 UTC
Unfortunately, I had to tighten the screws on my private server as well. Most of those scrapers are really, really dumb, too. When they encounter a public repository (both Git and Mercurial), instead of just pulling the repo (a rather efficient operation), they follow all the web pages and generate every page in every possible view. I'm still working on some smarter rules, but so far I have managed to reduce traffic to my server by (very roughly) 90% without affecting most legitimate users.
There are still a few things I want to implement to detect bot activity even better and to have the ability to automatically block specific subnets when bot activity is detected from those IPs. But that's all very specific to my private server and unfortunately won't be applicable to the monastery.
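For what it's worth, the subnet part could start as simple as tallying suspicious hits per /24; the input format and the threshold here are placeholders:

#!/usr/bin/perl
use strict;
use warnings;

# Aggregate suspicious hits per /24 and print candidates for the
# firewall ban-list once they cross a threshold.
my $THRESHOLD = 500;    # suspicious requests per /24 before we block
my %per_net;

while (my $line = <STDIN>) {    # pre-filtered bot hits, one IP per line
    my ($net) = $line =~ /^(\d+\.\d+\.\d+)\./ or next;
    next if ++$per_net{$net} != $THRESHOLD;   # report each subnet once
    print "$net.0/24\n";
}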