PerlMonks  

Re: chatterbox & search engines (cloaking)

by Hutta (Scribe)
on Oct 16, 2001 at 20:40 UTC


in reply to (ichimunki) Re: chatterbox & search engines
in thread chatterbox & search engines

Watching the User-Agent (UA) header, and even maintaining a list of known search engine IP addresses, is fairly common in the porn industry. The aim is to hide the layout of a page that has made it to the top listing for particular keywords.

It's generally used by people who don't want someone to come along and build a similar page that would earn the same high ranking. With that goal in mind, the UA header on its own is useless, since it can obviously be spoofed: a competitor only has to send a crawler's UA string to see the page the crawler sees.
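To make the UA-versus-IP point concrete, here is a rough sketch of why the IP list matters. It is not how any particular cloaking product works. The network below is a documentation-only range standing in for real crawler addresses, and the function names and the "googlebot" token are made up for illustration.

```python
import ipaddress

# Hypothetical crawler networks. 192.0.2.0/24 is a documentation range,
# used here as a stand-in; it is NOT a real search engine address block.
KNOWN_CRAWLER_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def ip_is_known_crawler(remote_addr):
    """Return True if the client address falls inside a listed crawler network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparseable address: treat as an ordinary visitor
    return any(addr in net for net in KNOWN_CRAWLER_NETS)


def ua_claims_crawler(user_agent):
    """Return True if the UA string merely *claims* to be a crawler (assumed token)."""
    return "googlebot" in (user_agent or "").lower()


def classify(remote_addr, user_agent):
    """Trust the IP list first; a crawler UA from an unlisted IP is likely spoofed."""
    if ip_is_known_crawler(remote_addr):
        return "crawler"
    if ua_claims_crawler(user_agent):
        return "spoofed-ua"
    return "visitor"
```

Anyone can type a crawler's UA string into their client, but they can't easily send requests from the crawler's address, which is presumably why the IP lists get maintained at all.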

Anyway, some search engines despise this practice (called "cloaking"), since some people will get "XXX Hardcore Fuxor Pictures" listed under the keyword "Baseball", or something else that equally undermines the effectiveness of the search engine. These engines (and I wish I had a current list) don't care if you have a legitimate reason for cloaking, and will ban you anyway.

More rational search engines (a group I'm sure Google belongs in) will tolerate cloaking that doesn't impact how well they help people find sites that match their keywords.

Check out IP Delivery, a poorly written Perl program for cloaking pages that sells for an absurd amount of money.

Re: Re: chatterbox & search engines (cloaking)
by ichimunki (Priest) on Oct 16, 2001 at 21:39 UTC
    This is not really a case of cloaking. It's an attempt to remove temporal data from a static cache. The only other ways to do that would be to remove the temporal data for non-registered clients, or to include a no-cache directive in robots.txt. Both are unacceptable: the former means AM (Anonymous Monk) can't see chat, and the latter makes all of PM (PerlMonks) non-cached. Since the primary "offender" is Google, I'd simply look at their UA string and give them a different page based on that. But I'm not an EE hacker, so I simply offer this as a "nice to have" to the development team.
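A minimal sketch of the UA idea, in Python rather than anything resembling the site's actual engine. It assumes the rendered page wraps the chat in a pair of HTML comment markers. Those markers, the "googlebot" token, and the function names are all invented for the example.

```python
import re

# Assumed UA substrings for crawlers whose cached copy should omit the chat.
CRAWLER_UA_TOKENS = ("googlebot",)

# Hypothetical markers around the chatterbox in the rendered HTML.
CHATTER_BLOCK = re.compile(
    r"<!-- chatterbox -->.*?<!-- /chatterbox -->", re.DOTALL
)


def wants_static_page(user_agent):
    """Return True if the UA string contains one of the assumed crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_UA_TOKENS)


def render_for(user_agent, page_html):
    """Drop the temporal chat block for crawlers; leave the page untouched otherwise."""
    if wants_static_page(user_agent):
        return CHATTER_BLOCK.sub("", page_html)
    return page_html
```

Everything outside the chat block is served identically to everyone, so the cached copy still matches what a visitor finds on the node itself.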

    I have to wonder how well cloaking detection even works without human intervention. You can't simply compare the HTML from one GET to the next: the site could be using the UA to send tuned HTML, or could have a random feature, or any number of other things that result in slightly dissimilar HTML. As such, detection would almost have to involve human review, or some similarity testing that PM, with or without chatter, would probably pass.
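For what "some similarity testing" might look like, here is a guess using Python's standard difflib. No search engine is known to work this way. The tag stripping is deliberately crude, and the 0.5 threshold is an arbitrary assumption.

```python
import difflib
import re

TAG = re.compile(r"<[^>]+>")


def visible_words(html):
    """Crudely strip tags and return the lowercased words left over."""
    return TAG.sub(" ", html).lower().split()


def similarity(html_a, html_b):
    """Ratio in [0, 1] of how much of the word sequence the two fetches share."""
    matcher = difflib.SequenceMatcher(
        None, visible_words(html_a), visible_words(html_b), autojunk=False
    )
    return matcher.ratio()


def looks_cloaked(html_a, html_b, threshold=0.5):
    """Flag the pair only when the two fetches are mostly different (assumed cutoff)."""
    return similarity(html_a, html_b) < threshold
```

Under a test like this, a page that differs only by a chat box still shares most of its words with the crawler's copy and passes, while a page swapped for unrelated content shares almost nothing and gets flagged.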
