PerlMonks  

nofollow links in reaped nodes

by mpeg4codec (Pilgrim)
on Nov 29, 2009 at 18:43 UTC ( #810026=monkdiscuss )

We may be inadvertently boosting spammers' search rankings through reaped nodes. Nodes reaped for spam often contain links to other sites. Although the links no longer appear in the original discussion context, the full text of the node, links included, is one click away ('You may view the original node and the consideration vote tally'). Search bots will crawl that page, follow the links, and index the spammer's site. Because our site ranks well, those links pass some of that ranking on to the spammer's site.

We should neuter these links, either by adding rel="nofollow" to them or by converting them to plain text.
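To make the two options concrete, here is a minimal sketch in Python (not the site's actual code, which this thread does not show). The names `LinkNeuter` and `neuter_links` are hypothetical, and the sketch assumes the reaped node's body is available as an HTML string. It uses only the standard library and drops HTML comments, which is probably acceptable for spam content.

```python
from html import escape
from html.parser import HTMLParser


class LinkNeuter(HTMLParser):
    """Rebuild HTML, adding rel="nofollow" to links or flattening them to text."""

    def __init__(self, plain_text=False):
        super().__init__(convert_charrefs=True)
        self.plain_text = plain_text
        self.out = []
        self.hrefs = []  # hrefs of the <a> tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Discard any existing rel value so ours is the only one.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            self.hrefs.append(dict(attrs).get("href") or "")
            if self.plain_text:
                return  # drop the tag; the URL is printed when </a> arrives
            attrs.append(("rel", "nofollow"))
        self.out.append(self._render(tag, attrs))

    def handle_endtag(self, tag):
        if tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            if self.plain_text:
                self.out.append(f" [{escape(href)}]")
                return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    @staticmethod
    def _render(tag, attrs):
        parts = [tag]
        for key, value in attrs:
            parts.append(key if value is None else f'{key}="{escape(value)}"')
        return "<" + " ".join(parts) + ">"


def neuter_links(html, plain_text=False):
    """Return html with every link either marked nofollow or turned into text."""
    parser = LinkNeuter(plain_text)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling `neuter_links(body)` keeps the links clickable but marks them, while `neuter_links(body, plain_text=True)` removes the anchors and leaves the URL in brackets after the link text.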

Re: nofollow links in reaped nodes
by Corion (Pope) on Nov 29, 2009 at 18:56 UTC

    Anonymous Monk (and the web spiders are part of that) cannot visit reaped nodes.

      ++, thanks for this tidbit
      But what they can see is something like the latest nodes:
      Re^2: Reaped: <div title="فاركس"><A href="http://www.***.co.ir/" title="فاركس"><IMG alt="فاركس بروكر فوركست سيگنال اكسپرت" src="http://www.***.co.ir/***.gif"></A> <A href="http://www.***.com/" title="فاركس"><IMG alt="فاركس بروكر ف
      Titles like that appear on the Newest Nodes page, where anyone can see them easily, spiders included. Or am I missing something?

      So maybe the titles of reaped nodes should be cleaned up when necessary?
      Update
      Someone just did exactly what I suggested while I was writing this node... ;)
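One way to clean such titles automatically is to strip all markup and escape what remains before the title reaches a public listing. The following is only a sketch under that assumption. `safe_title` and its placeholder text are hypothetical names, not anything PerlMonks is known to use.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def safe_title(raw_title, placeholder="[markup removed]"):
    """Return a title with tags stripped, whitespace collapsed, and text escaped."""
    parser = _TextOnly()
    parser.feed(raw_title)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    # Escape whatever survives so leftover angle brackets cannot form new tags.
    return escape(text) if text else placeholder
```

A title made up entirely of markup collapses to the placeholder, so the listing never shows an empty link.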
Re: nofollow links in reaped nodes
by ww (Bishop) on Nov 30, 2009 at 02:26 UTC

    Marking links as <a href="..." rel="nofollow">...</a> may not do what you suggest. The authors' spec ( http://microformats.org/wiki/rel-nofollow ) states in the Abstract:

    .... By adding rel="nofollow" to a hyperlink, a page indicates that the destination of that hyperlink SHOULD NOT be afforded any additional weight or ranking by user agents which perform link analysis upon web pages (e.g. search engines). Typical use cases include links created by 3rd party commenters on blogs, or links the author wishes to point to, but avoid endorsing.

    Wikipedia puts it this way (apparently correctly):

    The nofollow attribute value is not meant for blocking access to content, or for preventing content to be indexed by search engines.

    As the article continues, it becomes clear that each search engine's implementation is crucial:

    While some (search engines) take it literally and do not follow the link to the page being linked to [citation needed], others still "follow" the link to find new web pages for indexing. In the latter case rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document.".
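The two meanings of nofollow quoted above live in different places in a page: one is a page-wide robots meta tag, the other a per-link rel value. As a rough illustration, this Python sketch reports both for a given HTML string. `NofollowAudit` and `audit` are hypothetical names, and the sketch only inspects markup. It says nothing about how any particular search engine acts on it.

```python
from html.parser import HTMLParser


class NofollowAudit(HTMLParser):
    """Record the page-level robots directive and each link's rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # set by <meta name="robots" content="...nofollow...">
        self.links = []             # (href, link carries rel="nofollow")

    def handle_starttag(self, tag, attrs):
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            directives = {d.strip().lower() for d in attrs.get("content", "").split(",")}
            if "nofollow" in directives:
                self.page_nofollow = True
        elif tag == "a" and "href" in attrs:
            rel_tokens = attrs.get("rel", "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))


def audit(html):
    """Return (page_nofollow, [(href, link_nofollow), ...]) for an HTML string."""
    parser = NofollowAudit()
    parser.feed(html)
    parser.close()
    return parser.page_nofollow, parser.links
```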

    And, just BTW, before it comes up, only well-behaved robots honor robots.txt.
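The point about robots.txt is easiest to see from the crawler's side: the check is something the robot chooses to run before fetching a page. A small sketch with Python's standard `urllib.robotparser` follows. The `/reaped/` path, the `PoliteBot` user agent, and the example.org URLs are made up for illustration and do not describe the real site layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real site's robots.txt is not shown in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reaped/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved robot asks first; nothing forces a rude one to ask at all.
    print(allowed("PoliteBot", "http://www.example.org/reaped/810026"))
    print(allowed("PoliteBot", "http://www.example.org/index.html"))
```

Nothing on the server enforces the answer, which is why robots.txt only helps against robots that bother to consult it.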

Node Type: monkdiscuss [id://810026]
Approved by moritz