NoFollow is Not the Only Way to Block Link Crawling
Posted by reviews on Mar 18, 2008
Other Ways to block search engine spiders from crawling
Actually, NoFollow does not “stop” the search engine spiders (robots) from crawling. Instead, Google and others simply do not pass “link juice” through the link.
| rel="nofollow" Action | Google | Yahoo! | MSN Search | Ask.com |
|---|---|---|---|---|
| Follows the link | Yes | Yes | Not proven | Yes |
| Indexes the “linked to” page | No | Yes | No | Yes |
| Shows the existence of the link | Only for a previously indexed page | Yes | No | Yes |
| In SERPs for anchor text | Only for a previously indexed page | Yes | No | Yes |
However, as Geoland points out so well, there are other ways to block the crawling of links. The other methods are much more limiting than rel="nofollow", because at least NoFollow still allows the crawling and indexing.
Normal methods of checking for NoFollow (SEOQuake, right-clicking on a link and looking at its properties, etc.) will not show you whether a link is truly crawlable. In other words, “what you see is not always what you get.”
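To make the limitation concrete, here is a minimal sketch of the kind of per-link check such tools perform: it reads each anchor's rel attribute and nothing else. The class name `RelNofollowChecker`, the helper `check_links`, and the example.com URLs are hypothetical, and this is not how any particular add-on is actually implemented.

```python
from html.parser import HTMLParser


class RelNofollowChecker(HTMLParser):
    """Collects (href, is_nofollow) for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def check_links(html):
    """Return a list of (href, is_nofollow) pairs found in the HTML text."""
    parser = RelNofollowChecker()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup: one plain link, one tagged with rel="nofollow".
sample = (
    '<a href="http://example.com/a">plain</a>'
    '<a rel="external nofollow" href="http://example.com/b">tagged</a>'
)
print(check_links(sample))
```

A check like this only inspects the anchor itself, so a link it reports as clean can still be blocked by rules that live elsewhere, such as a site-wide file or a page-level tag.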
Robots.txt
One method is to use Robots.txt to control where spiders crawl. The Robots.txt file is kept at the top level of the domain.
Examples
Blocks all robots
    User-agent: *
    Disallow: /
Blocks all robots from crawling specific directories
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /tmp/
    Disallow: /administrator/
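If you want to check how rules like these apply to a given URL, Python's standard library includes a Robots.txt parser. The sketch below feeds it the directory rules from the example above. The helper `is_crawl_allowed`, the user agent name `ExampleBot`, and the example.com URLs are hypothetical, and real spiders may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The directory-blocking rules from the example, one directive per line.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /administrator/
"""


def is_crawl_allowed(rules_text, url, user_agent="ExampleBot"):
    """Return True if the rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL inside a disallowed directory, then one outside all of them.
print(is_crawl_allowed(RULES, "http://example.com/images/logo.png"))
print(is_crawl_allowed(RULES, "http://example.com/about.html"))
```

To test a live site instead of a pasted rule set, `RobotFileParser` also offers `set_url()` and `read()` to download the file first.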
Meta Elements
The robots meta element is used to control whether search engine spiders are allowed to index a page and whether they should follow links from it. The NOINDEX value prevents a page from being indexed, while the NOFOLLOW value prevents the links on the page from being followed.
The robots meta element is supported by all of the major search engines. There are also several additional values that are relevant to search engines, such as NOARCHIVE and NOSNIPPET, which tell search engines what not to do with a web page's content. Meta tags are not the best option for preventing search engines from indexing the content of your website. A more reliable and efficient method is the use of the Robots.txt file.
NOINDEX tag tells a search engine not to index a specific page.
NOFOLLOW tag tells a search engine not to follow the links on a specific page.
NOARCHIVE tag tells a search engine not to store a cached copy of your page.
NOSNIPPET tag tells Google not to show a snippet (description) under your search engine listing; it will also not show a cached link in the search results.
Example
    <html>
    <head>
    <title>Create Backlinks to My Site</title>
    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    </head>
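Because this tag lives in the page head rather than on any individual link, checking for it means reading the page source. A minimal sketch of such a check, using only Python's standard library, might look like this. The names `RobotsMetaParser` and `robots_directives` are hypothetical, and the sample page is the example above with closing tags added so it can be parsed on its own.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase robots meta values found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = (
    "<html><head><title>Create Backlinks to My Site</title>"
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
)
directives = robots_directives(page)
print("noindex" in directives, "nofollow" in directives)
```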
Javascript
Any link written in JavaScript is not crawlable by search engine spiders. It will look like a normal link and will not be struck through (SEOQuake add-on) or highlighted (Search Status add-on).
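One way to picture this is to look at a page the way a simple, non-rendering parser would: it collects href values from anchor tags in the raw HTML and never runs any script. The sketch below does exactly that. The class `HrefCollector`, the helper `visible_hrefs`, and the sample paths are hypothetical, and this is an illustration of the idea rather than a description of how any particular search engine spider works.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def visible_hrefs(html):
    """Return the links a parser sees without executing any JavaScript."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Raw string so the escaped closing tag inside the script stays untouched.
page = r"""
<a href="/plain.html">Plain link</a>
<span onclick="window.location='/via-onclick.html'">Looks like a link</span>
<script>document.write('<a href="/via-script.html">Written by script<\/a>');</script>
"""
print(visible_hrefs(page))
```

Only the plain anchor is collected; the onclick target and the anchor written by the script are just text to this parser.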
Dynamic URLs
Some spiders may also avoid crawling any URL that has a “?” in it (dynamically produced) in order to avoid spider traps, which can cause the crawler to download an ‘infinite’ number of URLs from a web site. This would probably prevent links on those pages from being crawled.
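As a rough illustration of that kind of filter, the sketch below flags any URL that carries a query string and drops it from a crawl list. The function names `is_dynamic_url` and `filter_static` and the example.com URLs are hypothetical, and treating a bare trailing “?” as static is an assumption of this sketch, not documented spider behavior.

```python
from urllib.parse import urlsplit


def is_dynamic_url(url):
    """Return True if the URL has a non-empty query string after '?'."""
    return bool(urlsplit(url).query)


def filter_static(urls):
    """Keep only the URLs a '?'-averse spider would be willing to crawl."""
    return [url for url in urls if not is_dynamic_url(url)]


candidates = [
    "http://example.com/about.html",
    "http://example.com/list.php?page=2",
    "http://example.com/calendar.php?month=3&year=2008",
]
print(filter_static(candidates))
```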
I never realized that Google crawls “no follow” links. Been stumbled… great post.
This is all so very confusing.
I strongly believe that Google still counts those links for something and more than that, other crawlers count them with or without nofollow. I have many links that Yahoo shows and they were nofollow actually.
Interesting observation. I think the right combination of all three strategies will ensure a good blocking mechanism. Although from experience I find that the meta nofollow, noindex is one of the surest ways of doing it.
Cheers!
Mani Karthik
Great post! I thought that Google does not crawl and index sites that are “no follow” tagged.
I knew there was a difference in how search engines treat nofollow tags, but what strikes me most is that so much has been done to stop link spammers, and still they can’t be stopped. Nofollow was doomed to failure, in my opinion, as there is only one way to stop spammers: start moderating posts on your websites! And if you have too many websites and do not have time for that, Google must punish you for your greediness. If you can’t find time to maintain a site, you don’t need it!
My blog is dofollow as of now. Thanks for your information.
Yahoo’s search share is increasing. Maybe one day webmasters will prefer Yahoo, since it gives value even to nofollow links.