
Backlink Analysis Part I: Robots.txt Can Invalidate Your Backlinks

You work hard, contact many professional A-list bloggers, and suggest your post for an incoming link. If one of them gives you a backlink, you are happy. An unpaid link from a high-PageRank page, given freely from the most relevant page or category, is a million-dollar vote for your page. On its own, it can lift your page from position 400 in the search results to position 3.

Yesterday, I had a discussion in the Digital Point Forums about backlinks and their validity. It seems that most people know little about backlink validity analysis. They know about DoFollow and NoFollow, and nothing beyond that. Here, we will see the importance of the ‘robots.txt’ file in backlink analysis. Robots.txt is a simple link invalidation secret that several professional bloggers won’t share with you.

What Is Robots.txt

When a search crawler visits a website, the first thing it looks for is the robots.txt file. If it doesn’t find one, it goes about crawling and indexing the site’s pages as usual.

A robots.txt file is a small text file in the root directory of a domain that tells search engine crawlers which pages or sections of the website they shouldn’t crawl, and which therefore normally won’t be indexed. Its general format is:

User-agent: Googlebot
Disallow: /links.html


The above code disallows Googlebot from accessing the links page of a website (whose relative path is /links.html). This means that if you exchanged links with a site owner who has disallowed the links page, Googlebot never reads that page, and your backlink from it holds no value whatsoever. To the untrained eye, though, it will still look like a DoFollow backlink. Search bots will normally crawl any page not mentioned in the robots.txt file.
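
If you would rather not read the rules by eye, a short script can do the check for you. Here is a minimal sketch in Python using the standard library’s urllib.robotparser module. The domain example.com and the helper name is_crawlable are my own assumptions for illustration, and the parser only approximates how a real search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /links.html
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given bot may fetch the URL under these rules.

    is_crawlable is a hypothetical helper name, not a library function.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for page in ("http://example.com/links.html", "http://example.com/index.html"):
        print(page, is_crawlable(ROBOTS_TXT, "Googlebot", page))
```

With this rule, Googlebot is refused /links.html, while other pages, and bots not named in the file, are still allowed.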

User-agent: *
Disallow: /page/categories.htm


This code disallows the page for all search bots, not only Google’s. The wildcard ‘*’ specifies that the rule applies to every search bot. So no well-behaved search bot will crawl the mentioned page.
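
To see the wildcard at work, the sketch below runs the same rule against several bot names. It again uses Python’s urllib.robotparser. The bot names in the list, the domain example.com, and the helper blocked_agents are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The wildcard rule from the example above.
WILDCARD_RULES = """\
User-agent: *
Disallow: /page/categories.htm
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the bots from `agents` that may NOT fetch `url`.

    blocked_agents is a hypothetical helper name, not a library function.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    bots = ["Googlebot", "Bingbot", "Slurp"]
    # example.com is a placeholder domain.
    print(blocked_agents(WILDCARD_RULES, bots, "http://example.com/page/categories.htm"))
```

Every bot in the list is blocked from the disallowed page, and none of them is blocked from a page the file does not mention.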

A careful, clever advertiser thinks from the search bot’s point of view and first checks the robots.txt file of the site he plans to purchase links from. He simply won’t purchase a backlink from any disallowed page.

Checking Robots.txt File of Any Website

You can easily check the robots.txt file of any website. It sits in the root directory of the domain and is always named ‘robots.txt’, so you can open it right in the browser. For instance, to see the robots.txt file of Microsoft.com, type this in the address bar:

“http://www.microsoft.com/robots.txt”

Google: “www.google.com/robots.txt”

Simple? You will notice that these sites have disallowed a lot of internal pages. This is why links on those internal pages do not show up in search results or carry any weight.
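
The same lookup can be scripted. The sketch below builds the robots.txt address from any page URL and, optionally, downloads the file, using only Python’s standard library. The function names robots_txt_url and fetch_robots_txt are hypothetical, and the sample page path is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query/fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0) -> str:
    """Download the robots.txt text for the site (needs network access)."""
    with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The path below is a made-up example page.
    print(robots_txt_url("http://www.microsoft.com/en-us/windows?x=1"))
```

Whatever page of the site you start from, the robots.txt address always ends up at the root of the same domain.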

Through the robots.txt file, you can even specify the sitemap of a website. If you check my robots.txt file, “http://cutewriting.blogspot.com/robots.txt”, you will see that a sitemap has been specified.
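
A sitemap entry is just one more line in the file. The sketch below reads it with the site_maps() method of Python’s urllib.robotparser (available from Python 3.8). The sample file content and the example.com sitemap address are assumptions for illustration, not a copy of my actual robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a sitemap line.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Sitemap: http://example.com/sitemap.xml
"""


def list_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(list_sitemaps(SAMPLE_ROBOTS_TXT))
```

The Sitemap line is independent of the User-agent rules, so it is reported no matter which bots the rest of the file addresses.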

Conclusion

Before you purchase links from any website for SEO (if you are purchasing at all) or go for a link exchange, first look at that site’s robots.txt file and see whether your links are going to get any weight at all. If not, that link exchange simply won’t work.

By the way, neither this article nor this blog recommends purchasing backlinks for the purpose of SEO. If you are purchasing backlinks, it should be for traffic, and the links should be NoFollow. Purchasing backlinks for SEO can easily get your site banned by Google.

In the next article of the Link Analysis series, we will see another important thing to check before exchanging links: the Robots meta tag. Subscribe and enjoy.

Related Entries:

What is DoFollow? What is NoFollow (Make Your Blog DoFollow)
Ten Effective Link Building Techniques
Eleven Nasty Ways to Build Links

Copyright © Lenin Nair 2008

Comments

  1. Hai Lenin,

    Being a blogger can we do any thing with robot.text file ?

    Is there any control for blogger on this file ?

    Let me know.

  2. Suresh, Unless you are using Blogger self-hosting, there is no control over Robots.txt. Thanks for commenting.

  3. thanks for sending me a note about this post! i really learned something new today!

  4. What a readable, informative and educating post - always great to learn something new - thanks for sharing your knowledge!

  5. This is certainly an informative article. I "dugg" it!!!

  6. This morning I was researching robot.text file. The Google Tools, etc. website has lots of great info on this topic. Later I discovered this outstanding blog and it has helped me immensely.

    I'm concerned about an odd backlink to some of my youtube videos. It's some kind of fake website about insurance and it has 2 thumbnails of my videos that have nothing to do with insurance. When I clicked the thumbnails I was brought to 2 videos that were not mine. In fact they belonged to 2 competitors in my field.

    For some of my vids this is the only backlink, one backlink. I cannot seem to get my legitimate backlinks to connect to these vids. Do you think someone is screwing with robot.text files to block real backlinks from my vids? Why are these people using my thumbnails on a fake insurance website backlink to direct the next link to their videos?

    I am looking at your blog to see how I can subscribe to this excellent content. Thank you, you have made my day!

    Regards,
    Chuck

  7. Chuck, no, I don't think anyone is doing anything with robots.txt. It's only one backlink, right? You can try building more links to your vids. Good to know you liked the resource and decided to subscribe. Since they are YouTube videos, no one outside YouTube has access to its robots.txt file, so don't worry. I don't get why you say you can't connect your legitimate links to these videos. Can you give specifics so that I can help?

    Lenin

  8. Thanks for writing an essential SEO article about something which I, and probably other people, have completely forgotten about: the robots.txt file.

  9. Thank you for explaining that. I am using a program for my website and it comes with a pre-made robots txt page and I wasn't sure what it meant. Now I know that it isn't allowing certain pages to be indexed. Fascinating. Thanks again.

