
How to Remove Duplicate Content From Your Blogger Blog Posts to Avoid SERP Penalty

A small amount of duplicate content may not affect your blog’s search engine rankings. However, it can sometimes get out of control and start to hurt your rankings badly, even without your knowledge. Here are ways to curb it.

What Is Duplicate Content

Do you have a blog in which you post regularly? Do any two different URLs in that blog have the same content? Then it is duplication. In self-hosted blogs, features such as print preview pages, monthly archive pages, and category pages can cause duplicate content. In such cases, search engines normally rank one of the pages lower. However, in extreme cases, when your blog has many pages with the same content, the blog can be penalized.

As Google puts it:
In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.
This shows how serious duplicate content can be.

How It Comes into Play

Duplicate content in Blogger blogs can be the result of having different URLs point to the same content. Recently, I found that this was happening with the Recent Comments widget in Blogger. If you enable it in the sidebar, the new comments will receive a special URL in this form: http://cutewriting.blogspot.com/2008/07/story-of-theme-redesign-and-blogger-w3c.html?showComment=1224840480000#c7659187557945906929

This ‘showComment’ link may get indexed by Google, creating a duplicate of the original blog post URL (which ends at "w3c.html"). This can be serious for blog posts with many comments, as each comment link may be indexed by Google as a separate URL.
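To make the relationship between the two URLs concrete, here is a minimal Python sketch that reduces a comment URL to its post URL. The function name `canonical_post_url` is made up for illustration, and the assumption is that only the `showComment` and `commentPage` query parameters (the latter reported by a reader in the comments) plus the `#c...` fragment need to be dropped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters assumed to create comment-only variants of a post URL.
COMMENT_PARAMS = {"showComment", "commentPage"}


def canonical_post_url(url: str) -> str:
    """Drop comment-related query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in COMMENT_PARAMS
    ]
    # An empty fragment and an empty query are simply omitted on output.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


comment_url = (
    "http://cutewriting.blogspot.com/2008/07/"
    "story-of-theme-redesign-and-blogger-w3c.html"
    "?showComment=1224840480000#c7659187557945906929"
)
print(canonical_post_url(comment_url))
```

Every comment link on a post reduces to the same post URL, which is why search engines may treat them as duplicates of one page.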

Another source of duplicate content is the monthly archive pages. These contain all posts made in a particular month. In Blogger, these archive pages are not disallowed with a robots.txt directive. They get indexed by search engines and cause duplicate content problems.

The category (label) pages on Blogger do not cause a duplicate content penalty, as they are already disallowed through Blogger's robots.txt file. You can verify this in Webmaster Tools.
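If you would rather check this locally, the following is a rough Python sketch using the standard library's `urllib.robotparser`. The rules are an assumption modeled on a typical Blogger robots.txt (a `Disallow: /search` line, with label pages living under `/search/label/`), and both URLs are hypothetical; look at your own blog's robots.txt before relying on the result.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules, modeled on a typical Blogger robots.txt.
# Your blog's actual file may differ.
ASSUMED_RULES = """\
User-agent: *
Disallow: /search
Allow: /
"""


def is_crawlable(url: str, rules: str = ASSUMED_RULES) -> bool:
    """Return True if the given rules let a generic crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", url)


# Hypothetical label page and monthly archive page.
label_url = "http://example.blogspot.com/search/label/SEO"
archive_url = "http://example.blogspot.com/2008_07_01_archive.html"

print(is_crawlable(label_url))    # False: blocked by "Disallow: /search"
print(is_crawlable(archive_url))  # True: nothing disallows the archive page
```

Under these assumed rules, the label page is blocked while the archive page stays open to crawlers, which matches the behavior described above.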

How to Fight Duplicate Content

To find duplicate content, do the following:

Go to Google and search for site: followed immediately by your blog's URL (no space after the colon).
Now, see how many URLs are indexed by Google. If the number is well above the number of posts you have published, there is probably duplication.
Send a removal request at Google Webmaster Tools for all unnecessary URLs (like the showComment URL above). Do not request removal of any normal post URL.
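The check in the steps above can be sketched in Python. This assumes you have copied the indexed URLs from the search results into a list by hand; the function names and the example.blogspot.com URLs are hypothetical. It treats any URL carrying a query string or fragment as a variant of the plain post URL.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query_and_fragment(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def find_duplicates(indexed_urls):
    """Map each plain URL to the indexed variants that duplicate it."""
    variants = defaultdict(list)
    for url in indexed_urls:
        base = strip_query_and_fragment(url)
        if url != base:
            variants[base].append(url)
    return dict(variants)


# Hypothetical list of URLs copied from a site: search.
indexed = [
    "http://example.blogspot.com/2008/07/first-post.html",
    "http://example.blogspot.com/2008/07/first-post.html?showComment=1224840480000",
    "http://example.blogspot.com/2008/07/first-post.html?showComment=1224840490000",
    "http://example.blogspot.com/2008/08/second-post.html",
]

for post, extras in find_duplicates(indexed).items():
    print(post, "has", len(extras), "duplicate URL(s) to consider for removal")
```

Only the variant URLs end up in the result, so the normal post URLs are never put forward for removal.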

Removal Request Screenshots

Before requesting removal, make sure you meet the removal criteria. The steps below follow the screenshots of requesting removal of a URL from Webmaster Tools.

1. The red link is the duplicate comment link
[Screenshot: Google showing duplicate Blogger comment link]
2. Starting a removal request

Click the New Removal Request button under Google Webmaster Tools->Tools->Remove URLs to open this window.
[Screenshot: Starting a removal request at Google Webmaster Tools]
3. Put in the URL and add it
[Screenshot: Google Webmaster Tools URL removal request adding URLs]
4. Once done, submit the request
[Screenshot: Google Webmaster Tools URL removal request submitted]

When using the Recent Comments widget, take this precaution.

  • Go to Layout->Edit HTML
  • Search for “Recent Comments” or whichever title you have given to the recent comments widget.
  • Look for expr:href='data:i.alternate.href'. Right after it, add the rel="nofollow" attribute, as in the image.
  • Once done, save the template. Now the links in the recent comments widget will be nofollowed automatically.
[Screenshot: Recent Comments widget nofollowing]
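After saving the template, you may want to confirm that the change took effect. Here is a minimal Python sketch that scans a page's HTML for comment links that still lack rel="nofollow". It assumes you have saved the rendered page source yourself; the class and function names, and the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser


class CommentLinkAudit(HTMLParser):
    """Collect showComment links that do not carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        rel_values = (attributes.get("rel") or "").lower().split()
        if "showComment=" in href and "nofollow" not in rel_values:
            self.followed.append(href)


def followed_comment_links(html: str):
    """Return the comment links in the HTML that are still followed."""
    audit = CommentLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.followed


# Hypothetical sidebar markup: one followed and one nofollowed comment link.
sample_html = (
    '<a href="http://example.blogspot.com/2008/07/post.html?showComment=1#c1">A</a>'
    '<a rel="nofollow" href="http://example.blogspot.com/2008/07/post.html?showComment=2#c2">B</a>'
    '<a href="http://example.blogspot.com/2008/07/post.html">C</a>'
)

print(followed_comment_links(sample_html))
```

An empty list means every comment link found in the page already has the nofollow attribute.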

[A very important update to this post: Adding Nofollow to post page timestamp]

However, make sure these URLs are not linked to from anywhere on the web. Such links will cause them to get indexed and noticed by search engines.

We cannot edit the Blogger robots.txt file. So, the monthly archive pages may be automatically included in the index. To prevent this, make sure you don’t link to the monthly archive pages from anywhere on your blog. If you find any links to these pages elsewhere, ask the person who added them to remove them or add the nofollow attribute to them. If you are self-hosting your blog, disallow all duplicate pages from search bots through a robots.txt directive.

Look at my sidebar, where each month’s archive is shown as a monthly post recap page (for better indexing), which is a post page with links to all of that month’s posts.

If you have a design like my WP Premium here, the monthly archives are placed within a JavaScript widget on the sidebar, which will not be found by search engines. Just make sure, however, that nobody links to these archive pages. If you find they are still indexed, try to request a removal at Webmaster Tools.

Copyright © Lenin Nair 2008

Comments

  1. It is a nice post and i did the comment no follow right now.
    Thank you for the tip.

    I had a question.

    If we show some important links on the side bar like recent posts or important posts will it be a duplication issue ?
    Request you to clarity ?

  2. Hi, Suresh, thanks for the comment.

    You can of course show your related posts or featured posts on the sidebar. That will not be counted as duplicate content. Make just sure that you don't link to any comment link or the monthly archive page link. If you want, you can always make these links NoFollow.

    Lenin

  3. Excellent article! Duplicate content issues are very important to avoid in terms of on-page SEO.

  4. Thanks, Barry. I had recently commented on your blog. You got a great resource as well.

  5. Hi Lenin, thanks for the tip! Just did it on my blog. Time to remove all that duplicate content from search engines! =)

    Another useful tip: find the code (b:if cond='data:blog.pageType == "archive"') on the top of your template and add a meta robots noindex right below it. That will prevent search engines from indexing your archives. Hope it helps! ;)

  6. Hi presidente, thanks for the comment. definitely your tip looks like workable. Thanks for it.

  7. The request for deleting urls from google using webmaster tools have failed
    It sayed there is third party owner ( blogspot) and urls can be moved only by it !!

    what to do
    I have about 430 duplicate url because I have comments

  8. Blogger URLs can't be removed as you can't edit robots.txt file or robots meta tag. You can remove comments right?

  9. I really enjoyed to read your blog because you have content what I expected here. thanks

  10. This post really helped me alot,
    I was worry about the tons of duplicate content on my blogs,
    But now I can relax and my peace of mind is back.
    thanks

  11. Thankyou for this nice post, very useful, but now i have a problem.
    i have a duplicate html?commentPage=2 can you help me please ?


