What is duplicate content and 8 ways to avoid it

When Google Panda launched on February 24, 2011, it sent shock waves through the entire SEO and blogging community, as many websites and blogs were knocked off the top ranking pages.


Google Panda, Panda 2.0, and the subsequent updates dealt with duplicate content and with websites full of low-quality, poorly churned-out content. Since then, duplicate content issues have been taken more seriously as an SEO best practice.

For the purposes of this discussion, here is how duplicate content is defined:

What is duplicate content according to Google?

Matt Cutts (head of webspam at Google) and his team gave this definition: ‘Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.’

I still believe that Google penalizes sites for duplicate content, despite the Google Hummingbird approach and what Matt Cutts said in a July 2013 video.

So duplicate content is not good, and it can take down your blog’s ranking because Google will penalize you. However, the type of issue big G is really after is:

Making a manipulative attempt to make your site appear to contain more content, that is, publishing the same or nearly the same topic in various posts on your blog. This includes keyword stuffing, repetitive words, etc.

The following are ways to avoid duplicate content issues:

  • Be careful with content syndication

Google has made it clear that content syndication is not bad. However, if you syndicate your content on other sites (bookmarking sites, social media sites, content farms, etc.), Google will show only the version it thinks is most appropriate for users, which means it will rank higher whichever version scores best by its own search metrics. In my view, this is determined by the domain and site authority of your own blog and of the sites you are syndicating to.

What is content syndication? It is simply the practice of sharing your blog content on various other blogs with the aim of getting backlinks to your site. Content syndication is mostly done through RSS feeds.

Case study: David Mercer explained how syndicating content to a high-authority site led Google to slap him with a Google Panda penalty. I highly encourage you to read about his experience and how he recovered:

http://smepals.com/small-business-seo/how-content-syndication-can-be-devastating-small-business-seo-and-website-traffic

So be careful while you are busy sharing and syndicating your content everywhere, including on Triberr. Triberr is good, but it allows Googlebot to index a full page of your syndicated content. In other words, your 600-word post is published in full on Triberr, and Googlebot is allowed to crawl it. I use Triberr too, so this was a shocking discovery for me. In this respect, Biz Sugar is better than Triberr.

  • Check for duplicate content within your website or blog posts

This is very common on big websites and blogs with hundreds or even thousands of posts. In most cases, an article with the same title and the same concept has been published at different times on the blog, and this is clearly duplication of content.

Publishing the same concept and content twice like this is duplicate content within your own website or blog; beware of this error. The solution is to search your existing posts before you publish, to check whether you have already published the same or nearly the same content.
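As a rough illustration of that pre-publish check, here is a minimal Python sketch that compares posts by overlapping three-word phrases. The function names, the phrase size, and the 0.5 threshold are my own assumptions for the example, not anything Google publishes; treat it as a starting point rather than a definitive duplicate detector.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_posts(posts, threshold=0.5):
    """posts maps title -> body text. Returns (title, title, score) tuples
    for every pair whose similarity is at or above the threshold.
    The 0.5 threshold is an arbitrary assumption; tune it for your blog."""
    sets = {title: shingles(body) for title, body in posts.items()}
    pairs = []
    for (t1, s1), (t2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            pairs.append((t1, t2, round(score, 2)))
    return pairs
```

You would feed it your existing posts plus the draft you are about to publish, then review any pair it reports before hitting publish.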

  • Pay attention to similar content

If you run a content management system (CMS) blog or website like mine, where any content you post appears on your homepage, on the archive pages, and then on the post’s own page, you have to be careful and minimize this similar-content scenario. Allow me to explain further:

  • I publish a 1,100-word post
  • A summary of the post appears on the homepage
  • A summary of the post also appears on the archive pages

However, the summary on the homepage should not exceed 150 words (the standard), yet some websites allow 200 words or more, and some even show the full post. This is wrong, and it is duplicate content by nature. The solution for the summaries on the archive pages is to noindex those pages. You can use the Yoast WordPress SEO plugin for this, or a standard SEO-optimized WordPress theme from Genesis or Elegant Themes.

In addition, according to Google, if you have more than one page or post on a similar topic, it is advisable to merge them into a single page or post.
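To make the 150-word limit concrete, here is a small Python sketch of how a summary could be cut from a full post. The function name and the trailing " ..." marker are assumptions for illustration only; WordPress has its own excerpt settings, and this is not its implementation.

```python
def make_excerpt(body, max_words=150):
    """Return at most max_words words of the post body.
    Appends ' ...' only when the body was actually shortened."""
    words = body.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

With this, a 1,100-word post contributes only its first 150 words to the homepage and archive pages, so the full text lives in one place.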

  • Pay careful attention to meta descriptions

I was once a victim of this, and it was Google Webmaster Tools that made me realize it. Allow me to explain in detail.

Most modern, well-optimized WordPress themes come with a meta description box for your homepage, posts, categories, and pages. This means you can give each post its own meta description. However, if you also install the WordPress SEO plugin by Yoast (which has the same option by default), you might end up duplicating your meta description, and this is clearly a duplicate content problem.

My suggestion is this: if you are using a Genesis theme, do not also set your meta description in the WordPress SEO plugin, as the theme already provides it.

I highly recommend you verify your site in Google Webmaster Tools and check the health of your blog under HTML Improvements (this shows errors that are usually caused by duplicate content).
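If you keep a list of your pages and their meta descriptions, a quick check like the Python sketch below can flag descriptions shared by more than one page. The function name and the input format (a dictionary of URL to description) are assumptions for the example; Google Webmaster Tools remains the authoritative report.

```python
from collections import defaultdict


def find_duplicate_descriptions(pages):
    """pages maps URL -> meta description text.
    Returns a dict of normalized description -> list of URLs,
    keeping only descriptions used by more than one URL."""
    by_description = defaultdict(list)
    for url, description in pages.items():
        # Normalize case and whitespace so trivial differences still match.
        key = " ".join(description.lower().split())
        if key:  # Skip pages with an empty description.
            by_description[key].append(url)
    return {d: urls for d, urls in by_description.items() if len(urls) > 1}
```

Any URL group it returns is a candidate for rewriting so that each page carries its own description.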

  • Duplicated posts from guest bloggers

I discussed how Bimple suffered a Google penalty due to plagiarism (a guest post was basically copied from elsewhere). This is a word of caution to anyone accepting guest posts: verify the authenticity of the content before publishing. You can use Copyscape, or read my post on free alternatives to Copyscape, to check for plagiarism and duplicated content.

One of the most essential requirements for ranking well on Google is original, quality content. Content scrapers and plagiarists are hell-bent on destroying reputable blogs and websites, so it is up to you to guard your blog content with all seriousness.

  • Inform Google how they should index your site

To avoid duplicate content, Google instructs webmasters to tell it how their website should be indexed. The internet giant made it clear that, once their site is verified, a webmaster, blogger, or publisher should:

  • Choose whether Google should index their site like this: http://www.bloggingconsult.org or like this: http://bloggingconsult.org

This should be followed by an equally clear setting in your own WordPress dashboard on how search engines should index your site; your WordPress settings should match the setting you chose in Google Webmaster Tools.

  • Make use of canonical URL settings for posts, pages, and your website in general

Canonical is a technical term that mostly SEO professionals understand, but for the purpose of this blog post, I will explain it in a few steps below:

You have one blog post reachable at two URLs:

  • http://bloggingconsult.org/how-to-get-links-build-create/ and also
  • http://www.bloggingconsult.org/how-to-get-links-build-create/

At face value the two URLs are spelled differently, but technically they lead to the same content, and that results in duplicate content. What Google does is pick the one that is mostly used, and that is where the concept of the canonical URL comes in.

So, how can I set up a canonical URL for my WordPress blog or website homepage?

Two steps are needed, and both have been covered above:

  • Set your preferred domain in Google Webmaster Tools, i.e. the domain name that Google should honor and index
  • Set the same preferred domain in your WordPress dashboard settings

To set up canonical URLs for your posts and pages, just install the WordPress SEO plugin from Yoast, and the plugin will handle it for you.
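If you want to confirm that a page really declares a canonical URL, the Python sketch below reads the page HTML and collects every link tag whose rel attribute is "canonical". The class and function names are made up for this example, and it only inspects HTML you pass in; how you fetch the page is up to you.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_urls(html):
    """Return the list of canonical URLs declared in the HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals
```

A healthy page should return exactly one URL. An empty list means no canonical is set, and more than one suggests that a theme and a plugin are both adding the tag.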

  • Have a consistent internal linking structure

Google gave this simple advice: be consistent with your internal linking. An example is given below:

Don’t mix links to http://bloggingconsult.org/page/, http://www.bloggingconsult.org/page, and http://www.bloggingconsult.org/page/index.htm.

Just stick to one particular format. Mixing formats is bad practice and confuses the search engine bots.
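One way to enforce a single format is to pass every internal link through a small normalizing function before you publish. The Python sketch below is only an illustration: the function name, the choice of the non-www host, and the trailing-slash rule are my assumptions, and it presumes WordPress-style pretty permalinks rather than links to files.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url, use_www=False):
    """Rewrite an internal link into one consistent format:
    one host style (with or without www), no index.htm(l), trailing slash.
    Assumes directory-style permalinks, not links to files such as PDFs."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = "www." + bare if use_www else bare

    path = parts.path
    for index_name in ("index.htm", "index.html"):
        if path.endswith("/" + index_name):
            path = path[: -len(index_name)]
    if not path.endswith("/"):
        path += "/"

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
```

All three example links above collapse to http://bloggingconsult.org/page/ with the default settings, or to the www version if you pass use_www=True to match your preferred domain.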


Feedback and Suggestions

Knowledge is not limited; therefore, if you know of additional tips and information, kindly share them in the comment section below. Thank you.
