A Robots.txt File Can Cause a Google Penalty

This may sound strange, but it is well documented that a robots.txt file, if abused, can lead to a Google penalty. The quote below from Google supports that claim:

“If your site contains pages, links, or text that you don’t intend visitors to see, Google considers those links and pages deceptive and may ignore your site.” Source: Google’s support page, https://support.google.com/webmasters/answer/40349?hl=en

Now, the question that naturally comes to mind is: does this mean Google does not recommend using robots.txt at all? The answer is no. The discussion below explains, with case studies, why misusing the file can trigger a Google penalty.

To follow this argument properly, it helps to be clear about what robots.txt does. In Google’s words: “A robots.txt file restricts access to your site by search engine robots that crawl the web.”

However, many blog owners overuse and abuse the file, usually in an attempt to prevent duplicate content and similar issues.

While Google acknowledges the use of the file, it prefers noindex meta tags, as explained in its documentation: https://support.google.com/webmasters/answer/156449?hl=en

“To entirely prevent a page’s contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag or x-robots-tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index. The x-robots-tag HTTP header is particularly useful if you wish to limit indexing of non-HTML files like graphics or other kinds of documents.”
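In practice, the two mechanisms look something like this (a minimal sketch; the .htaccess rule assumes an Apache server with mod_headers enabled, which may not match your setup):

<!-- noindex meta tag, placed in the <head> of the page you want kept out of the index -->
<meta name="robots" content="noindex">

# X-Robots-Tag HTTP header sent via .htaccess, useful for non-HTML files such as PDFs
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex"
</Files>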

A check of Google’s own robots.txt file is revealing: http://www.google.com/robots.txt

  • Case Study: WSM4B Website

David Mercer, the author and publisher of the above-named website, published an article (linked below) on how he recovered from a Google Panda penalty, and one of the things he pointed out was that he had to rework his robots.txt file.

He stated emphatically that heavy-handed use of robots.txt is bad practice and that Google frowns on it; in fact, blocking pages, posts, or links that Google has already indexed is an especially bad idea, which lines up with what Google itself says above.

If a page is blocked in robots.txt, it cannot be crawled, so Googlebot never sees any noindex tag on it, and the already-indexed URL can linger in the index indefinitely.

Part of what he did to recover, among other things, was simply to go easy on his robots.txt file: instead of blocking a page, post, or even a category there, he applied a noindex tag to it (a before-and-after sketch follows the link below).

You can check his blog post at: http://smepals.com/seo/step-by-step-guide-google-panda-penalty-recovery
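To make that change concrete, here is a minimal before-and-after sketch under my own assumptions (the /category/deals/ path is hypothetical, not taken from his article):

Before: the archive is blocked in robots.txt, so Googlebot cannot crawl it and never sees a noindex tag
User-agent: *
Disallow: /category/deals/

After: remove the block from robots.txt and add this to the <head> of the category archive instead
<meta name="robots" content="noindex, follow">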

  • Case Study: Yoast website

Yoast is a leading SEO brand, and its founder, Joost de Valk, remains an important figure in the WordPress world; I would say he is the third most recognizable person in the WordPress blogosphere (after Matt Cutts of Google and Matt Mullenweg, co-founder of WordPress). He created and released the popular WordPress SEO plugin, which is used on a very large share of self-hosted WordPress blogs around the world. Following and learning from such a person is a good idea.

In one of his posts, he argued that using robots.txt to block content on your website is a bad idea and that a noindex tag should be used instead; this matches what Google and the WSM4B publisher say above.

Yoast.com is a PR6 website and ranks 3,060 globally and 1,071 in the USA on Alexa. I later checked his robots.txt file and found it remarkably simple. See: yoast.com/robots.txt

  • Case Study: Shoutmeloud

Shoutmeloud.com is a site I respect; however, a look at its robots.txt file leaves me puzzled. One thing keeping the site in good standing in Google organic results is that the owner is very good at SEO and posts original content regularly. He could do even better, though, by going easier on the robots.txt file and simply embracing the noindex tag.

Check out: shoutmeloud.com/robots.txt

SEO best practices for the robots.txt file

Another uncommon reason some blogs or websites lose traffic and rankings is that their robots.txt file is structured incorrectly. Beyond triggering a Google penalty, this is a little-known reason some sites lose SERP positions. The example below explains the details:

Using the Yoast robots.txt below as a model, your file should set the user-agent as shown; this simply means that search engine crawlers (Googlebot, Bingbot, etc.) may crawl your site. The ‘Disallow’ directive tells them not to crawl that part of it.

User-Agent: *
Disallow: /wp-content/plugins/
Disallow: /out/
Disallow: /bugs/
Disallow: /suggest/
Allow: /wp-content/plugins/vipers-video-quicktags/resources/jw-flv-player/player.swf
-------------
User-Agent: * on its own is the correct directive; however, the combination below means that search engines have no access to the site at all:

User-agent: *
Disallow: /
-------------------
In another scenario, the directive below tells search engines they may crawl everything:
Allow: /
while the directive below tells them not to crawl any page on the site:
Disallow: /
-------------------
A very good resource for further reading is Moz: http://moz.com/learn/seo/robotstxt

Conclusion

While the robots.txt file is a legitimate, SEO-approved tool, overusing it will hurt your rankings. Use it when there is a genuine need, but reach for noindex tags as the alternative wherever possible; the noindex tag is the standard and preferred SEO practice.

Also, make sure you open your robots.txt file in your browser and check it for typographical errors.
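As a quick illustration of the kind of typo to look for (hypothetical lines, not taken from any of the sites above), a missing colon or a misspelled directive can silently change the file’s meaning:

User-agent *
Disalow: /wp-content/plugins/

Crawlers will ignore or misread the two lines above; the corrected version is:

User-agent: *
Disallow: /wp-content/plugins/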

Interesting and helpful? Share it on Google+; your comments below will be appreciated.
