Perfect robots.txt file for WordPress and Blogger [Rules explained]

I have often modified my robots.txt rules to keep duplicate URLs out of Google search results.

You can monitor this by searching site:yourwebsite.com and comparing the indexed results with your actual posts. After making the necessary changes, resubmit your sitemap to Google.
Google's URL Parameters tool was also a good way to find duplicate issues, although Google retired it from Search Console in 2022.

How to create a robots.txt file

  1. Create a text document in Notepad and name it robots.txt. File extensions are hidden on many computers, so make sure the file is not saved as robots.txt.txt.
  2. Upload robots.txt to your website's root directory.
  3. Make sure you can open it at yourwebsite.com/robots.txt.
  4. Many platforms, such as WordPress, Joomla and Blogger, generate a default robots.txt when installed.
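Crawlers only request robots.txt from the root of a host, so a copy uploaded to a subfolder (for example yourwebsite.com/blog/robots.txt) is ignored, and each subdomain needs its own file. As a minimal Python sketch, the hypothetical helper `robots_url` below works out the one address a crawler will check for any page on your site:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address a crawler checks for this page's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); the file always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourwebsite.com/blog/my-post/?replytocom=5"))
# prints https://yourwebsite.com/robots.txt
```

If opening that address in a browser does not show your file, crawlers will not see it either.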

What is the use of a robots.txt file

  1. The main use of robots.txt is to block search engines from crawling either the entire website or specific pages.
  2. It blocks dynamic URLs that cause duplicate or low-quality content issues (for example, keeping Google away from the login page).
  3. In WordPress, replytocom comment links and /?s= search strings create duplicate content between search results pages and posts.
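To spot the replytocom problem from point 3 in your own data, you can group the URLs Google has indexed (for example, copied from a site:yourwebsite.com search) by their path: any path that appears under several query strings is a duplicate. A rough Python sketch; the `group_by_path` name and the sample URLs are made up for illustration:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path(urls):
    """Return {(host, path): [urls]} for paths indexed under more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Ignore the query string, so ?replytocom=12 and ?replytocom=15 collapse together.
        groups[(parts.netloc, parts.path)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical sample, e.g. copied from site:yourwebsite.com results.
indexed = [
    "https://yourwebsite.com/my-post/",
    "https://yourwebsite.com/my-post/?replytocom=12",
    "https://yourwebsite.com/my-post/?replytocom=15",
    "https://yourwebsite.com/about/",
]
for (host, path), variants in group_by_path(indexed).items():
    print(host + path, "is indexed", len(variants), "times")
# prints yourwebsite.com/my-post/ is indexed 3 times
```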
Example robots.txt file
User-agent: Mediapartners-Google 
Disallow:
User-agent: *
Allow: /
  • User-agent: * means the rules apply to every search engine crawler (Google, Bing, Yandex and others) that has no more specific group of its own.
  • Allow permits crawling; Disallow tells search engines not to crawl a path.
  • The Mediapartners-Google group is for Google's AdSense crawler. Its empty Disallow: line blocks nothing, so AdSense can crawl all content.
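A crawler obeys only one group: the one whose User-agent line most specifically matches its own name, falling back to User-agent: * when nothing else matches; groups that repeat the same User-agent are merged. A simplified Python sketch of that selection (matching by lowercased name prefix is an assumption here; real parsers are stricter about product tokens, and the function names are hypothetical):

```python
def parse_groups(robots_txt):
    """Map each lowercased User-agent token to its (directive, path) rules."""
    groups = {}
    agents, in_rules = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank lines do not end a group
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])  # repeated agents share one rule list
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups


def select_group(groups, crawler):
    """Use the longest User-agent that prefixes the crawler name, else '*'."""
    name = crawler.lower()
    matches = [agent for agent in groups if agent != "*" and name.startswith(agent)]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


example = """User-agent: Mediapartners-Google
Disallow:
User-agent: *
Allow: /
"""
groups = parse_groups(example)
print(select_group(groups, "Mediapartners-Google"))  # [('disallow', '')]
print(select_group(groups, "Bingbot"))               # [('allow', '/')]
```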
If you want to block a directory with robots.txt, add:
Disallow: /directory-name/
The rule also applies to all child directories and URLs under it.
Disallow: /*? blocks any URL that contains a ? (a query string) anywhere in it.
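Google reads * as "any sequence of characters" and otherwise matches each rule as a prefix of the URL path plus query string, which is why a directory rule also covers everything beneath it. A hypothetical `rule_matches` helper in Python sketches that behaviour (it also handles a trailing $, which Google treats as "end of URL"):

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style matching of one robots.txt rule against a URL path + query."""
    anchored = pattern.endswith("$")  # a trailing $ means "end of URL"
    if anchored:
        pattern = pattern[:-1]
    # '*' becomes '.*'; every other character is matched literally.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so rules behave as prefixes.
    return re.match(regex, path) is not None


print(rule_matches("/private/", "/private/docs/file.html"))  # True: child URL
print(rule_matches("/*?", "/2015/05/my-post.html?m=1"))      # True: has a ?
print(rule_matches("/*?", "/about/"))                        # False: no ?
```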

A fully customized robots.txt for WordPress

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Allow: /
Disallow: /cgi-bin/
Disallow: /page/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /recommended/
Disallow: /comments/feed/
Disallow: /trackback/
Disallow: /index.php
Disallow: /xmlrpc.php
Disallow: /*?wptheme
Disallow: /*?comments=
Disallow: /*?replytocom
Disallow: /wp-content/plugins/
# /20 blocks date archives such as /2015/; remove it if your post URLs start with the year
Disallow: /20
Disallow: /*feed
Disallow: /*?no_redirect=true
Disallow: /?
Disallow: /search/
Disallow: /?s=
Disallow: /*?wptouch_switch

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /
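When both Allow and Disallow rules match a URL, Google applies the most specific (longest) rule and Allow wins a tie, so Allow: / at the top of the group does not cancel the Disallow lines. A minimal Python sketch of that evaluation, run on a few of the WordPress rules written with a leading slash; `is_allowed` and `to_regex` are illustrative names, not part of any library:

```python
import re


def to_regex(pattern):
    """Compile a robots.txt path pattern: '*' = any characters, trailing '$' = end."""
    end = "$" if pattern.endswith("$") else ""
    body = pattern[:-1] if end else pattern
    return re.compile(".*".join(re.escape(piece) for piece in body.split("*")) + end)


def is_allowed(rules, path):
    """Longest matching rule wins, Allow wins a tie, and no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty "Disallow:" blocks nothing
            continue
        if to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, allowed = length, directive == "allow"
    return allowed


# A few rules from the WordPress User-agent: * group.
rules = [
    ("allow", "/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/*?replytocom"),
    ("disallow", "/?s="),
    ("disallow", "/20"),
]
for path in ["/my-post/", "/wp-admin/options.php", "/my-post/?replytocom=12",
             "/?s=seo", "/2015/05/"]:
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Only /my-post/ comes out allowed; the other four are blocked even though Allow: / matches every one of them, because a longer Disallow rule matches too.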
Below are optional query strings you can block, depending on your plugins and mobile URLs. If you get traffic from Twitter, Facebook, FeedBurner and RSS, Googlebot may also crawl and index those redirected URLs with their tracking tags; to prevent that, add rules for the ?utm_source and utm_medium parameters as well:
wptouch_switch=desktop&redirect
?no_redirect=true
?utm_source
utm_medium
?Action
?utm_campaign=
/?from_index
?wptouch_switch
?redirect
?post_id
?no_redirect
?replytocom
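A rule such as Disallow: /*?utm_source only matches when utm_source is the first parameter after ?; in a URL such as /post/?utm_medium=rss&utm_source=feedburner the query starts with utm_medium instead. One way to cover both positions, sketched with a hypothetical Python helper that turns the strings above into paired ? and & rules:

```python
def param_rules(params):
    """Build Disallow lines matching each parameter after '?' or after '&'."""
    lines = []
    for param in params:
        name = param.lstrip("/?")  # accept entries written as ?name or /?name
        lines.append(f"Disallow: /*?{name}")  # first parameter in the query
        lines.append(f"Disallow: /*&{name}")  # any later parameter
    return lines


for line in param_rules(["?utm_source", "utm_medium", "?utm_campaign="]):
    print(line)
```

Review the generated lines before pasting them into the User-agent: * group, since a short parameter name can match more URLs than you expect.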

Best robots.txt for Blogger

User-agent: *
Disallow: /search
Disallow: /*?
Allow: /
Sitemap: http://www.theonlineking.com/sitemap.xml
In Blogger, Disallow: /search stops Google from crawling label pages and max-results search pages.
Monthly archive pages can be blocked with robots meta tags.
Disallow: /*? blocks Blogger's mobile redirect URLs such as yourwebsite.com/?m=1.
This may drop some important mobile URLs that are already indexed and ranking. rel=canonical already points to the desktop version, but Google still indexes the country-level domains as well.
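Because Disallow: /*? also blocks the ?m=1 mobile URLs, it helps to know which desktop URL each one corresponds to: that is the address rel=canonical should point at and the one you want indexed. A small Python sketch (the `desktop_url` helper and the sample post path are hypothetical) that drops Blogger's m parameter and keeps any other query parameters:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def desktop_url(url):
    """Return the URL without Blogger's m=1 mobile parameter."""
    parts = urlsplit(url)
    # Keep every query parameter except "m", preserving blank values and order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "m"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


print(desktop_url("http://yourwebsite.com/2015/05/my-post.html?m=1"))
# prints http://yourwebsite.com/2015/05/my-post.html
```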
Also see our .htaccess tutorials (you can do much more with .htaccess).