This article explains how to create and optimize a robots.txt file for your site.
What Is a robots.txt File
A robots.txt file is meant to be read by search engine spiders (robots, bots), not by humans like you and me.
robots.txt indicates the parts of your site (posts, pages, and directories) you don’t want search engine spiders to access.
Pages and images listed in your Sitemap.xml file are easy for spiders to discover and crawl.
The robots meta tag decides which posts and pages get indexed and/or cached by search bots, and it always supersedes Sitemap.xml.
A search engine robot will always visit your robots.txt file before indexing your blog.
A search spider first visits robots.txt, then Sitemap.xml, and finally the meta tag on a particular page.
Take advantage of this, and add location of your XML sitemaps to your robots.txt file.
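A sitemap location can be declared anywhere in robots.txt with the Sitemap directive. A minimal sketch, assuming your sitemap lives at the common /sitemap.xml path (replace the URL with your own sitemap's actual location):

```
Sitemap: https://kunaldesai.blog/sitemap.xml
```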
robots.txt can allow or block access only to paths under your site's root directory.
Anything outside the web root is blocked by the hosting company by default anyway.
Best Location to Place robots.txt
Always place robots.txt file at the root directory of your domain.
If your domain is https://kunaldesai.blog/, then your robots.txt file should be found at https://kunaldesai.blog/robots.txt.
How to Create Robots.txt file for Your Site
You need to manually create and upload a robots.txt file for your WordPress blog.
Create a new file in Notepad if you are using the Microsoft Windows operating system, and save it as “robots.txt”. Then upload it to the root directory of your domain with the help of an FTP client like FileZilla.
Sample WordPress robots.txt File
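A minimal sketch of a commonly used WordPress configuration, combining the directives discussed below (the sitemap URL is a placeholder; replace it with your own):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://kunaldesai.blog/sitemap.xml
```

This blocks crawling of the admin area while keeping admin-ajax.php reachable, which many WordPress themes and plugins rely on for front-end functionality.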
How to Optimize WordPress robots.txt File
A robots.txt file consists of one or more blocks of directives (rules).
You can address a specific spider with the User-agent directive.
You have two options to achieve this:
1) Use the wildcard character (*) to address all search engines.
2) Use a specific user-agent name to address a specific search engine.
To Allow all Search Engines to Crawl all Posts/Pages and Directories
User-agent: *
Disallow:
Here * denotes all search engines, and an empty Disallow value means no posts/pages or directories are blocked.
To Address Specific Search Engine using User-Agent
User-agent: Googlebot
Disallow:
To Block Crawling of all Posts/Pages and Directories
User-agent: *
Disallow: /
To Block a Whole Directory but Allow a Specific Page
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Names (user agents) of Search Engine Spiders
- Googlebot
- Googlebot-Image
- Googlebot-Mobile
- Googlebot-News
- Googlebot-Video
- Mediapartners-Google
- AdsBot-Google
- bingbot
How to Validate robots.txt File
One of the best places is the robots.txt Tester in Google Search Console.
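You can also sanity-check your rules locally with Python's standard-library `urllib.robotparser`. A small sketch using the Disallow/Allow example from above (note: Python's parser applies rules in the order they appear, first match wins, so the Allow line is listed before the broader Disallow line; Google instead picks the most specific matching rule):

```python
import urllib.robotparser

# Rules mirroring the wp-admin example above. The Allow line comes
# first because Python's parser returns the first rule that matches.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# admin-ajax.php stays crawlable; the rest of /wp-admin/ is blocked.
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://example.com/wp-admin/"))                # False
```

In production you would point the parser at your live file with `rp.set_url("https://yourdomain.com/robots.txt")` followed by `rp.read()` instead of parsing an inline string.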
robots.txt Notes
- robots.txt will not remove a page/post from a search engine's index.
- Allow search engines to access even low-quality pages.
Moral of the Story
The robots.txt file tells search engine spiders which posts, pages, directories, and media they can and cannot crawl.