This article explains how to create and optimize a robots.txt file for your site.
What Is a robots.txt File
A robots.txt file is meant to be read by search engine spiders (robots, bots), not by humans like you and me.
robots.txt indicates which parts of your site (posts, pages, and directories) you don't want search engine spiders to access.
Pages and images listed in your sitemap.xml file get automatically crawled and indexed by spiders.
The robots meta tag decides which posts and pages get indexed and/or cached by search bots. The robots meta tag always supersedes sitemap.xml.
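For reference, the robots meta tag sits in a page's HTML head; this is the standard syntax (noindex keeps the page out of the index, noarchive prevents the cached copy):

```html
<!-- Placed inside <head>; tells all bots not to index or cache this page -->
<meta name="robots" content="noindex, noarchive">
```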
A search engine robot will always visit your robots.txt file before indexing your blog.
A search spider first visits robots.txt, then sitemap.xml, and finally the robots meta tag on each page.
Take advantage of this, and add the location of your XML sitemaps to your robots.txt file.
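A Sitemap directive can appear anywhere in the file; replace the URL below with the address of your own sitemap:

```
# Point spiders at your XML sitemap
Sitemap: https://kunaldesai.blog/sitemap.xml
```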
robots.txt can only control access to files and directories under your domain's web root.
Anything outside the web root is not served by your hosting company at all, so it needs no robots.txt rules.
Best Location to Place robots.txt
Always place robots.txt file at the root directory of your domain.
If your domain is https://kunaldesai.blog/, then your robots.txt file should be found at https://kunaldesai.blog/robots.txt.
How to Create Robots.txt file for Your Site
You need to create and upload a robots.txt file for your WordPress blog.
Create a new file in Notepad if you are using the Microsoft Windows operating system. Save it as "robots.txt", then upload it to the root directory of your domain using an FTP client such as FileZilla.
Sample WordPress robots.txt File
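A commonly used WordPress robots.txt blocks the admin area while keeping admin-ajax.php crawlable and points to the sitemap; this is a typical sample, and the sitemap URL should be replaced with your own:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://kunaldesai.blog/sitemap.xml
```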
How to Optimize WordPress robots.txt File
A robots.txt file consists of one or more blocks of directives, one block per user agent.
You can address a specific spider with the User-agent directive.
You have two options to achieve this:
1) Use the wildcard character (*) for all search engines.
2) Or use a specific user-agent name for a specific search engine.
To Index all Posts/Pages and Directories by all Search Engines
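The rule for this case is an empty Disallow line under the wildcard user agent, which blocks nothing:

```
User-agent: *
Disallow:
```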
Here * denotes all search engines, and Disallow blocks access to whatever posts/pages or directories you list after it; an empty Disallow blocks nothing.
To Address Specific Search Engine using User-Agent
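As a sketch, to give rules only to Google's crawler, name its user agent explicitly; the /private/ directory below is a hypothetical example:

```
# These rules apply only to Googlebot; /private/ is a hypothetical directory
User-agent: Googlebot
Disallow: /private/
```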
To Noindex all Posts/Pages and Directories
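Blocking the entire site for all bots takes a single slash. Note that this prevents crawling; pages that were already indexed may still appear in search results:

```
# Block all spiders from the whole site
User-agent: *
Disallow: /
```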
To Noindex Whole Directory but Allow Specific Page
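A sketch using WordPress's admin directory as the example: Allow lets a single file through a broader Disallow rule:

```
# Block the whole /wp-admin/ directory, but permit one file inside it
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```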
Names (user agents) of Search Engine Spiders
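Commonly used spider names include the following; each one goes on its own User-agent line:

```
# Common search engine spider user agents
User-agent: Googlebot     # Google
User-agent: Bingbot       # Bing
User-agent: Slurp         # Yahoo
User-agent: DuckDuckBot   # DuckDuckGo
User-agent: Baiduspider   # Baidu
User-agent: YandexBot     # Yandex
```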
How to Validate robots.txt File
One of the best places to check is the robots.txt Tester in Google Search Console.
- robots.txt will not remove a page/post from search engine index.
- Allow search engines to access even low-quality pages; use the robots meta tag, not robots.txt, to keep them out of the index.
Moral of the Story
The robots.txt file tells search engine spiders which posts, pages, directories, and media they can and cannot crawl.