If you know a little about SEO, you might have heard of the robots.txt file. It can be a useful tool for anyone with a website who hopes to attract visitors through search engines. The robots.txt file essentially tells search engine crawlers which parts of your website they may crawl, meaning more of their time is spent on the useful pages.
The robots.txt file implements what is known as the “Robots Exclusion Protocol.” Some people put a ‘noindex’ rule inside their robots.txt file, but this is about to change: Google has announced that from September 1st, 2019 it will no longer support robots.txt files that contain a noindex directive.
They are changing the rules to keep the ecosystem as healthy as possible and to be well prepared for any future open source releases. This is a complex issue, and one that businesses with websites need to respond to. To do so, it is important to fully understand what the robots.txt file is and its role in the SEO space.
The robots.txt File in Action
Search engines like Google have web crawlers – known as spiders – which look at millions of websites every day, reading information which helps them to decide which websites they will put at the top of their search results.
If there are pages that you don’t want Google to crawl, you can put a robots.txt file in place. This can be dangerous, however, as you can accidentally prevent Google from crawling your entire site, so pay close attention to exactly what you are blocking when adding these rules.
Blocking a page in robots.txt also prevents the crawler from following any links on that page.
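As a simple illustration (the directory name here is hypothetical), a robots.txt file placed at the root of your site might look like this:

```txt
# Rules for all crawlers
User-agent: *
# Keep crawlers out of the (hypothetical) /private-area/ directory
Disallow: /private-area/
```

Any URL whose path starts with /private-area/ would then be off-limits to crawlers that respect the file.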
It is important to note, however, that although you are blocking the robots, the page can still be indexed by Google. This means that it can still appear in the search results – although without any details. If you don’t want your page to be indexed by Google, you must use the ‘noindex’ rule.
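The ‘noindex’ rule is normally applied as a robots meta tag in the page’s HTML, for example:

```txt
<!-- In the <head> of the page you want excluded from search results -->
<meta name="robots" content="noindex">
```

Crucially, a crawler has to be able to fetch the page to see this tag, which is why blocking the same page in robots.txt makes the tag invisible to it.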
Why is it Important?
The main reason robots.txt files are important and useful to a website is that they stop search engines from wasting crawl resources on pages that offer little value, leaving more capacity to crawl the pages with the most useful information.
They can also be useful for blocking non-public pages on your website – pages such as log-in pages or a staging version of a page.
They can also be used to prevent the indexing of resources such as multimedia resources like images and PDFs.
The robots.txt file can also be useful if you have duplicate pages on your website. Without it, you might find that the search engine returns duplicate results, which can be harmful to your website’s SEO.
The Pros of Using the robots.txt File
Most crawlers have a pre-determined number of pages they can crawl, or at least a certain amount of resource they can spend on each website, often called a crawl budget. This is why it is important to be able to block certain sections of your website from being crawled, allowing the robots to spend their ‘allowance’ only on the sections that are useful.
There are some instances when using robots.txt files can be useful, including:
- Preventing duplicate content
- Keeping sections of your website for private use only
- Preventing certain files on your website from being indexed e.g. images and PDFs
- Specifying crawl delays to prevent your servers being overloaded
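Putting those uses together, a robots.txt file covering them might look something like this (all paths are examples, and note that some directives, such as Crawl-delay, are ignored by Google):

```txt
User-agent: *
# Private, non-public section of the site (example path)
Disallow: /staging/
# Block PDF files (wildcard support varies between crawlers)
Disallow: /*.pdf$
# Ask crawlers to wait 10 seconds between requests (not honoured by Google)
Crawl-delay: 10
```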
The Cons of Using the robots.txt File
Blocking your page with a robots.txt file won’t stop it from being indexed by the search engines. This means that it won’t actually be removed from the search results. The ‘noindex’ tag is what is important in preventing the search engine from indexing your page, but remember, if you have used the robots.txt file, the robots won’t actually see the ‘noindex’ tag, and therefore it won’t be effective.
Another potential issue with the robots.txt file is that if you block the robots from a page, the links on that page pass no value, as link equity cannot flow from that page to other pages or sites.
There are a number of consequences for brands of Google’s decision to no longer support robots.txt files with the ‘noindex’ directive.
The main concern is that you will need to make sure that if you have used the robots.txt file previously, that you have an alternative to fall back on.
If you are using ‘noindex’ in a robots.txt file, you should look for alternatives and set these in place before the deadline of 1st September 2019. These include:
- Noindex in meta tags
- 404 and 410 HTTP status codes
- Password protection to hide a page from search engines
- Disallow in robots.txt
- Search Console Remove URL tool
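One of these alternatives, the ‘noindex’ rule, can also be delivered as an HTTP response header rather than a meta tag, which is useful for non-HTML resources such as PDFs and images. A simplified response snippet:

```txt
HTTP/1.1 200 OK
X-Robots-Tag: noindex
```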
Here at Fibre, we know that the majority (91%) of people who use a search engine use Google, which is why it is important to stay up to date with any changes to its policies. By understanding and acting on its directives, you can ensure that your website is found and will continue to be found by search engines as the rules change.