There are several reasons you may want to noindex a web page or URL: password-protected pages that you don’t want to appear in search results, pages that only your web admin should be able to access, or affiliate URLs that redirect to another webpage.
First, what do we mean by noindex?
noindex simply means that you don’t want a web page or a link to show up on search engine results pages.
Most of the time, this can be accomplished by setting a meta tag on the web page. WordPress users have this covered when they use plugins like Yoast SEO or All-In-One-SEO, which provide the capability to do so on a per-page/post basis.
Here is the syntax for the meta robots tag, placed before the closing </head> tag of a web page:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
How to use htaccess file to noindex a URL?
On one of my websites, I run an affiliate program that others can participate in. The issue was that the affiliate URLs were getting indexed in Google search results. The affiliate URL in my case had this structure:
http://www.example.com/aff/a/?a=786&p=mysite.com/product1/
As can be seen above, there is no way to implement a meta tag directly in the page for this kind of URL. Fortunately, Google supports another, less widely known method to noindex a page: the X-Robots-Tag directive in the HTTP header.
This is normally done in the Apache config files, but most regular webmasters host their websites on shared web hosts. Shared web hosts rarely provide access to the Apache config files; however, you can still create or access an htaccess file in the respective directories.
htaccess file rules apply to the directory the file sits in and to all of its subdirectories, and a file deeper in the tree takes precedence. In the current example, the htaccess file residing in the /a/ folder will supersede the htaccess file residing in the /aff/ folder, which will in turn supersede the htaccess file in the domain root directory. In short, if a similar rule resides in both the /a/ folder and the /aff/ folder, the rule in the /a/ folder will take precedence.
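The precedence described above can be sketched with two files. This is only an illustration: the rule in /aff/ is hypothetical (the setup described here has a rule only in /a/), and it assumes the host has mod_headers enabled.

```apache
# File: /aff/.htaccess (hypothetical parent-level rule, for illustration only)
Header set X-Robots-Tag "noindex, nofollow"

# File: /aff/a/.htaccess (deeper rule)
# Apache applies the parent file first and this file afterwards, so for
# requests under /aff/a/ this "set" replaces the value set by the parent.
Header set X-Robots-Tag "noindex, follow"
```

With both files in place, a request under /aff/a/ should come back with "noindex, follow", while other URLs under /aff/ keep the parent value.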
To noindex any URL containing the structure shown above, I go to the /a/ folder, download the htaccess file (or create a new file if it doesn’t exist), and add this at the top of the file:
Header set X-Robots-Tag "noindex, follow"
We are asking search engines not to index the URL but still to follow the links on it.
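The Header directive is provided by Apache’s mod_headers module, and a shared host may not have it loaded. A slightly more defensive variant, offered as a sketch and not as the setup used on my site, wraps the same line in an IfModule check:

```apache
# htaccess file in the /a/ folder
# Send the header only when mod_headers is loaded; without this guard,
# a missing module makes every request in the folder fail with a 500 error.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, follow"
</IfModule>
```

The trade-off is that the guard fails silently: if the module is missing, no header is sent and the URLs remain indexable, so checking the response headers afterwards matters even more. The host must also allow the directive in htaccess files (AllowOverride FileInfo).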
We will use an HTTP header checker tool to verify the implementation. In the response we get, the header shows X-Robots-Tag => noindex, follow
Over To You?
Have you faced such a situation before? How did you resolve the issue?
I have been looking for videos/resources about the X-Robots-Tag with no luck; nobody seems to have made any videos about it (and I haven’t seen any good resources a non-techy would understand).
I found this from Google, where they show parts of what I want to see: https://developers.google.com/search/docs/crawling-indexing/robots-meta-tag
So I want to see how I can noindex/nofollow non-HTML pages, such as PDF files in this case.
Header set X-Robots-Tag "noindex, nofollow"
But with the example above, where do I place the directive in the htaccess file?
I also want to know how I can use these directives to nofollow/noindex the spammy “search URLs” in my GSC.
EX:
domain.com/search/online.vi***a.domain
Which is coming up heavily in my search console.
And I would want to show a directive to not crawl those (or noindex/nofollow them).
Also, in this case, with regard to the spam medic search URLs such as the one above:
where do the directives go in the htaccess file?
Thanks,
nice.
Hi friend,
How can I block a specific URL from being indexed?
This URL makes an Ajax request and is appearing in Google search; I want to prevent that.
Thanks for your time
@Jordi: Just add this before the closing head tag:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
Where do I add the URL that I want to stop from being indexed?