Robots meta tag and X-Robots-Tag HTTP response header



A robots meta directive, or command, is a small piece of code that tells web crawlers how to crawl or index a page's content. This article introduces the two ways of delivering these commands named in the title. The robots.txt file covered earlier tells crawlers how to crawl the website as a whole; the robots meta commands covered here control indexing at the level of the individual page and give some instructions on how the page's content may be shown in search results.

The first method is the robots meta tag, which is the name used in Google's official documentation; the second is the X-Robots-Tag, which the web server sends as an HTTP response header. Both methods accept the same commands, such as the common "noindex" and "nofollow", but they differ slightly in how they are delivered and at what level they apply.


One point deserves special attention: a robots meta command only takes effect if the URL is actually crawled. If robots.txt blocks a search engine's crawler from fetching a page, the crawler never sees the indexing commands for that page, so they have no effect.
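The dependency on crawling can be checked before relying on noindex. The following is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content, the URLs, and the function name directives_can_be_seen are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def directives_can_be_seen(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt lets the crawler fetch the URL.

    Only a page that can be fetched can deliver its robots meta tag
    or X-Robots-Tag header to the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/blog/post.html",
    ):
        if directives_can_be_seen(ROBOTS_TXT, "Googlebot", url):
            print(url, "can be crawled, so its robots commands can be read")
        else:
            print(url, "is blocked, so a noindex on it would never be seen")
```

If a page should be dropped from search results, the sketch suggests checking that it is not also disallowed in robots.txt.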

In addition, as with robots.txt, these commands are not binding on crawlers; they only express indexing preferences, so not every crawler will follow them.

 

Available commands

If no command is specified, the default is to allow all indexing and crawling.
Command names are not case sensitive, but search engines may not all handle them in the same way. The descriptions below are based on the Google search engine.

index - Tells the search engine to index the page. This is the default, so it does not need to be written out.

noindex - Do not index the page, that is, do not show it in search results or display a cached link to it.

follow - Tells search engines to follow the links on the page. This is also the default; even if the page is not indexed, the crawler should still follow its links.

nofollow - Do not follow the links on this page.

none - Equivalent to using noindex and nofollow together.

noarchive - Do not show a cached link for this page in search results.

nosnippet - Do not show a snippet for this page in search results (the short text excerpt, which Google often takes from the meta description).

noimageindex - Do not index the images on this page.

notranslate - Do not offer a translation of this page in search results.

unavailable_after: [date/time] - Stop showing the page in search results after the given date and time. The date and time must be in a widely adopted format; Google accepts RFC 822, RFC 850, and ISO 8601, among others.

noodp (obsolete) - Do not use the Open Directory Project description in search results. The project's website (DMOZ) closed in 2017, so this command is obsolete.
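To make the list more concrete, here is a rough sketch of how a tool might read a comma-separated command string. The function names parse_directives and is_indexable are hypothetical, and the sketch assumes the unavailable_after date contains no comma (an RFC 850 date with a weekday would need extra handling). It illustrates the rules above and does not describe how any search engine actually parses them.

```python
def parse_directives(content: str) -> dict:
    """Parse a robots command string into {name: value or None}.

    Names are lowercased because commands are not case sensitive.
    Assumption: the unavailable_after date contains no comma.
    """
    directives = {}
    for raw in content.split(","):
        token = raw.strip()
        if not token:
            continue
        name, sep, value = token.partition(":")
        name = name.strip().lower()
        if name == "none":
            # "none" is shorthand for noindex plus nofollow.
            directives["noindex"] = None
            directives["nofollow"] = None
        else:
            directives[name] = value.strip() if sep else None
    return directives


def is_indexable(directives: dict) -> bool:
    """Indexing is allowed by default, so only noindex forbids it."""
    return "noindex" not in directives


if __name__ == "__main__":
    parsed = parse_directives("noarchive, unavailable_after: 25 Jun 2010 15:00:00 PST")
    print(parsed)
    print(is_indexable(parse_directives("NONE")))
```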

 

meta robots tag

The robots meta tag, as it is called in Google's official documentation, lets you set how search engines index each page individually. The tag should be placed in the <head> section of the page:

<meta name="robots" content="noindex">

The example above instructs all crawlers not to show the page in search results.
You can change the value of the name attribute to the name of a specific crawler, and change the value of the content attribute to the commands you want.
The example below addresses Google's crawler specifically and tells it not to follow any links on the page.

<meta name="googlebot" content="nofollow">

What if you want to give the same crawler two or more commands, for example to block both indexing and link following on a page?
You only need to separate the commands with a half-width (ASCII) comma in the value of the content attribute. For example:

<meta name="robots" content="noindex, nofollow">

If you want to address several crawlers individually, you can use several robots meta tags, for example:

<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">

When Google encounters conflicting commands, it applies the most restrictive one.
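The way several tags combine for one crawler can be sketched with Python's standard html.parser. The class RobotsMetaCollector, the helper functions, and the sample page are hypothetical; the sketch approximates the "most restrictive wins" rule by collecting every command addressed to the crawler or to "robots" and letting any noindex take priority. It is an illustration, not a description of Google's implementation.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect commands from <meta> tags named 'robots' or the given crawler."""

    def __init__(self, crawler: str):
        super().__init__()
        self.names = {"robots", crawler.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in self.names:
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def effective_directives(html: str, crawler: str) -> set:
    """Return every command in the page that applies to the crawler."""
    collector = RobotsMetaCollector(crawler)
    collector.feed(html)
    return collector.directives


def may_index(directives: set) -> bool:
    """The restrictive command wins: any noindex (or none) forbids indexing."""
    return not ({"noindex", "none"} & directives)


# Hypothetical page used only for illustration.
PAGE = """<html><head>
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
</head><body></body></html>"""

if __name__ == "__main__":
    for bot in ("googlebot", "bingbot"):
        found = effective_directives(PAGE, bot)
        print(bot, sorted(found), "indexable:", may_index(found))
```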

 

X-Robots-Tag

The X-Robots-Tag can be sent as part of the HTTP response header for a given URL to control how the whole page, or specific elements of it, are indexed.
It accepts the same commands as the robots meta tag. Because it is set in the server configuration, where rules can be matched against URLs with regular expressions, and does not rely on an HTML file to carry the commands, it is the more flexible of the two.

The following example instructs all crawlers not to index web pages:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)


An HTTP response can carry several commands, either separated by commas in one header or written as several X-Robots-Tag headers:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive, unavailable_after: 25 Jun 2010 15:00:00 PST
(…)

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
(…)


To give commands to a specific crawler, put the crawler's name in front of the command, as follows:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: robots: nofollow
X-Robots-Tag: googlebot: nosnippet

(…)
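As a rough illustration of how such headers might be read, the sketch below works out which commands apply to one crawler. The function directives_for and the constants are hypothetical. Two assumptions are built in: a "robots:" prefix is treated as addressing every crawler, mirroring the example above, and the unavailable_after date contains no comma.

```python
import re

# Commands written as "name: value"; their name must not be mistaken for a
# crawler name. Only the one covered in this article is listed.
VALUED_DIRECTIVES = {"unavailable_after"}

# Assumption: mirroring the example above, a "robots:" prefix addresses
# every crawler, the same as having no prefix at all.
GENERIC_AGENTS = {"robots"}

AGENT_TOKEN = re.compile(r"[A-Za-z0-9_-]+")


def directives_for(header_values, crawler):
    """Return the X-Robots-Tag commands that apply to the given crawler."""
    crawler = crawler.lower()
    applicable = set()
    for value in header_values:
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip().lower()
        if sep and AGENT_TOKEN.fullmatch(prefix) and prefix not in VALUED_DIRECTIVES:
            if prefix != crawler and prefix not in GENERIC_AGENTS:
                continue  # addressed to a different crawler
            value = rest
        for token in value.split(","):
            name, has_value, arg = token.partition(":")
            name = name.strip().lower()
            if name:
                applicable.add(f"{name}: {arg.strip()}" if has_value else name)
    return applicable


# Header values taken from the examples in this article.
HEADERS = [
    "robots: nofollow",
    "googlebot: nosnippet",
    "noarchive, unavailable_after: 25 Jun 2010 15:00:00 PST",
]

if __name__ == "__main__":
    print(sorted(directives_for(HEADERS, "googlebot")))
    print(sorted(directives_for(HEADERS, "bingbot")))
```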


Here are some situations in which you might want to give commands through the X-Robots-Tag:

  • Controlling the indexing of non-HTML content, such as images and videos.
  • Blocking the indexing of a specific element, such as a video or image, without affecting the indexing of the page as a whole.
  • Setting indexing rules when you have no access to the head section of the HTML.
  • Adding rules that decide whether a page may be indexed.
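The first of these situations, non-HTML files, might be handled in application code along the following lines. This is only a sketch built on the standard WSGI calling convention; the middleware name, the file-extension pattern, and the demo application are all hypothetical, and in practice the same rule is often written in the web server's configuration instead.

```python
import re

# Hypothetical rule: keep PDFs and common image types out of the index.
NOINDEX_PATTERN = re.compile(r"\.(pdf|png|jpe?g|gif)$", re.IGNORECASE)


def x_robots_middleware(app, pattern=NOINDEX_PATTERN, value="noindex"):
    """Wrap a WSGI app and add X-Robots-Tag when the request path matches."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            if pattern.search(environ.get("PATH_INFO", "")):
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns the same body for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"demo"]


if __name__ == "__main__":
    app = x_robots_middleware(demo_app)

    def show(status, headers, exc_info=None):
        print(status, headers)

    app({"PATH_INFO": "/files/report.pdf"}, show)
    app({"PATH_INFO": "/index.html"}, show)
```

Because the rule is a regular expression on the path, it applies to files that have no head section in which a meta tag could be placed.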

 

SEO key reminders for robots meta directives

  • Search engines can only read meta commands by crawling the page. For URLs that robots.txt blocks from crawling, the meta commands in the page are therefore ignored; even if the page contains noindex, it may still appear in search results.
  • If you do not want a page to appear in search results, use a robots meta command rather than the robots.txt file.
  • Not all web crawlers follow robots meta commands. If pages on the site contain private information that should not be publicly searchable, protect them with a more secure method, such as password protection, so that visitors cannot view the confidential content.
  • The robots meta tag and the X-Robots-Tag have the same effect, so you only need to use one of them.



