Meta robots tags and the robots.txt file are essential tools that give you control over how search engines crawl and index your website. Whether you need to block private pages, prevent duplicate content, or optimize crawl budget, understanding how these directives work will help you manage SEO visibility effectively.
The meta robots tag is an HTML element placed in the <head> section of a page. It tells search engines whether and how they should index and follow links on that page.
Example: `<meta name="robots" content="noindex, nofollow">`
This line instructs all search engines not to index the page and not to follow any links on it.
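To check which directives a page actually serves, the tag can be read back programmatically. The following is a minimal Python sketch using only the standard library; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample page are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated; normalize case and whitespace.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for illustration.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

A page without any robots meta tag yields an empty set, which corresponds to the default behavior (index, follow).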
Here are the most frequently used values and what they mean:
| Value | Description |
|---|---|
| index | Allow indexing of the page (default). |
| noindex | Prevent indexing: the page will not appear in search results. |
| follow | Allow search engines to follow links on the page (default). |
| nofollow | Prevent search engines from following any links on the page. |
| noarchive | Prevent displaying a cached version of the page in search results. |
| nosnippet | Disable text snippets and previews in search results. |
| noimageindex | Prevent images on the page from appearing in Google Images. |
| notranslate | Block translation links in search results. |
| max-snippet:-1 | No limit on snippet length (you can also set a specific number of characters). |
| max-image-preview:large | Allow large image previews in search results. |
| max-video-preview:-1 | Allow video previews without limitation. |
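You can combine multiple directives in one tag, separated by commas, for example `<meta name="robots" content="noindex, follow, noarchive">`.
➡️ This means: do not index the page, do follow its links, and do not store a cached copy.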
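The way a combined value breaks down into separate decisions can be sketched in a few lines of Python. This is an illustrative model only: the function name `interpret` and the three result keys are assumptions, and it treats any directive that is absent as its permissive default.

```python
def interpret(content):
    """Turn a robots content value into index/follow/archive decisions."""
    directives = {part.strip().lower() for part in content.split(",") if part.strip()}
    return {
        # Indexing and link following are allowed unless explicitly denied.
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
        "archive": "noarchive" not in directives,
    }


print(interpret("noindex, follow, noarchive"))
# {'index': False, 'follow': True, 'archive': False}
```

Because `index` and `follow` are defaults, writing `follow` explicitly changes nothing in this model; it only makes the intent visible to whoever reads the markup.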
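You can specify which crawler should follow your rules by putting its name in the `name` attribute. For example, `<meta name="googlebot" content="noindex">` blocks indexing by Googlebot only.
A tag such as `<meta name="bingbot" content="noindex">` applies the rule to Bing only.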
If you use name="robots", it applies to all crawlers by default.
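A rough way to see which directives a given crawler would pick up is to group the tags by their `name` attribute. The Python sketch below does that with the standard library. It assumes that a crawler applies both the generic `robots` directives and the ones addressed to it by name, which follows how Google describes combining rules; other crawlers may resolve this differently. `MetaByName`, `directives_for`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class MetaByName(HTMLParser):
    """Groups comma-separated meta content values by the tag's name attribute."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name:
            parts = {p.strip().lower() for p in content.split(",") if p.strip()}
            self.by_name.setdefault(name, set()).update(parts)


def directives_for(html, crawler):
    """Assumed model: generic "robots" directives plus the crawler's own."""
    parser = MetaByName()
    parser.feed(html)
    generic = parser.by_name.get("robots", set())
    specific = parser.by_name.get(crawler.lower(), set())
    return generic | specific


# Hypothetical markup: one rule for everyone, one for Googlebot only.
head = '<meta name="robots" content="nofollow"><meta name="googlebot" content="noindex">'
print(sorted(directives_for(head, "googlebot")))  # ['nofollow', 'noindex']
print(sorted(directives_for(head, "bingbot")))    # ['nofollow']
```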
A classic case is keeping a page shown after a form submission (like a “Thank you” page) out of search results by adding `<meta name="robots" content="noindex, nofollow">` to its `<head>`.
✅ Google and other search engines will not index this page and will not follow its links.
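The robots.txt file is a plain text file placed in your site’s root directory (e.g. https://example.com/robots.txt). It defines which areas of your website crawlers may or may not visit.
Explanation: a group that starts with `User-agent: *` applies to all crawlers, and a `Disallow: /private-page.html` line inside that group blocks the given path.
Now https://example.com/private-page.html will not be crawled.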
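A rule like this can be tested before it is deployed. The sketch below uses Python's standard `urllib.robotparser` module to parse a robots.txt body from a string and ask whether a URL may be fetched. The file contents are an assumed reconstruction of the rule described above, and `/index.html` is a hypothetical second path added for contrast.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body: one group for all crawlers, one blocked path.
robots_txt = """\
User-agent: *
Disallow: /private-page.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/private-page.html"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))         # True
```

Note that this parser does simple prefix matching on paths, so it is a convenient sanity check rather than an exact model of any particular search engine's matcher.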
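You can give different instructions for different bots by writing one group per crawler, for example a `User-agent: Googlebot` group containing `Disallow: /test/` and a `User-agent: Bingbot` group containing an empty `Disallow:` line.
Googlebot will not crawl /test/, while Bingbot can crawl everything.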
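Per-crawler groups can be checked the same way. This Python sketch again relies on the standard `urllib.robotparser` module; the robots.txt body is an assumed reconstruction of the two groups described above, and the helper `allowed` and the path `/test/page.html` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body: one group per crawler, separated by a blank line.
robots_txt = """\
User-agent: Googlebot
Disallow: /test/

User-agent: Bingbot
Disallow:
"""

rules = RobotFileParser()
rules.parse(robots_txt.splitlines())


def allowed(agent, path):
    """Return True if the named crawler may fetch the given path."""
    return rules.can_fetch(agent, "https://example.com" + path)


print(allowed("Googlebot", "/test/page.html"))  # False
print(allowed("Bingbot", "/test/page.html"))    # True
print(allowed("Googlebot", "/index.html"))      # True
```

An empty `Disallow:` value means nothing is blocked for that group, which is why Bingbot may fetch every path here.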
| Aspect | robots.txt | Meta Robots Tag |
|---|---|---|
| Location | Root directory (/robots.txt) | Inside the HTML `<head>` |
| Affects | Crawling (access to pages) | Indexing (appearance in results) |
| Use Case | Restrict crawling large sections or files | Control indexing of specific pages |
| Visibility | Public | Page-specific |
| Can prevent indexing? | ❌ Not always (blocked URLs can still be indexed by URL reference) | ✅ Yes, reliable for removing pages from search results |
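If you need to prevent indexing, use the meta robots tag (noindex), not just robots.txt. If you need to prevent crawling, use robots.txt. Keep in mind that a crawler has to fetch a page to see its noindex tag, so a URL that must drop out of search results should not also be disallowed in robots.txt.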
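The difference between blocking crawling and blocking indexing can be made concrete with a small simulation. The Python sketch below is a simplified model of a compliant crawler, not a description of any real search engine's pipeline: it first consults robots.txt and only reads the page's meta robots tag if fetching is allowed. `NoindexFinder`, `indexing_outcome`, the result strings, and the sample inputs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class NoindexFinder(HTMLParser):
    """Sets found=True when a robots meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").lower()
            if "noindex" in [part.strip() for part in content.split(",")]:
                self.found = True


def indexing_outcome(robots_txt, url, html, agent="*"):
    """Simplified model of what a compliant crawler would conclude."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The page is never fetched, so a noindex tag in its HTML goes unseen.
        return "not crawled (URL may still be indexed via links)"
    finder = NoindexFinder()
    finder.feed(html)
    return "crawled, not indexed" if finder.found else "crawled, indexable"


# Hypothetical inputs for illustration.
page = '<head><meta name="robots" content="noindex"></head>'
blocking = "User-agent: *\nDisallow: /thank-you.html"
open_rules = "User-agent: *\nDisallow:"
url = "https://example.com/thank-you.html"

print(indexing_outcome(blocking, url, page))    # not crawled (URL may still be indexed via links)
print(indexing_outcome(open_rules, url, page))  # crawled, not indexed
```

In this model the noindex tag only takes effect in the second call, where robots.txt lets the crawler reach the page.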
You can combine them for stronger control.
This ensures: