Web Development Blog — Coding, SEO, Domains & CMS Insights

Control Search Engine Indexing with Meta Robots Tags and robots.txt

Written by Niko Yankovsky | October 15, 2025

Meta robots tags and the robots.txt file are essential tools that give you control over how search engines crawl and index your website. Whether you need to block private pages, prevent duplicate content, or optimize crawl budget, understanding how these directives work will help you manage SEO visibility effectively.

What Is a Meta Robots Tag?

The meta robots tag is an HTML element placed in the <head> section of a page. It tells search engines whether and how they should index and follow links on that page.

Example:

```html
<meta name="robots" content="noindex, nofollow">
```

This line instructs all search engines not to index the page and not to follow any links on it.

Common Values for the content Attribute

Here are the most frequently used values and what they mean:

| Value | Description |
| --- | --- |
| `index` | Allow indexing of the page (default). |
| `noindex` | Prevent indexing — the page will not appear in search results. |
| `follow` | Allow search engines to follow links on the page (default). |
| `nofollow` | Prevent search engines from following any links on the page. |
| `noarchive` | Prevent displaying a cached version of the page in search results. |
| `nosnippet` | Disable text snippets and previews in search results. |
| `noimageindex` | Prevent images on the page from appearing in Google Images. |
| `notranslate` | Block translation links in search results. |
| `max-snippet:-1` | No limit on snippet length (you can also set a specific number of characters). |
| `max-image-preview:large` | Allow large image previews in search results. |
| `max-video-preview:-1` | Allow video previews without limitation. |

You can combine multiple directives:

```html
<meta name="robots" content="noindex, follow, noarchive">
```

➡️ This means:
Do not index the page, but do follow the links, and do not store a cached copy.

Targeting Specific Search Engines

You can specify which crawler should follow your rules. For example, only block Googlebot:

```html
<meta name="googlebot" content="noindex">
```

Or only apply rules for Bing:

```html
<meta name="bingbot" content="noindex">
```

If you use name="robots", it applies to all crawlers by default.

Example: Prevent Indexing of a Thank You Page

A classic case — preventing a post-form submission page (like a “Thank you” page) from appearing in search results:

```html
<meta name="robots" content="noindex, nofollow">
```

✅ Google and other search engines will not index this page and will not follow its links.
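The same directives can also be delivered as an HTTP response header, `X-Robots-Tag`, which is useful for non-HTML resources (such as PDFs) where a meta tag cannot be embedded. A minimal sketch for Apache, assuming `mod_headers` is enabled (the file name here is illustrative):

```
# Keep a downloadable "thank you" PDF out of search results
<Files "thank-you.pdf">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
```

Most servers (Nginx, HubSpot's CDN settings, etc.) offer an equivalent way to set custom response headers.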

The Role of robots.txt

The robots.txt file is a text file placed in your site’s root directory (e.g. https://example.com/robots.txt). It defines which areas of your website crawlers are allowed or disallowed to visit.

Basic Example:

```
User-agent: *
Disallow: /admin/
Allow: /
```

Explanation:

  • User-agent: * → applies to all crawlers
  • Disallow: /admin/ → prevents crawlers from accessing /admin/ directory
  • Allow: / → allows crawling of the rest of the site

Blocking a Single Page

```
User-agent: *
Disallow: /private-page.html
```

Now https://example.com/private-page.html will not be crawled.

Allowing Specific Crawlers

You can give different instructions for different bots:

```
User-agent: Googlebot
Disallow: /test/

User-agent: Bingbot
Disallow:
```

Googlebot will not crawl /test/, while Bingbot can crawl everything (an empty Disallow means nothing is blocked).
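You can sanity-check rules like these before deploying them with Python's standard-library robots.txt parser. A minimal sketch (the rules below mirror the example above):

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example: block Googlebot from /test/, allow everyone else
rules = """\
User-agent: Googlebot
Disallow: /test/

User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from the /test/ section
print(rp.can_fetch("Googlebot", "https://example.com/test/page.html"))  # False

# Bingbot falls through to the catch-all group and is allowed
print(rp.can_fetch("Bingbot", "https://example.com/test/page.html"))    # True
```

In production you would point the parser at the live file with `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()`.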

Important Difference Between robots.txt and Meta Robots

| Aspect | robots.txt | Meta Robots Tag |
| --- | --- | --- |
| Location | Root directory (/robots.txt) | Inside the HTML `<head>` |
| Affects | Crawling (access to pages) | Indexing (appearance in results) |
| Use Case | Restrict crawling of large sections or files | Control indexing of specific pages |
| Visibility | Public | Page-specific |
| Can prevent indexing? | ❌ Not always (blocked URLs can still be indexed by URL reference) | ✅ Yes, reliable for removing pages from search results |

Best Practice:

If you need to prevent indexing, use the meta robots tag (noindex) — not just robots.txt. If you need to prevent crawling, use robots.txt.

Advanced Tip: Combining Both Approaches

You can combine them for stronger control. In robots.txt:

```
User-agent: *
Disallow: /private/
```

And in the HTML of each page under /private/:

```html
<meta name="robots" content="noindex">
```

This ensures:

  • Crawlers won’t visit /private/
  • Even if they do reach a page, it won’t be indexed

Common Mistakes to Avoid

  1. Blocking via robots.txt instead of using noindex. ➜ Crawlers can’t read meta tags if they can’t access the page.
  2. Forgetting to remove “noindex” after launch. ➜ A common leftover when a staging or test site goes live.
  3. Wrong placement of meta tag. ➜ Must be inside <head>, not <body>.
  4. Assuming all crawlers obey robots.txt. ➜ Some bots ignore it entirely (especially scrapers).
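Mistake #3 (a meta tag outside `<head>`) is easy to catch automatically. A minimal sketch using Python's standard-library HTML parser that only records a robots meta tag when it appears inside `<head>` (the class name is illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Records the content of <meta name="robots"> only if it is inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.robots_content = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head and a.get("name", "").lower() == "robots":
            self.robots_content = a.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

finder = RobotsMetaFinder()
finder.feed('<html><head><meta name="robots" content="noindex, nofollow">'
            '</head><body></body></html>')
print(finder.robots_content)  # noindex, nofollow
```

If the tag sits in `<body>`, `robots_content` stays `None` — a quick signal that crawlers will ignore it.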

Best Practices Summary

  • Use robots.txt for large-scale crawl management.
  • Use meta robots tags for fine-grained indexing control.
  • Combine both only when necessary.
  • Regularly test your configuration using Google Search Console’s URL Inspection Tool.
  • Always verify that important pages are indexable and unimportant ones are hidden.

Related Topics (for future internal linking)

  • How to Optimize Your Crawl Budget in 2025
  • Canonical Tags Explained: Prevent Duplicate Content
  • How to Set Up a Custom 404 Page in HubSpot