Introduction
If you have ever heard someone mention robots.txt and wondered what it is, you are not alone. Many WordPress website owners come across this term but are not quite sure what it does or why it matters. The good news is that once you understand the basics, managing your robots.txt file becomes a straightforward task.
In simple terms, a robots.txt file is a small text file that sits at the root of your website and communicates with search engine bots. It tells these bots which pages or sections of your website they are allowed to visit and which ones they should skip. Think of it as a set of instructions posted at the entrance of your website specifically for automated visitors like Googlebot or Bingbot.
In WordPress, there are two ways this file can exist. First, WordPress can generate a virtual robots.txt file automatically. Second, you can create a physical robots.txt file yourself and place it on your server. When you create your own physical file, it takes priority over the virtual one. This is called manually overwriting the robots.txt file.
This article will guide you step by step through everything you need to know: what robots.txt is, why it matters for SEO, how WordPress handles it by default, and exactly how you can manually overwrite it to suit your needs.
What Is a robots.txt File?
A robots.txt file is a plain text file that follows a protocol called the Robots Exclusion Protocol (REP). This protocol was introduced in 1994 and was published as RFC 9309 in 2022. It is the standard way for website owners to communicate with web crawlers and bots.
When a search engine bot visits your website, the very first thing it looks for is a robots.txt file at yourwebsite.com/robots.txt. If it finds one, it reads the instructions inside before doing anything else. If it does not find one, it simply crawls the entire website freely.
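To make the lookup location concrete, here is a minimal Python sketch that works out which robots.txt URL applies to any page address. The function name robots_url is made up for this illustration, and yourwebsite.com is the same placeholder domain used throughout this article.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Only the scheme and host (plus port, if any) are kept.
    # The page's own path, query string and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourwebsite.com/blog/hello-world/?utm=1"))
# https://yourwebsite.com/robots.txt
```

Notice that the file always lives at the top level of the host. A robots.txt placed inside /blog/ would never be requested by a bot.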
The Basic Structure of a robots.txt File
A robots.txt file is made up of groups of rules. Each group contains a User-agent line followed by one or more Allow or Disallow lines.
Here is a simple example:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Let us break this down:
- User-agent: * – The asterisk (*) means this rule applies to all bots. You can also target specific bots like Googlebot by name.
- Disallow: /wp-admin/ – This tells all bots not to crawl the /wp-admin/ folder.
- Allow: /wp-admin/admin-ajax.php – This creates an exception, allowing bots to access this specific file even within a disallowed folder.
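If you are curious how a bot decides between a Disallow and an Allow rule that both match, the following Python sketch may help. It assumes the precedence described in RFC 9309 and used by Google: the longest matching path wins, and Allow wins a tie. The function is_allowed is a hypothetical helper written for this article, it handles plain path prefixes only (no * or $ wildcards), and real crawlers may differ in the details.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one User-agent group.

    `rules` is a list of (directive, prefix) pairs such as
    ("Disallow", "/wp-admin/").
    """
    best_len = -1
    allowed = True  # if no rule matches, the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            # The longest matching prefix wins; on a tie, Allow wins.
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len = len(prefix)
                allowed = is_allow
    return allowed


rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/blog/my-first-post/"))      # True
```

The Allow line is the longer match for admin-ajax.php, which is why it overrides the broader Disallow on the folder.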
It is important to understand that robots.txt is a polite instruction, not a security barrier. Well-behaved bots like Googlebot follow these rules faithfully. However, malicious bots may choose to ignore them entirely. Never use robots.txt as a way to hide sensitive information – use proper server-level security for that.
Why Does robots.txt Matter for SEO?
Search engines allot each website a limited amount of crawling, often called the crawl budget. This means they may not crawl every page on every visit, especially on large sites. By using robots.txt strategically, you can guide bots toward your most important content and away from pages that have no SEO value.
Pages You Might Want to Block
- Admin pages: Your WordPress dashboard and backend have no public value and do not need to be crawled.
- Duplicate content: Pages like tag archives, author archives, or certain plugin-generated pages may create duplicate content issues.
- Login pages: Your /wp-login.php page has no SEO value, so there is no reason for bots to spend time crawling it.
- Private areas: Membership-only sections or staging environments.
- Unnecessary plugin folders: Some plugin files and folders do not need to be crawled and can waste your crawl budget.
The Sitemap Connection
Most SEO plugins also add a Sitemap entry to the robots.txt file. A sitemap is an XML file that lists all the important pages on your website. Including a sitemap link in robots.txt helps search engines find and index your content more efficiently.
Example of a sitemap entry in robots.txt:
Sitemap: https://yourwebsite.com/sitemap_index.xml
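As a quick illustration of how a tool could pick up that entry, here is a small Python sketch that pulls every Sitemap line out of a robots.txt text. The helper name sitemap_urls is an assumption made for this example. It treats the field name as case-insensitive and ignores anything after a # comment marker.

```python
def sitemap_urls(robots_txt):
    """Return the URLs from all Sitemap lines in a robots.txt text."""
    urls = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        # Split on the first colon only, so the colon in "https:" is kept.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


example = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yourwebsite.com/sitemap_index.xml
"""

print(sitemap_urls(example))
# ['https://yourwebsite.com/sitemap_index.xml']
```

Sitemap lines are independent of User-agent groups, so they can appear anywhere in the file.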
How WordPress Handles robots.txt by Default
WordPress does not create a physical robots.txt file automatically after installation. Instead, it generates a virtual robots.txt file on the fly. This means the file does not actually exist as a file on your server – WordPress simply creates the output whenever a bot requests the /robots.txt URL.
The Default WordPress robots.txt Content
If your WordPress site is set to be publicly visible (Reading Settings > Search Engine Visibility is unchecked), the default virtual robots.txt typically looks like this:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
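This is quite minimal. It blocks bots from crawling the WordPress admin panel but allows them to access admin-ajax.php, which is needed for certain frontend functionality. Everything else on your website is open to crawlers. On WordPress 5.5 and later, which includes a built-in XML sitemap, the output usually also contains a Sitemap: line pointing to /wp-sitemap.xml.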
When Search Engine Visibility Is Disabled
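If you go to WordPress Settings > Reading and check the box that says Discourage search engines from indexing this site, WordPress asks search engines to stay away from your whole site. Older versions did this by changing the robots.txt output to include a Disallow: / rule, which blocks all bots from crawling anything on your site. Since WordPress 5.3, the setting instead adds a noindex robots meta tag to every page and leaves the robots.txt rules unchanged. Either way, be very careful with this setting – if it is turned on for a live website, your pages will not be indexed by Google.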
Virtual vs. Physical robots.txt: What Is the Difference?
Understanding the difference between a virtual and a physical robots.txt file is key before you proceed with manually overwriting it.
Virtual robots.txt
A virtual robots.txt is generated by WordPress dynamically. It does not exist as an actual file on your hosting server. WordPress intercepts any request for /robots.txt and serves the content from its internal logic. Plugins like Yoast SEO or Rank Math can hook into this process and modify what gets served – all without creating a real file.
Physical robots.txt
A physical robots.txt is an actual file you create and upload to the root directory of your server. Once this file exists, the web server will serve it directly to any bot that requests it, without involving WordPress, so the virtual version is bypassed entirely. This gives you complete and direct control over the content of your robots.txt file.
Which One Takes Priority?
The physical file always takes priority over the virtual one. Once a physical robots.txt file exists in your root directory, WordPress stops serving its virtual version for that URL. This is why manually creating or uploading a physical file is considered overwriting the default WordPress behavior.
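The priority rule can be pictured as a simple decision. The Python sketch below is only a model of that behavior, not how your web server or WordPress is actually implemented: resolve_robots and DEFAULT_VIRTUAL are names invented for this illustration, and the demo runs against a temporary folder rather than a real site.

```python
import tempfile
from pathlib import Path

# Stand-in for the content WordPress would generate on the fly.
DEFAULT_VIRTUAL = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Allow: /wp-admin/admin-ajax.php\n"
)


def resolve_robots(doc_root, virtual=DEFAULT_VIRTUAL):
    """Return (source, body) for a request to /robots.txt."""
    physical = Path(doc_root) / "robots.txt"
    if physical.is_file():
        # A real file exists, so it is served and WordPress is never asked.
        return "physical", physical.read_text(encoding="utf-8")
    # No file on disk: fall back to the generated version.
    return "virtual", virtual


with tempfile.TemporaryDirectory() as doc_root:
    print(resolve_robots(doc_root)[0])  # virtual
    (Path(doc_root) / "robots.txt").write_text(
        "User-agent: *\nDisallow:\n", encoding="utf-8"
    )
    print(resolve_robots(doc_root)[0])  # physical
```

Deleting the physical file flips the result back to the virtual version, which is also how you hand control back to WordPress or an SEO plugin.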
Prerequisites Before You Begin
Before you manually create or overwrite your robots.txt file, make sure you have the following in place:
- Access to your hosting account: You will need to either access your server via FTP/SFTP, use your hosting provider’s file manager, or use the command line.
- Know your root directory: The robots.txt file must be placed in the root directory of your website – the same folder that contains your wp-config.php file and the wp-admin, wp-content, and wp-includes folders.
- A text editor: Use a plain text editor like Notepad on Windows or TextEdit on Mac (in plain text mode). Avoid rich text editors like Microsoft Word, as they add hidden formatting that can corrupt the file.
- A backup of your current settings: If you have an SEO plugin managing your robots.txt virtually, note down the current settings or take a screenshot before making changes.
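If you have command-line access and want to double-check that a folder really is your WordPress root, a small script can look for the files and folders named above. This is a rough heuristic sketched in Python. The names WP_MARKERS and missing_wp_markers are made up for this example, and the demo builds a fake root in a temporary folder so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# The file and folders this article uses to recognize a WordPress root.
WP_MARKERS = ("wp-config.php", "wp-admin", "wp-content", "wp-includes")


def missing_wp_markers(directory):
    """Return the marker names that are NOT present in `directory`."""
    root = Path(directory)
    return [name for name in WP_MARKERS if not (root / name).exists()]


with tempfile.TemporaryDirectory() as folder:
    print(missing_wp_markers(folder))  # all four markers are missing

    # Build a fake WordPress root for the demo.
    (Path(folder) / "wp-config.php").touch()
    for name in ("wp-admin", "wp-content", "wp-includes"):
        (Path(folder) / name).mkdir()

    print(missing_wp_markers(folder))  # []
```

On a real server you would call missing_wp_markers with the path to public_html (or your host's equivalent). An empty list suggests you are in the right place.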
Method 1: Manually Creating a robots.txt File via FTP/SFTP
FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) allow you to connect to your server and manage your website files directly. Prefer SFTP whenever your host offers it, because plain FTP sends your password unencrypted. This is the most common and reliable method for manually creating a robots.txt file.
Step 1: Install an FTP Client
Download and install an FTP client on your computer. FileZilla is the most popular free option and works on Windows, Mac, and Linux. You can download it from the official FileZilla website.
Step 2: Get Your FTP Credentials
Log in to your hosting account (like cPanel or Plesk) and find your FTP credentials. You will typically need your FTP hostname (usually your domain name or an IP address), your FTP username, your FTP password, and the port number (usually 21 for FTP or 22 for SFTP).
Step 3: Connect to Your Server
Open FileZilla and enter your credentials in the Quickconnect bar at the top. Click Quickconnect. After a moment, you should see your server files on the right side of the screen and your local computer files on the left side.
Step 4: Navigate to Your Root Directory
On the right side (server files), navigate to the root directory of your WordPress installation. This is typically inside a folder called public_html, www, or httpdocs – depending on your hosting provider. You will know you are in the right place when you see files like wp-config.php and folders like wp-content and wp-admin.
Step 5: Check if a robots.txt Already Exists
Look through the files in the root directory. If a robots.txt file already exists there, you can download it, edit it, and then re-upload it. If it does not exist, you will create a new one.
Step 6: Create Your robots.txt File Locally
On your local computer, open a plain text editor (Notepad, TextEdit, or VS Code). Write your robots.txt content. Here is a well-structured example for a typical WordPress blog:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourwebsite.com/sitemap_index.xml
Save this file with the exact name robots.txt (all lowercase, with the .txt extension). Make sure your text editor does not save it as robots.txt.txt – this is a common mistake on Windows computers where file extensions are hidden.
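If you would rather not worry about hidden extensions or stray formatting, you can generate the file with a short script instead of a text editor. The Python sketch below writes the rules from this step to a file named exactly robots.txt, as plain UTF-8 text without a byte-order mark. The function name write_robots is an assumption for this example, and the demo writes into a temporary folder, so replace that with a folder of your choice.

```python
import tempfile
from pathlib import Path

RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /?s=
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yourwebsite.com/sitemap_index.xml
"""


def write_robots(directory, content=RULES):
    """Write `content` to <directory>/robots.txt and return the file path."""
    target = Path(directory) / "robots.txt"  # lowercase name, single extension
    # Writing bytes keeps the line endings exactly as they are in `content`
    # and avoids a byte-order mark at the start of the file.
    target.write_bytes(content.encode("utf-8"))
    return target


with tempfile.TemporaryDirectory() as folder:
    path = write_robots(folder)
    print(path.name)  # robots.txt
```

Remember to replace yourwebsite.com in the Sitemap line with your own domain before uploading.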
Step 7: Upload the File to Your Server
In FileZilla, navigate to the location of your newly created robots.txt file on the left side (local computer). Drag and drop the file to the root directory on the right side (server). If a robots.txt file already existed and you edited it, FileZilla will ask if you want to overwrite it – click Yes.
Step 8: Verify the File Is Live
Open your web browser and go to yourwebsite.com/robots.txt. You should see the content of your new robots.txt file displayed. If it looks correct, you have successfully manually overwritten the WordPress robots.txt file.
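The same check can be scripted, which is handy if you update the file often. The Python sketch below downloads the live file and compares it with your local copy, ignoring harmless differences in line endings and trailing spaces. The names normalize, fetch and is_live are hypothetical helpers for this article, and the fetcher parameter exists only so the comparison can be tried without a network connection.

```python
from urllib.request import urlopen


def normalize(text):
    """Return the lines of `text` without line endings or trailing spaces."""
    return [line.rstrip() for line in text.strip().splitlines()]


def fetch(url):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=10) as response:
        # "utf-8-sig" also strips a byte-order mark if the server sends one.
        return response.read().decode("utf-8-sig")


def is_live(local_text, url, fetcher=fetch):
    """True if the file served at `url` matches `local_text`."""
    return normalize(fetcher(url)) == normalize(local_text)


# Example call (needs network access, so it is left commented out):
# with open("robots.txt", encoding="utf-8") as f:
#     print(is_live(f.read(), "https://yourwebsite.com/robots.txt"))
```

If the result is False right after an upload, a caching plugin or CDN may still be serving the old version, so it can be worth clearing the cache and trying again.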
Method 2: Using Your Hosting File Manager
If you do not want to install an FTP client, most hosting providers offer a built-in file manager inside your hosting control panel. This method is simpler and does not require any additional software.
Step 1: Log In to Your Hosting Control Panel
Log in to your hosting account. Most providers use cPanel, but some use Plesk or a custom panel. The steps are similar across all of them.
Step 2: Open the File Manager
Inside cPanel, look for the Files section and click on File Manager. This opens a browser-based interface that lets you browse, create, edit, upload, and delete files on your server.
Step 3: Navigate to the Root Directory
In the file manager, navigate to your website’s root directory, usually public_html. You should see your WordPress files including wp-config.php, wp-content, and wp-admin.
Step 4: Create or Edit the robots.txt File
If a robots.txt file already exists in the root directory, right-click on it and select Edit. This opens it in the built-in text editor. Make your changes and save.
If no robots.txt exists, look for a New File button or right-click in the directory and choose Create New File. Name the file robots.txt and confirm. Then right-click the new file and choose Edit to open the text editor. Type your rules, then save.
Method 3: Using an SEO Plugin to Edit robots.txt
If you prefer to stay inside the WordPress dashboard without touching server files directly, popular SEO plugins like Yoast SEO and Rank Math give you a way to edit robots.txt. Keep in mind that the two plugins work differently: Rank Math edits the virtual version and does not create a physical file, while the Yoast SEO file editor reads and writes a physical robots.txt file on your server. Both are valid ways to customize your robots.txt behavior.
Using Yoast SEO
- Install and activate Yoast SEO if you have not already done so.
- In your WordPress dashboard, go to SEO > Tools.
- Click on File editor.
- You will see the robots.txt file editor. If a physical robots.txt file already exists, Yoast will show its content. If not, it will offer to create one for you.
- Add or modify rules in the editor box and click Save Changes to robots.txt.
Using Rank Math
- Install and activate Rank Math SEO.
- Go to Rank Math > General Settings in your WordPress dashboard.
- Click on the Edit robots.txt button.
- Modify the content as needed and save your changes.
Important note: If you later create a physical robots.txt file on your server and you are also using an SEO plugin editor, the physical file will override what the plugin does. To avoid confusion, stick to one method.
Writing Effective robots.txt Rules for WordPress
Now that you know how to create and upload the file, let us dig into what you should actually write in it. Here are some practical rules and what they do.
Block the WordPress Admin Area
The WordPress admin area should never be crawled by search engines. These pages have no public value and could expose structural information about your site.
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Block the Login Page
Disallow: /wp-login.php
Block Search Result Pages
WordPress search results pages are dynamically generated and can create thin or duplicate content issues. Blocking them is generally recommended.
Disallow: /?s=
Disallow: /search/
Block Trackback URLs
Disallow: /trackback/
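Block wp-includes (Optional)
The wp-includes folder contains WordPress core files. Most of them do not need to be crawled by search engines. Be aware, however, that this folder also holds JavaScript and CSS files that themes and plugins load on the frontend. If Googlebot cannot fetch those files, it may not render your pages the way visitors see them, so check how your pages render before keeping this rule.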
Disallow: /wp-includes/
Targeting Specific Bots
You can write rules specifically for individual search engine bots. For example, if you want to block only Bingbot from certain pages while still allowing Googlebot:
User-agent: Bingbot
Disallow: /tag/
User-agent: Googlebot
Allow: /
This level of control can be very useful for managing how different bots crawl your website.
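One detail that often surprises people: a bot that finds a group written specifically for it follows only that group and ignores the User-agent: * rules. The Python sketch below illustrates this under simplifying assumptions. parse_groups and rules_for are hypothetical helpers for this article, and bot names are matched exactly (ignoring case), whereas real crawlers use looser matching on their product name.

```python
def parse_groups(robots_txt):
    """Map each lowercase user-agent name to its (directive, path) rules."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif key in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((key, value))
            last_was_agent = False
    return groups


def rules_for(groups, bot_name):
    """A bot uses its own group if one exists, otherwise the '*' group."""
    return groups.get(bot_name.lower(), groups.get("*", []))


example = """\
User-agent: *
Disallow: /wp-admin/

User-agent: Bingbot
Disallow: /tag/
"""

groups = parse_groups(example)
print(rules_for(groups, "Bingbot"))      # [('disallow', '/tag/')]
print(rules_for(groups, "DuckDuckBot"))  # [('disallow', '/wp-admin/')]
```

In this example Bingbot is no longer told to stay out of /wp-admin/. If you want a named bot to follow the general rules as well, repeat them inside its own group.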
A Complete robots.txt Template for WordPress
Here is a comprehensive and well-commented robots.txt file template suitable for most WordPress websites. Replace yourwebsite.com with your actual domain name.
# Allow all legitimate search engines to crawl the site
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /?s=
Disallow: /search/
Disallow: /author/
Disallow: /tag/
Allow: /wp-admin/admin-ajax.php
# Block aggressive SEO scrapers
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
# Sitemap location
Sitemap: https://yourwebsite.com/sitemap_index.xml
Note on blocking tag and author archives: Some SEO experts recommend blocking these because they can create thin content or duplicate content problems. Others prefer to allow them. If your tag pages or author pages have unique, valuable content, you may want to keep them open to crawlers.
Common Mistakes to Avoid
When working with robots.txt, small errors can have significant consequences. Here are the most common mistakes and how to avoid them.
Mistake 1: Blocking the Entire Website
Using Disallow: / blocks every bot from crawling everything on your website. This will prevent your entire site from appearing in Google and other search engines. Always double-check your rules before saving, especially after a fresh WordPress install where the Search Engine Visibility setting might have been enabled.
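A quick automated check can catch this mistake before the file goes live. The Python sketch below looks for a Disallow: / rule inside a group that applies to all bots. The function name blocks_everything is an assumption for this example, and the check is deliberately narrow: it does not flag a Disallow: / that targets a single named bot, since that is often intentional.

```python
def blocks_everything(robots_txt):
    """True if a 'User-agent: *' group contains 'Disallow: /'."""
    agents = []
    in_agent_run = False
    for raw_line in robots_txt.splitlines():
        key, sep, value = raw_line.split("#", 1)[0].partition(":")
        if not sep:
            continue  # blank or malformed line
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if not in_agent_run:
                agents = []  # a new group starts here
            agents.append(value)
            in_agent_run = True
        elif key in ("allow", "disallow"):
            in_agent_run = False
            if key == "disallow" and value == "/" and "*" in agents:
                return True
    return False


print(blocks_everything("User-agent: *\nDisallow: /\n"))           # True
print(blocks_everything("User-agent: *\nDisallow: /wp-admin/\n"))  # False
```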
Mistake 2: Treating robots.txt as a Security Tool
As mentioned earlier, robots.txt is not a security measure. If you have sensitive directories or files, do not rely on robots.txt to protect them. Use proper server-level authentication, .htaccess password protection, or WordPress user roles instead.
Mistake 3: Wrong File Name or Location
The file must be named exactly robots.txt (all lowercase) and placed in the root directory. A file placed in a subfolder or named Robot.txt or robots.TXT will not be recognized by bots.
Mistake 4: Accidentally Blocking Pages You Want Indexed
Broad Disallow rules can accidentally block important pages. Rules match from the start of the URL path, so Disallow: /page/ would block every URL that begins with /page/, including paginated pages such as /page/2/. Always test your rules before finalizing.
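One simple way to test is to keep a short list of pages you definitely want crawled and check them against your Disallow paths before uploading. The Python sketch below does this with plain prefix matching. blocked_paths is a hypothetical helper for this article, the sample paths are made up, and the check ignores Allow exceptions and wildcards, so treat it as a first pass rather than a full validator.

```python
def blocked_paths(disallow_prefixes, important_paths):
    """Return the important paths that begin with any Disallow prefix."""
    return [
        path
        for path in important_paths
        if any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)
    ]


must_crawl = ["/page/2/", "/blog/page/2/", "/pages-we-love/"]

print(blocked_paths(["/page/"], must_crawl))
# ['/page/2/']
```

Only /page/2/ is caught here. The other two paths contain the word page but do not begin with /page/, so the rule does not touch them.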
Mistake 5: Using robots.txt to Block Indexing
Blocking a URL in robots.txt does not remove it from Google’s index. If a page is already indexed, the Disallow rule will stop Google from recrawling it, but the URL can still appear in search results (just without a snippet). To remove pages from the index, use the noindex meta tag or Google Search Console’s URL removal tool instead. Keep in mind that a noindex tag only works if the page is not blocked in robots.txt, because Google has to crawl the page to see the tag.
How to Test Your robots.txt File
After creating or editing your robots.txt file, it is important to verify that it works as expected. There are several ways to do this.
Method 1: View It in Your Browser
Simply open your browser and go to yourwebsite.com/robots.txt. If you see the content of your file, it is accessible. If you get a 404 error, the file is either in the wrong location or named incorrectly.
Method 2: Google Search Console
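Google Search Console includes a robots.txt report. Log in to your Search Console account and navigate to Settings > robots.txt. The report shows which robots.txt files Google found for your site, when each was last fetched, and any errors or warnings in the file. The older standalone robots.txt Tester has been retired, so to check whether a specific URL is blocked by your current rules, run it through the URL Inspection tool instead.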
Method 3: Online robots.txt Testers
Several free online tools allow you to test robots.txt files. Websites like Merkle’s robots.txt tester let you paste in your robots.txt content and then check specific URLs to see if they would be blocked or allowed. These tools are especially useful for checking complex rule combinations.
What to Do if You Have Conflicting robots.txt Files
A common problem occurs when both a physical robots.txt file and an SEO plugin are trying to control robots.txt behavior simultaneously. This can cause confusion about which rules are actually being applied.
The Golden Rule: One Source of Truth
You should have exactly one authoritative source for your robots.txt content. Either manage it through a physical file on the server, or manage it through an SEO plugin – not both at the same time.
How to Check Which Version Is Being Served
- Visit yourwebsite.com/robots.txt in your browser.
- Compare the output with what you have in your SEO plugin editor.
- Check your server’s root directory for a physical robots.txt file using FTP or File Manager.
- If a physical file exists and it does not match your plugin settings, delete the physical file if you want the plugin to be in charge, or update the physical file if you want direct control.
Keeping Your robots.txt File Updated
Your robots.txt file is not a set-it-and-forget-it document. As your website grows and changes, you should revisit it periodically to make sure the rules are still appropriate.
When to Update Your robots.txt
- When you install a new plugin that creates new URL structures
- When you add a new section or subdirectory to your website
- When you change your sitemap location
- When you notice crawl errors in Google Search Console
- When you migrate your website to a new domain or server
Conclusion
Manually overwriting the robots.txt file in WordPress gives you a high level of control over how search engines crawl and interact with your website. While WordPress provides a basic virtual robots.txt file by default, creating and managing a physical robots.txt file ensures your rules are always applied exactly as you intend – without depending on plugin behavior or WordPress logic.
To summarize, the process involves:
- Understanding what robots.txt does and why it matters
- Recognizing the difference between the virtual file WordPress generates and a physical file you create
- Using FTP, SFTP, or a hosting file manager to place the physical file in your root directory
- Writing well-structured rules that protect your admin area, avoid crawl waste, and guide bots to your important content
- Testing and verifying the file using your browser or Google Search Console
A well-crafted robots.txt file is a simple but powerful part of good SEO practice. It takes just a few minutes to set up correctly, but the benefits – better crawl efficiency, cleaner search engine indexing, and a tidier web presence – can last for the lifetime of your website.
Take the time to create a thoughtful robots.txt file today, and you will be setting a strong foundation for your website’s long-term SEO health.
About the Author
Jay Patel is the Founder of XSquareSEO, a full-service SEO agency with experience in on-page SEO, eCommerce SEO, link building, technical SEO, SaaS SEO, and local SEO. For more information, feel free to contact us.