How to stop Google from indexing unnecessary WordPress URLs
Sometimes, you don’t want Google to advertise one of your pages for the world to see. Learn how to stop Google’s crawler from indexing your WordPress site’s pages using Yoast SEO.
When Google indexes unnecessary URLs, it can hurt your site’s overall SEO. It’s also bad for business, since thin or unfinished pages showing up in search results can look unprofessional. As a result, you need a way to stop Google from indexing unnecessary WordPress URLs.
Search engines build their results by “indexing” pages. This is good in most cases because it gives your WordPress site exposure. In other cases, though, they may index URLs you’d rather keep out of search results. One of the most common causes is an out-of-the-box WordPress installation that hasn’t been customized for SEO.
Default WordPress installations include a significant number of empty templates. Even so, Google can crawl and index them by default. Having a ton of empty indexed pages is not great for business or your overall SEO.
In this article, we’ll explore what Google can crawl and index behind the scenes and learn how to prevent Google from indexing unwanted pages using Yoast SEO. This guide is valid for out-of-the-box installations and already-established websites alike.
How Google crawls and indexes your WordPress pages
“Crawling” and “indexing” are often used interchangeably. In reality, they are different but related terms.
Crawling is following the links within a page, and then following the links in those linked pages until there are no more links to follow. A “spider” is a software program designed to do this. Google’s spider is called Googlebot.
Indexing, on the other hand, is storing and organizing the information found on pages, whether they’ve been crawled or not. The indexed pages appear on Google’s search result pages.
You can prevent a page from being crawled by various means, but that alone doesn’t stop Googlebot from capturing the link and adding it to search results.
How to know what Google is indexing from your WordPress site
It’s not always immediately clear which specific pages from your WordPress site Googlebot is indexing. You may need to check which pages are being indexed manually.
One of the simplest ways to check all the URLs Googlebot is indexing from your site is to go to Google.com and type this in the search bar:
site:your-domain
E.g.: site:wcanvas.com
The site: keyword restricts search results to pages from that domain. It’s important not to leave any blank spaces in the text between the : and the .com.
If you run this test and the results contain links that you don’t want Googlebot to index, you need to block Googlebot from indexing those pages to improve SEO for your entire site.
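If you check several domains regularly, you can generate the search URL instead of typing the query each time. Here’s a minimal Python sketch. The function name site_query_url is our own, and the google.com/search?q= URL pattern is an assumption about how the search page accepts queries, not a documented API.

```python
from urllib.parse import urlencode


def site_query_url(domain):
    """Build a Google search URL that uses the site: operator for one domain."""
    domain = domain.strip()
    # The operator only works when there is no blank space after the colon,
    # so reject empty input and anything containing whitespace.
    if not domain or any(ch.isspace() for ch in domain):
        raise ValueError("domain must be non-empty and contain no spaces")
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


print(site_query_url("wcanvas.com"))
# https://www.google.com/search?q=site%3Awcanvas.com
```

Open the printed URL in your browser to see the results for that domain.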
Another option is to visit your site’s sitemap by going to your-domain.com/sitemap.xml and start manually following links.
Sometimes you’ll find pages for authors, tags, and other content lacking proper web design. You don’t want these pages appearing on search results.
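If your sitemap is long, following links by hand gets tedious. The sketch below shows one way to list the URLs in a sitemap and flag author and tag archives using only Python’s standard library. The sample XML, the example.com URLs, and both function names are hypothetical, and the /author/ and /tag/ markers assume WordPress’s default permalink bases, so adjust them if your site uses different ones.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, used here instead of downloading a live file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/author/admin/</loc></url>
  <url><loc>https://example.com/tag/news/</loc></url>
  <url><loc>https://example.com/blog/hello-world/</loc></url>
</urlset>"""


def sitemap_locations(xml_text):
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]


def flag_archive_urls(urls, markers=("/author/", "/tag/")):
    """Keep only the URLs that contain one of the given path markers."""
    return [url for url in urls if any(marker in url for marker in markers)]


urls = sitemap_locations(SAMPLE_SITEMAP)
for url in flag_archive_urls(urls):
    print(url)
```

To run it against your own site, save your sitemap to a file and pass its contents to sitemap_locations in place of the sample string.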
Now let’s dive into how to prevent them from appearing.
How to stop Google from indexing your WordPress pages using Yoast SEO
Now that you’ve identified the pages you don’t want Googlebot to index, let’s use the Yoast SEO plugin to hide these pages from search engines. Follow these steps:
Log in to your WordPress website to access your dashboard.
Access the sidebar and navigate to the post or page you want to exclude from Google results.
Once on the post or page, expand the “Advanced” section of the Yoast SEO settings. Look for the “Allow search engines to show this Post in search results?” option and change it to “No.” This adds a noindex directive to the page, which tells Googlebot and other web crawlers not to index it.
Publish or update the post to confirm the change.
To verify these changes are in effect, all you need to do is check your sitemap. The page shouldn’t be there anymore.
Be aware that Google may take time to drop a page from its index, so if you still see it in search results, it doesn’t necessarily mean the changes didn’t work.
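Another way to confirm the setting is to look at the page’s HTML source for a robots meta tag that contains noindex. The sketch below does that check with Python’s built-in HTML parser. The class and function names are our own, and the sample markup is a hypothetical illustration of such a tag, not output copied from Yoast SEO.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for part in attributes.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.add(part)


def has_noindex(html):
    """Return True when the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical page sources for illustration.
hidden_page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
public_page = '<html><head><meta name="robots" content="index, follow"></head></html>'

print(has_noindex(hidden_page))  # True
print(has_noindex(public_page))  # False
```

You can paste a page’s source (View Page Source in your browser) into has_noindex to run the same check on a real page.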
Alternatively, if you want to exclude multiple pages in bulk, you could consider using a filter. Explore this Yoast SEO documentation to learn more about bulk filtering.
How to Stop Google From Indexing Your Entire Site
There are two main methods for keeping Google away from your entire WordPress site: using the built-in feature in Settings > Reading and editing the robots.txt file manually. Let’s explore both.
Method #1: Configure the Reading Settings
The most straightforward way to prevent Google from indexing your site is to go to Settings > Reading. Once there, check the box that reads Discourage search engines from indexing this site.
After checking the box, remember to click the Save Changes button.
Method #2: Edit the robots.txt File
robots.txt is a text file stored in your website’s root folder. You use it to give search engine crawlers (such as Google’s Googlebot) rules about which of your site’s resources they can access.
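The robots.txt file allows you to specify which directories, subdirectories, URLs, or files you don’t want the search engines to crawl. You can also use it to block crawlers from your entire site. Keep in mind that this blocks crawling rather than indexing: as mentioned earlier, Google can still list a blocked URL if it finds a link to it elsewhere.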
There are multiple ways to edit robots.txt, but we believe the simplest one is using Yoast SEO. Go to Yoast SEO > Tools and click on File Editor.
Once in the File Editor feature, you should see a textbox with the contents of the robots.txt file.
If you want all search engines to stay away from your site, you should include the following commands:
User-agent: *
Disallow: /
This code tells all search engine crawlers (including Google’s) not to crawl any part of your site.
What if You Don’t Have Yoast SEO?
If you don’t have Yoast SEO, the alternative is to use cPanel or FTP.
You can connect to your web server either with your FTP credentials (your hosting account should provide them) or via cPanel. To log into cPanel, use your hosting account’s dashboard or go to your-domain-name.com/cpanel.
Regardless of the tool you use, navigate to your server’s public_html folder (sometimes named simply public).
Once in the public_html folder, look for the robots.txt file. Right-click on it and select View/Edit to edit the file.
You need to add the following commands to the robots.txt file to block search engine crawlers from your WordPress site.
User-agent: *
Disallow: /
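Before uploading the file, you may want to double-check how a crawler would read your rules. The sketch below uses Python’s built-in urllib.robotparser module to test a URL against robots.txt content. The can_crawl helper, the example.com URLs, and the /wp-admin/ rule are our own illustrations. Note that it only tells you whether crawling is allowed, not whether Google has indexed the URL.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt content lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The site-wide block from this guide.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A hypothetical rule that blocks a single directory instead.
BLOCK_ADMIN = "User-agent: *\nDisallow: /wp-admin/\n"

print(can_crawl(BLOCK_ALL, "Googlebot", "https://example.com/any-page/"))            # False
print(can_crawl(BLOCK_ADMIN, "Googlebot", "https://example.com/wp-admin/options.php"))  # False
print(can_crawl(BLOCK_ADMIN, "Googlebot", "https://example.com/blog/"))              # True
```

Blocking a single directory, as in the second rule set, is usually the safer choice when you only want to hide part of a site.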
Why would you want to stop Google from indexing some of your WordPress pages?
As we’ve explored in this post, you may want to unindex some of your WordPress site’s pages from Google, even if temporarily. There are many reasons for this. These are some of the most common:
Your website is unfinished
When you’re still testing the website, you don’t want anyone but your team to have access to it. Using WordPress staging environments will keep the progress private.
It’s a restricted page
Restricted pages like invite-only or gated download pages for ebooks aimed at specific audiences should not appear on search results.
Duplicate test sites
Duplicate sites for trials and testing for the production site should stay off search results.
Content duplicates
If you offer the same content to visitors in different forms, make sure not all of them are indexed. Duplicate content can hurt your overall SEO, since the copies compete with each other and Google has to pick which version to show.
Content you’ll update later
If one of your posts is outdated but you plan to update it in the future, it may be better to unindex it until you get around to it.
Final thoughts
If you have a WordPress site and want to improve its SEO, you should take a few hours to dive deep into the pages Google is crawling and indexing to filter out the ones that hurt your overall SEO.
This action should be part of an overarching strategy to boost your WordPress site’s SEO, not the only one.
We know it’s not great for business to have broken, gated, or empty pages advertised for the world to see, and you should take the time to unindex them. But on its own, it won’t make you a superstar in Google’s eyes unless it’s accompanied by other strategies.
Take this as one of the first steps of a long-term plan to power up your WordPress site’s SEO.
If you found this post useful, read our blog and resources for more insights and guides!