WordPress: Tell Search Engines Not to Index Archives

Google Search Including Archives

If you run a WordPress site [1] and search Google’s index using the site: operator (namely, site:%yourdomain%), you’ll probably find that Google has also indexed all of your site’s archives, categories, and tags. For many, the first impulse is to delete these from Google somehow, or to use robots.txt or some other method. But I’ve found an easier solution, and I’ll show you how to do it without a plugin.
As always, there are many ways to do this. You’ll see many people suggest using a plugin (such as the popular WordPress SEO by Yoast). But my approach is that everything you do in WordPress should be done in code if you can. The key is to use plugins only for very specific purposes, or as a stopgap until you can figure out how to do it otherwise. Some also suggest the aforementioned robots.txt, but I consider that messy.

The mass of tagged and archived posts in Google’s search results doesn’t necessarily seem like a huge problem, until you realize that many of these links point to duplicate content (which may count against your site). I’m not sure how much weight Google ascribes to these duplicates, but they probably don’t help. And, let’s face it, it’s an aesthetic issue too.

What I’ve done is add a few lines to a file in my theme that tell search engines not to index specific pages. But before I describe this, I want to warn you that changes to low-level files can render your site inoperable. Please use caution with this code. If you are using a theme you didn’t design from scratch, use a child theme to make these changes.

Copy the following lines of code and place them in your theme’s header.php:

<?php if(is_archive()){ ?>
<!--Tell Search Engines not to index archives -->
<meta name="robots" content="noindex, follow">
<?php } ?>

The header.php file is standard in your theme (or child theme) directory under /wp-content/themes/%themename%. You’ll need to place the code above the closing </head> tag.
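
If you would rather not edit header.php directly, the same check can probably be attached to the wp_head hook from your child theme’s functions.php instead. The following is only a sketch under a couple of assumptions: the function name mytheme_noindex_archives is made up for this example, and your theme’s header.php must call wp_head() (most themes do). The rest of this post continues with the header.php version.

```php
<?php
// Sketch for a child theme's functions.php.
// Assumption: mytheme_noindex_archives is a hypothetical name; pick your own prefix.
function mytheme_noindex_archives() {
    // Only emit the tag when WordPress is rendering an archive page.
    if ( is_archive() ) {
        echo '<meta name="robots" content="noindex, follow">' . "\n";
    }
}

// wp_head runs inside <head>, provided the theme's header.php calls wp_head().
add_action( 'wp_head', 'mytheme_noindex_archives' );
```

The output on archive pages should be the same meta tag, just without the HTML comment line.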

The is_archive() function asks WordPress whether the page it’s about to render is an archive page, which includes category, tag, date, and author archives. If so, the two lines inside the if block are placed in your page’s HTML. If not, nothing is added. The following is the output of the header.php snippet when the check is true:

<!--Tell Search Engines not to index archives -->
<meta name="robots" content="noindex, follow">
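
If is_archive() turns out to be broader than you want, WordPress also has narrower conditional tags such as is_category(), is_tag(), is_date(), and is_author(). As a rough sketch, and assuming you want category pages to stay indexed while tag, date, and author archives are excluded, the header.php condition could be swapped like this:

```php
<?php // Assumption for this example: category archives should remain indexable. ?>
<?php if ( is_tag() || is_date() || is_author() ) { ?>
<!-- Tell search engines not to index tag, date, and author archives -->
<meta name="robots" content="noindex, follow">
<?php } ?>
```

Which archive types to exclude is a judgment call for your own site; the combination above is just one example.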

Hopefully, this works for you and gets your site on track without the need for an unnecessary plugin. As with most things, this can probably be improved, or there may be a better way to do it. If you have a suggestion, let me know in the comments and I’ll update my post with new information.

1. This happens on Blogger also, though that’s another rabbit hole.