SEOs and business owners have always had to be wary of duplicate content when trying to rank well in search results, but since the latest Panda update was released a couple of months ago, many of us are focussing on it more than ever. In this article, we will look at what Google considers to be duplicate content and what impact it might have on your website and your rankings.
First, let’s look at how Google defines duplicate content. Google’s guidelines describe it as “substantive blocks of content within or across domains that either completely match other content or are appreciably similar”. In other words, content can be considered duplicate if it is wholly or partly the same across multiple pages, whether those pages are on your own website or on another website.
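Google does not publish how it measures whether two pages are “appreciably similar”. To make the idea concrete, here is a minimal sketch of one common, general-purpose technique (not Google’s method): split each text into overlapping three-word sequences (“shingles”) and compute the Jaccard similarity, the share of shingles the two texts have in common. The function names and the choice of three-word shingles are illustrative assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles: 1.0 = identical, 0.0 = nothing shared."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Two sentences that differ by one word share 2 of their 4 distinct shingles.
print(jaccard_similarity("the quick brown fox jumps", "the quick brown fox sleeps"))  # 0.5
```

A score near 1.0 means the texts are complete or near-complete matches, while a partial overlap produces an intermediate score. Where to draw the line for “appreciably similar” is a judgement call; any threshold you pick is your own assumption, not a figure Google has published.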
If a Competitor Takes Your Content
The simplest solution would be to ensure that the same information is never used twice, but this isn’t always within your control. In some instances, competitors or scraper websites may take your content and use it elsewhere. Unfortunately, there isn’t much you can do to stop this. The exception is when someone has taken your work and claimed it as their own: that is plagiarism, and you can take legal action.
The good news is that Google is unlikely to punish you in this scenario. Google can often tell which version is the original by comparing when each copy was first seen, and it will generally assume that the oldest version is the original. That being said, if a page on your website isn’t ranking and it has been duplicated elsewhere, it is a good idea to change it or to report the copy to Google. But if your rankings aren’t being affected, there’s no need to panic.
Duplicate Content in Boilerplate Text
Internal Duplicate Content
So far we have focused exclusively on cross-domain duplicate content, but internal duplicate content can also be a major issue, and it is easy to miss because most of its causes are technical. A few common causes are listed below:
- Your website can sometimes allow the same article to be displayed through multiple URLs, e.g. /keyword-x/ and /article-category/keyword-x/. This is easy to miss because you may only have one instance of the article in the database, but a search engine crawling the site will see two different pages.
- Your website contains printer-friendly pages. Search engines will often see the printer-friendly version of a page and the normal version as two separate pieces of content.
- Both www and non-www versions of your website are accessible.
- An e-commerce website lists multiple versions of the same product with only a minor change, e.g. the same top in different colours.
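To see why a search engine counts these as separate pages, it helps to remember that it indexes URLs, not database records. The sketch below is a simplified illustration, loosely in the spirit of a site-crawling duplicate checker, that groups URLs whose text is identical once case and whitespace are normalised. The example.com URLs and page text are hypothetical, and it assumes you have already extracted each page’s main text (a real printer-friendly page would differ in its markup).

```python
import hashlib
from collections import defaultdict


def content_fingerprint(body: str) -> str:
    """Hash the text after collapsing case and whitespace, so trivial formatting differences don't hide duplicates."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicate_urls(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs that serve the same normalised text."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, body in pages.items():
        groups[content_fingerprint(body)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl: one article reachable through four different URLs.
pages = {
    "https://www.example.com/keyword-x/": "Article about keyword X.",
    "https://example.com/keyword-x/": "Article about keyword X.",  # non-www version
    "https://www.example.com/article-category/keyword-x/": "Article  about keyword X.",  # category path
    "https://www.example.com/print/keyword-x/": "ARTICLE about keyword X.",  # printer-friendly copy
    "https://www.example.com/other-page/": "Something else entirely.",
}

for group in find_duplicate_urls(pages):
    print(group)
```

Exact hashing only catches pages whose text is identical after normalisation. Near-duplicates such as product pages that differ only by colour would slip through and need a similarity measure instead.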
These are just some of the most common ways you can end up with internal duplicate content on your website. In almost every case of internal duplicate content, Google is clever enough to recognise that it isn’t being done maliciously and is therefore unlikely to penalise you for it. More often, one of the pages ranks and the others don’t. In a recent Webmaster Hangout, Google’s John Mueller said that if you are selling the same product in a variety of sizes and colours, Google understands that you are going to have a lot of mentions of that product.
Does Duplicate Content Matter?
So, does duplicate content matter? Well, we have seen that in most cases Google understands the difference between malicious duplicate content and duplication that comes about unintentionally. It is also fairly good at determining which version of duplicated content is the original. However, it is still best practice to minimise duplicate content, because Google isn’t perfect and you don’t want to rely on it to get things right.
If you have a website and aren’t sure whether you have duplicate content, either internally or across different websites, there are a couple of tools that can help. Siteliner is a free tool that will crawl your website and highlight pages with duplicate content issues, as well as providing useful information on other aspects of your website. Copyscape is useful for identifying duplicate content across external sources. The free version only lets you check one URL at a time, but if you have a larger website and want to complete the task quickly, you can sign up for an account and buy credits.