Duplicate Content and the Canonicalisation Tag
16:25 on Mon, 11 Jan 2010 | SEO
Duplicate content is a problem that can be easily avoided, yet it still occurs on many sites on the web. Among the most commonly affected are database-driven e-commerce sites with many products in the store. The problem occurs when a product is placed into more than one category, creating different webpages with duplicate content.
Google doesn’t like webpages with very similar content. When the search engine spider crawls two webpages with very similar content, the search engine will display only the one webpage it deems the most relevant.
Google: Not happy at the duplicates
Since creating content takes time and a little bit of imagination, some unscrupulous webmasters will try to manipulate their search engine rankings by deliberately duplicating content across multiple webpages or domains. Sites that host 100% duplicate content like this usually get penalised in some way, with a hit taken to the site’s ranking in the SERPs. It is important to note, however, that sites with bits of duplicate content are not put under any kind of penalty; the duplicate pages are simply filtered out of search results.
Google seeks to deliver the most relevant and useful information for users, and there is little point in showing five pages in the top 10 that contain the same information. If the first page didn't have the information you're looking for, why burn more SERP space?
Another possible problem is that if people start deep-linking to your webpages, you run the risk of splitting incoming links between the duplicate pages, so that no single page receives the full benefit of those links.
What you can do
There are a few things you can do to combat this:
The best way is simply to change the site structure so that you remove the multiple URLs and place each product under only one category. For example, suppose you had two webpages selling red felt tip pens:
In this example, the red felt tip pen page is categorised under “pens” and also under “products>colours>red”. You could remove the categories in the second URL; however, this isn’t always feasible, and doing so may impact usability.
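As a rough illustration of how you might spot this kind of duplication in a list of your own URLs, here is a minimal Python sketch. It assumes that duplicate product pages share the same final filename and differ only in their category path. The function name `find_duplicate_product_urls` and the second and third URLs are hypothetical, made up for this example based on the categories described above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find_duplicate_product_urls(urls):
    """Group URLs by their final path segment; keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        # The last path segment is assumed to identify the product page.
        slug = path.rstrip("/").rsplit("/", 1)[-1]
        groups[slug].append(url)
    return {slug: found for slug, found in groups.items() if len(found) > 1}


# Hypothetical URLs for illustration only.
urls = [
    "http://www.example.com/pens/red-felt-tip-pen.html",
    "http://www.example.com/products/colours/red/red-felt-tip-pen.html",
    "http://www.example.com/pens/blue-biro.html",
]

duplicates = find_duplicate_product_urls(urls)
for slug, found in duplicates.items():
    print(slug, "appears at", len(found), "URLs")
```

A real site would probably need a stricter check, such as comparing product IDs or page content, since two different products could share a filename.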
Another way is to choose the most relevant page and then 301 redirect all the other duplicate pages to that page. This way, users who visit the other pages will be presented with the 301 destination page. Also, if there are links coming into the other pages, then some link equity will be passed over. If there are a lot of links, though, you are still better off getting the site owners to link to the new URL.
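To make the redirect idea a bit more concrete, here is a minimal sketch of a 301 redirect written as a small Python WSGI application. It is only an illustration under stated assumptions: the `REDIRECTS` table, the paths in it, and the `app` function are hypothetical, and on most sites you would set up redirects in the web server or shopping cart software instead.

```python
# Hypothetical mapping of duplicate paths to the one page we want to keep.
REDIRECTS = {
    "/products/colours/red/red-felt-tip-pen.html": "/pens/red-felt-tip-pen.html",
}


def app(environ, start_response):
    """Send a 301 for known duplicate paths; otherwise serve a placeholder page."""
    path = environ.get("PATH_INFO", "")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Product page\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Local test server only; the port number is arbitrary.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Keeping the mapping in one table makes it easy to see which duplicate URLs point to which destination page.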
Last year, Google introduced support for a new link element, commonly called the canonicalisation tag (strictly speaking it is a <link> element rather than a meta tag). So, coming back to the example of a site selling red felt tip pens where the same page has multiple URLs:
In this example, the shortest URL, and also the easiest to remember and type in, is the first one. If we want to tell Google that this page is the one we want displayed in the SERPs, we would write the canonicalisation tag as:
<link rel="canonical" href="http://www.example.com/pens/red-felt-tip-pen.html">
This little snippet of code goes in the <head> section of each webpage with the duplicate content and tells the search engine which page you would prefer to be shown in the search results. Search engines treat it as a strong hint rather than a command, but it greatly reduces the risk of being penalised or of having multiple search results showing what is essentially the same page under different URLs.
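If your pages are generated from a database, the tag can be produced by the page template rather than typed by hand. The following Python sketch shows one way this might look. The helper names `canonical_link_tag` and `render_head` and the page title are assumptions made for illustration, not part of any particular shopping cart package.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the canonical <link> element, escaping the URL for use in an attribute."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def render_head(title, canonical_url):
    """Return a minimal <head> section containing the canonical tag."""
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f"  {canonical_link_tag(canonical_url)}\n"
        "</head>"
    )


# Every duplicate URL for the product would render the same canonical URL.
CANONICAL = "http://www.example.com/pens/red-felt-tip-pen.html"
print(render_head("Red Felt Tip Pen", CANONICAL))
```

The important point is that every version of the page, whichever category URL it was reached through, outputs the same canonical address.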
These are the main things you can do to get around issues with duplicate content. The Google Webmaster Tools help section covers them in more detail.