Canonical URL Tag – Dealing with Duplicate Content

Whenever we do search engine optimisation work on client sites, one of the biggest issues we face is duplicate content. Many content management systems create several versions of a page with the same content, often for perfectly good reasons. This can cause problems with the search engines, however, because they cannot tell which version of the page to list in their index.
The Problem
Consider for example a category page on an e-commerce site, in this case showing shoes. This might have a URL that looks something like this:
www.example.com/category/shoes
You can order the shoes category page by name, price or season. This, however, generates three copies of the page with the same content but slightly different URLs:
www.example.com/category/shoes?orderby=season
www.example.com/category/shoes?orderby=price
www.example.com/category/shoes?orderby=name
As discussed, this causes problems because the search engines see four pages (the original plus the three sorted copies) when we only want them to list one. A secondary issue is that each of the pages will have some PageRank; ideally we want to concentrate all of it on a single page to achieve a higher ranking.
The normal procedure with duplicate content pages is to redirect all the copies to the original page using a 301 (permanent) redirect. In the situation above this isn’t possible, as the user needs to be able to view every version of the page.
The Better Way – Canonical Tag
There is now (and has been since February – I’m just late to the party) a tag which will tell the search engines to ignore the duplicate pages and pass all ranking value down to the original page.
<link rel="canonical" href="http://www.example.com/category/shoes" />
The URL you need to use is the URL of the original page that you’d like to appear in the search engines.
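As a rough illustration of how a site might generate this tag for every sorted version of the page, here is a minimal Python sketch. It assumes that `orderby` is the only parameter that changes presentation rather than content; the names `PRESENTATION_PARAMS`, `canonical_url` and `canonical_link_tag` are made up for this example and are not part of any CMS.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only re-order the page, they never
# change which items are shown. Adjust the set for your own site.
PRESENTATION_PARAMS = {"orderby"}


def canonical_url(url: str) -> str:
    """Return the URL with presentation-only parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    # Drop the fragment as well; it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the page at `url`."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.example.com/category/shoes?orderby=price"))
```

Every sorted version of the shoes page would then carry the same tag pointing at the unsorted URL, while any parameter that is not in the set (a page number, for instance) is left in place.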
Notes
There are a few points worth noting if you intend to follow this approach:
- It is recommended that you use absolute URLs in the canonical tag to avoid the chance of errors.
- The canonical tag will only work with pages that are similar. Some changes are allowed, such as a different sort order for the content on the page, but with too much change the search engines will ignore the tag.
- At the time of writing, the canonical tag can only be used across pages in the same domain.
- The tag is currently supported by Google, Yahoo! and Bing.
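The first and third notes above lend themselves to an automated check. The following Python sketch is one way to do it, under the assumption that "same domain" means an identical hostname (so `example.com` and `www.example.com` would count as different). The function name `check_canonical` and its messages are invented for this example.

```python
from urllib.parse import urlsplit


def check_canonical(page_url: str, canonical_href: str) -> list[str]:
    """Return a list of problems found with a page's canonical href."""
    problems = []

    # Stray whitespace inside the attribute is an easy mistake to make.
    href = canonical_href.strip()
    if href != canonical_href:
        problems.append("href has leading or trailing whitespace")

    target = urlsplit(href)
    if not target.scheme or not target.netloc:
        # Relative URLs are the error-prone case the notes warn about.
        problems.append("href is not an absolute URL")
    elif target.hostname != urlsplit(page_url).hostname:
        # Assumption: same domain is treated as same hostname.
        problems.append("href points at a different domain")

    return problems


if __name__ == "__main__":
    page = "http://www.example.com/category/shoes?orderby=price"
    print(check_canonical(page, "http://www.example.com/category/shoes"))
    print(check_canonical(page, "/category/shoes"))
```

An empty list means the tag passed these two checks. It says nothing about whether the pages are similar enough for the search engines to honour the tag.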

M Smith on July 27th, 2009
Would it not be better simply to have a form with a hidden ‘orderby’ field and POST the data so it doesn’t appear in the URL at all? That way you’ll always have the same link regardless of the way the content is displayed.
Or have I missed something? :\
Ash on July 27th, 2009
The issue with the form POST approach is that it breaks the back button on a number of browsers. It also forces you to use images or JavaScript as the sort by links in order to submit the form.
DG on July 28th, 2009
I thought search engines wouldn’t index URLs with parameters in them – unless you override this in e.g. a sitemap.xml file?
Ash on July 28th, 2009
As far as I’m aware, search engines will index pages with parameters in their URLs, but only up to the point where they realise duplicate content is being served under different URLs.
Vijay simha reddy on December 29th, 2009
Really good stuff. I would like to know the answers to some queries:
1. Can we use canonical between two different domains?
2. If yes, what HTML code do we need to use?
3. Where should that HTML code be placed?
4. What are the benefits of doing this?
Mark Petherbridge on February 2nd, 2010
I have not delved into SEO, though I know it’s something I definitely should, so I am probably about to make myself look stupid because I have no idea how crawlers and engines work. But in the case of PHP, when you use this method to display something simple like the current page you are viewing, for example:
I know there are better ways to do this, as stated just an idea.
How does the search engine then see this as two pages? I thought that it would crawl your site and take the information about your pages and then store them or whatever. But this isn’t an actual page until it’s in action, so unless the crawler just so happens to see the page when someone uses it??