Canonical URL Tag – Dealing with Duplicate Content


Whenever we do search engine optimisation work on client sites, one of the biggest issues we face is duplicate content. Many content management systems create several versions of a page with the same content, often for good reason. This can, however, cause problems with the search engines, as they are unsure which version of the page to list in their index.

The Problem

Consider for example a category page on an e-commerce site, in this case showing shoes. This might have a URL that looks something like this:

www.example.com/category/shoes

You are able to order the shoes category page by name, price or season. This, however, generates three copies of the page with the same content but slightly different URLs:

www.example.com/category/shoes?orderby=season
www.example.com/category/shoes?orderby=price
www.example.com/category/shoes?orderby=name

As discussed, this causes problems because the search engines see four pages when we only want them to list one. A secondary issue is that each of the pages will have some PageRank; ideally we want to concentrate all of this PageRank on a single page to achieve a higher ranking.
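To make the duplication concrete, here is a minimal Python sketch that groups the four URLs above by the page they actually show. It assumes that `orderby` is the only parameter that changes presentation without changing content; the helper name `page_key` is made up for illustration and is not part of any CMS.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumption: these parameters only reorder the listing, they do not change its content.
PRESENTATION_PARAMS = {"orderby"}


def page_key(url: str) -> str:
    """Return host + path + remaining query, with presentation-only parameters removed."""
    # The example URLs have no scheme, so add one to let urlsplit find the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in PRESENTATION_PARAMS
    ]
    query = urlencode(kept)
    return parts.netloc + parts.path + ("?" + query if query else "")


urls = [
    "www.example.com/category/shoes",
    "www.example.com/category/shoes?orderby=season",
    "www.example.com/category/shoes?orderby=price",
    "www.example.com/category/shoes?orderby=name",
]

# Group every URL under the page it really represents.
groups = defaultdict(list)
for url in urls:
    groups[page_key(url)].append(url)

for key, members in groups.items():
    print(f"{key}: {len(members)} URLs")
```

All four URLs fall into a single group, which is the situation a crawler has to untangle on its own if we give it no hint.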

The normal procedure with duplicate content pages is to redirect all the copies to the original page using a 301 (permanent) redirect. The problem is that in the situation above this isn’t possible, as the user needs to be able to view all versions of the page.

The Better Way – Canonical Tag

There is now (and has been since February 2009; I’m just late to the party) a tag which tells the search engines to ignore the duplicate pages and pass all ranking value to the original page.

<link rel="canonical" href="http://www.example.com/category/shoes" />

Place this tag in the <head> of each duplicate page. The URL you need to use is the URL of the original page that you’d like to appear in the search engines.
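As a rough sketch of how a page template might produce this tag, the Python below strips the sort parameter from the requested URL and builds the `<link>` element. The helper name `canonical_link_tag` and the assumption that `orderby` is the only parameter to drop are illustrative; a real site would need its own list of ignorable parameters.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only the sort order should be ignored when choosing the canonical URL.
IGNORED_PARAMS = {"orderby"}


def canonical_link_tag(requested_url: str) -> str:
    """Build a rel="canonical" tag for the requested URL (hypothetical helper)."""
    parts = urlsplit(requested_url.strip())
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    # Rebuild an absolute URL without the ignored parameters or any fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    # Escape the URL so characters such as & are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link_tag("http://www.example.com/category/shoes?orderby=price"))
```

Every sorted variant of the shoes page then emits the same tag, pointing at the unsorted URL.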

Notes

There are a few points worth noting if you intend to follow this approach:

  • It is recommended that you use absolute URLs in the canonical tag to avoid the chance of errors.
  • The canonical tag will only work with pages that are similar. Some changes are allowed, such as a different sort order for the content on the page, but with too much change the search engines will ignore the tag.
  • The canonical tag can only be used across pages in the same domain.
  • The tag is currently supported by Google, Yahoo! and Bing.
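The first and third notes can be checked mechanically. The Python sketch below parses a page's HTML with the standard library and reports a canonical href that is missing, relative, padded with whitespace, or pointing at another domain. The names `CanonicalFinder` and `check_canonical` are made up for this example, and the same-domain rule simply mirrors the note above rather than any search engine's documented behaviour.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href") is not None:
            self.hrefs.append(attr["href"])


def check_canonical(page_url, html_text):
    """Return a list of problems found with the page's canonical tag (hypothetical helper)."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    problems = []
    if len(finder.hrefs) != 1:
        problems.append(f"expected exactly one canonical tag, found {len(finder.hrefs)}")
    for href in finder.hrefs:
        if href != href.strip():
            problems.append("href has leading or trailing whitespace")
        target = urlsplit(href.strip())
        if not (target.scheme and target.netloc):
            problems.append("href is not an absolute URL")
        elif target.netloc.lower() != urlsplit(page_url).netloc.lower():
            problems.append("href points at a different domain")
    return problems


page = '<html><head><link rel="canonical" href="/category/shoes" /></head></html>'
print(check_canonical("http://www.example.com/category/shoes?orderby=price", page))
```

A check like this could run over each sorted variant of a category page before launch, so that a relative or mistyped href is caught early.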

Image by Eric Hart


6 Responses to “Canonical URL Tag – Dealing with Duplicate Content”

  1. M Smith  on July 27th, 2009

    Would it not be better simply to have a form with a hidden ‘orderby’ field and POST the data so it doesn’t appear in the URL at all? That way you’ll always have the same link regardless of the way the content is displayed.

    Or have I missed something? :\

  2. Ash  on July 27th, 2009

    The issue with the form POST approach is that it breaks the back button on a number of browsers. It also forces you to use images or JavaScript as the sort by links in order to submit the form.

  3. DG  on July 28th, 2009

    I thought search engines wouldn’t index URLs with parameters in them – unless you override this in e.g. a sitemap.xml file?

  4. Ash  on July 28th, 2009

    As far as I’m aware, search engines will index pages with parameters in their URLs, but only up to a certain point, when they realise duplicate content is being served under different URLs.

  5. Vijay simha reddy  on December 29th, 2009

    Really good stuff. I would like to know the answers to some queries:

    1. Can we use the canonical tag between two different domains?

    2. If yes, what HTML code do we need to use?

    3. Where should that HTML code be placed?

    4. What are the benefits of doing this?

  6. Mark Petherbridge  on February 2nd, 2010

    I have not delved into SEO, and I know that it’s something that I definitely should. Because of this I am probably about to make myself look stupid, as I have no idea how crawlers and engines work. But in the case of PHP, when you use this method to display something simple like the current page you are viewing, for example:

    I know there are better ways to do this, as stated just an idea.

    How does the search engine then see this as two pages? I thought that it would crawl your site and take the information about your pages and then store them or whatever. But this isn’t an actual page until it’s in action, so unless the crawler just so happens to see the page when someone uses it??

