Partnering to help solve duplicate content issues

One of the most common challenges search engines run into when indexing a website is identifying and consolidating duplicate pages. Duplicates can occur when any given webpage has multiple URLs that point to it. For example:

URL | Description
http://mysite.com | A webmaster may consider this their authoritative or canonical URL for their homepage.
http://www.mysite.com | However, you can add ‘www’ to most websites and still get the same home page.
http://mysite.com/default.aspx | You can also often add the specific filename of the homepage and get the same page.
http://mysite.com/default.aspx?promo=ABC | Many times websites use parameters to track things like where customers are coming from (in this case an offline promotion), or parameters that determine how the content on the page is formatted.

These four cases are just a few of the many possibilities. When you consider all the combinations of these, you could have more than 10 clone URLs for every page on your site. That means if there are 1 million pages on your site, we could possibly find 10 million or more cloned URLs pointing to them. Determining your canonical URL amongst all the duplicate clutter has been an onerous challenge for search engines as we all work to reduce cost and improve relevance.
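
To make the arithmetic concrete, here is a rough sketch that multiplies out a few variation points for a single homepage. The host and path lists mirror the table above; the second promo code ("XYZ") is invented purely for illustration.

```python
from itertools import product

# Hypothetical variation points for one homepage.
HOSTS = ["mysite.com", "www.mysite.com"]
PATHS = ["/", "/default.aspx"]
QUERIES = ["", "?promo=ABC", "?promo=XYZ"]  # "XYZ" is an invented promo code


def url_variants(hosts, paths, queries):
    """Return every URL formed from one host, one path, and one query string."""
    return [f"http://{h}{p}{q}" for h, p, q in product(hosts, paths, queries)]


variants = url_variants(HOSTS, PATHS, QUERIES)
print(len(variants))  # 2 hosts * 2 paths * 3 queries = 12
```

Each additional tracking parameter or formatting option multiplies the count again, which is how a single page can end up with more than 10 URLs.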

To help solve this issue, Live Search has partnered with Google and Yahoo to support a new tag attribute that will help webmasters identify the single authoritative (or canonical) URL for a given page. The link tag defines a relationship between a document and an external resource. In this case, that resource is the canonical URL. The following is an example of the new link tag attribute for canonicalization:

<link rel="canonical" href="http://mysite.com"/>
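
For readers who want to check their own pages, the following is a minimal sketch of how a script might read this tag out of a page using Python's standard `html.parser` module. The class and function names are hypothetical, and this is not a description of how any search engine's crawler works.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here by default.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical href declared in the HTML, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


page = '<html><head><link rel="canonical" href="http://mysite.com"/></head></html>'
print(find_canonical(page))
```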

A few notes about the implementation of the new attribute:

  • This tag will be interpreted as a hint by Live Search, not as a command. We’ll evaluate it in the context of all the other information we know about the website and try to make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
  • You can use relative or absolute URLs in the “href” attribute of the link tag.
  • The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found on “http://mysite.com/default.aspx”, and the “href” attribute in the link tag points to “http://mysite2.com”, the tag will be invalid and ignored.
    • However, the “href” attribute can point to a different subdomain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://www.mysite.com”, the tag will be considered valid.
  • Live Search expects to implement support for this feature sometime in the near future.
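
The relative-URL and same-domain notes above can be sketched as a small check. This is only an illustration under stated assumptions: the function names are hypothetical, and the "last two labels" rule for finding the registered domain is a simplification that fails for suffixes such as "co.uk" (a real check would consult the Public Suffix List).

```python
from urllib.parse import urljoin, urlsplit


def registered_domain(host):
    """Naive guess at the registered domain: the last two labels of the host.

    Assumption: good enough for hosts like "www.mysite.com", wrong for
    multi-part suffixes such as "co.uk".
    """
    return ".".join(host.lower().split(".")[-2:])


def resolve_canonical(page_url, href):
    """Return the absolute canonical URL, or None if it is on another domain."""
    target = urljoin(page_url, href)  # resolves relative href values
    page_host = urlsplit(page_url).hostname or ""
    target_host = urlsplit(target).hostname or ""
    if registered_domain(page_host) != registered_domain(target_host):
        return None  # different domain: the tag would be ignored
    return target


page = "http://mysite.com/default.aspx"
print(resolve_canonical(page, "http://mysite2.com"))     # other domain
print(resolve_canonical(page, "http://www.mysite.com"))  # subdomain of same domain
print(resolve_canonical(page, "/"))                      # relative href
```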

While we expect this tag will help us solve many of the more complex duplicate content issues, we still highly recommend that webmasters follow the existing best practices for normalizing their URLs through domain canonicalization and normalization of URL parameters. We’ll provide more details on the link tag after we’ve implemented full support in one of our upcoming releases. In the meantime, we look forward to hearing your feedback on the new tag.
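
As a rough illustration of what such normalization can look like on the site's side, the sketch below maps the four example URLs from the table to a single form. The preferred host, the list of default documents, and the set of tracking-only parameters are all assumptions chosen for this example; each site would need its own values, and ports and credentials are ignored here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only.
PREFERRED_HOST = "mysite.com"
DEFAULT_DOCUMENTS = {"default.aspx", "index.html"}
TRACKING_PARAMS = {"promo"}


def normalize_url(url):
    """Return one normalized form of the URL (ports and credentials dropped)."""
    parts = urlsplit(url)

    # Domain canonicalization: fold the www host into the preferred host.
    host = (parts.hostname or "").lower()
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST

    # Drop a trailing default document such as /default.aspx.
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"
    else:
        path = parts.path
    path = path or "/"

    # Parameter normalization: remove parameters that only track visitors.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


examples = [
    "http://mysite.com",
    "http://www.mysite.com",
    "http://mysite.com/default.aspx",
    "http://mysite.com/default.aspx?promo=ABC",
]
normalized = {normalize_url(u) for u in examples}
print(normalized)
```

A function like this could drive 301 redirects or the generation of internal links, so that only one form of each URL is ever exposed.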

– Nathan Buggia, Live Search Webmaster Team

Join the conversation

46 comments

  1. Anonymous

    I hope you’ll come out with guidelines for what’s right and wrong.  For instance, if I create a page that responds to URL parameters like this:

    /page.aspx?category=3

    but I normally rewrite it using an ISAPI filter or some similar technology to be referenced as:

    /page-category-3

    I should be able to put in a rel canonical that rewrites the dynamic URL to the static URL (but obviously changing based on the category parameter).  That’s clearly within the spirit of why you’d implement this, but I wouldn’t want a quality engineer somewhere to accuse people of spamming using rel canonical.

  2. Anonymous

    This can be a big boon to the SEO industry, where canonicalization has always been a hot topic of discussion.

  3. Anonymous

    Do you support HTTP Content-Location header as well?

    It’s 10 years old and has been designed for (almost) the same purpose…

  4. Anonymous

    I have one absolutely burning question about this tag:

    If I include it on a page which has a meta robots tag of "noindex", and point it to a canonical variant of this page (which can be indexed), does this cause any problems?

    Essentially, we use meta robots "noindex, follow" for things like pagination, different sorting order of products, etc etc – this handles the duplicate content issue (and much better than robots.txt, from a site-owner’s perspective).

    What I want to make sure is that, if I include this new rel=canonical tag, that search engines that don’t handle this new tag can handle the "noindex" tag to eliminate duplicate content that way and search engines which do use the canonical tag are correctly supported.

    This is the single most important thing I need to know about this new tag. Please could you include this in your webmaster guidelines or a follow up blog post?

    The second most important thing is – is the behaviour of the above standardised with the other search engines which are using it too?

  5. Anonymous

    This blog has not been useful to me. I understand that we add something like the canonical thing, but how do I add it? My goal is for my website to be searchable by Live Search, and I think this blog hasn’t given me an answer or a way that can lead me to an answer. Please help.

  6. Anonymous

    Is it possible to move an entire site from php to asp using this link tag?

  7. Anonymous

    As simple as a Columbus egg but will it work as promised? Or will it be just another tag to stuff my <head>?

  8. Anonymous

    Guys, please help me to promote canonical tag for IPB forum. I want to motivate ipb creators for implementing this simple tag in their CMS, but at the moment they all show me resistance :(

    Your help is needed in replying to this thread:

    http://forums.invisionpower.com/index.php?showtopic=281532

    Thanks!

  9. Anonymous

    Live Search Webmaster Center Blog

    Official blog of the Live Search Webmaster Center Team.

    Nathan

    Are you still monitoring/responding to the above blog?

    If not, who at MS is?

    Jim

  10. Anonymous

    Nathan

    Why do the MSN bots ignore ‘crawl delay’ and ‘disallow’ directives?

    Many, many related (unanswered) questions on this blog.

    Comments please?  Thanks.

  11. Anonymous

    This sounds great, but has anyone done any testing yet? I mean, are dup URLs actually being *removed* from indexes by this? How is this going to affect manipulative duplicate content algos?

    Canonicalization issues should be addressed in planning and development and are very easily avoided when you’ve structured your website appropriately. Keyword research, develop, deploy. I just don’t trust this *one* tag (anyone remember metas?) to resolve the issues, entirely; it’s up to programmers to program accordingly. Dynamic 404s and strict URL structuring is an extremely effective, preemptive technique that people aren’t using as it is. What happens when this tag gets abused or deployed incorrectly?

    Will this tag actually have any effect on ‘big’ sites that *don’t* implement this technique?

    I need to understand the reward and penalty structure of this tag, in direct reference to white hat, and black hat, policies; and what Search Engines have in mind for this consideration.

    This will be interesting to watch unfold over the next several months…

    Arow

  12. Anonymous

    Nathan,

    I am glad to see Microsoft and others working with Google on the canonical link element. Matt Cutts’s blog has an excellent 20 min. video explaining the new element in detail. By the way, great post.

  13. Anonymous

    Glad to see this new canonical feature as it will benefit both webmaster and search engine.

  14. Quality Directory

    Up till a couple of months back, I didn't know the issue of canonicalization meant duplicate content to search engines. But after reading this particular article at MSDN Webmaster Blog, I went to work and cleaned up my internal link structure.

  15. Anonymous

    good article and keep posting thank you

  16. Anonymous

    Thanks.. that's awesome..

  17. Anonymous

    Great post.. Awesome anyway..

  18. get backlinks

    Is there any tool to increase or decrease the crawl rate of Bing?

  19. Anonymous

    nice blog, thanks for sharing, hope bing can be the best search engine

  20. Anonymous

    [Embedded YouTube video: http://www.youtube.com/…/ToqekMYYPO4]

  21. Anonymous

    This can be a big boon to the SEO industry, where canonicalization has always been a hot topic of discussion.

  22. Anonymous

    great post bro.. Thanks for share

  23. muffieshannen

    In line with above article, I advise to please make the article neophyte-blogger-friendly and how-to-implement the said tips. I'm sure lots of starting bloggers are willing to implement the canonicalization just give us simple instruction how to implement it.

    Thanks..

  24. xpo_greenfintech

    Well, it’s quite healthy that you guys are working with Google and Yahoo and that all of you are implementing the same thing. Best of luck. Can we start implementing the link tag?

  25. Anonymous

    Live Search partners with Yahoo and Google to solve the duplicate content problem; that really helps. Good.

  26. Blackpool UK

    i need to move my site from php to aspx and wondered if doing the method mentioned would work?

  27. miles2go

    It’s useful for the hardcore SEOs. But bloggers who use a third-party platform like WordPress or Joomla don’t face this. I use WordPress and it manages this very well.

  28. Anonymous

    Hi Nathan. You mention "in the near future". Can you please confirm whether or not it has been implemented yet in Bing. Best regards.

    Rich

  29. Anonymous

    We now offer 11 new weather scripts for webmasters to use on their site and get repeat visitors.

  30. Anonymous

    Thank you

  31. Anonymous

    Great article.

  32. Anonymous

    canonicalization issues should be addressed in planning and development… this will be interesting to watch unfold over the next several months!

  33. mountainh2o

    This remedy has some benefits, but still does not change the mysite.com/index.html to mysite.com.  With php, I suppose you can get it several ways.  With PHP, I can leave off any part of an address and I end up somewhere.  This will not change any of that.  

    It might, however, change the non-www to www. In our practice, we use a 301 redirect in the .htaccess file. This has to be set up by the server administrator, but it is worth the effort. The proper 301 redirect sends all extra domain names, as well as the non-www host, to our chosen www host, and it has prevented almost all duplicate content.

    We did install the canonical tag mentioned above by Bing on our home page, so if you would like to see how it is applied, look at the page source (visible in any browser from the View menu) for the website "SEO Domain Names dot com". We put links of this type in the meta area, after the meta keywords and before the closing </head> tag, the same place your CSS ("cascading style sheets") link goes, since that is also a link element.

    Hope this helps.  Definitely helps to have an HTML editor to edit this unseen (head) part of your web page such as Adobe Dreamweaver, MS FrontPage or newer, or the like…  Good luck to you!

  34. Anonymous

    This blog has not been useful to me. I understand that we add something like the canonical thing, but how do I add it?

  35. Anonymous

    This classic ASP script will allow you to monitor your current local weather via a Weather.com XML feed.

    It parses the XML data and then outputs formatted HTML.

  36. Anonymous

    please do let me know whether we can get this sorted out as this tag will do some good to the canonical issues.

    Really if 3 giants are thinking on this then we can use them and can see the results.

  37. Anonymous

    Thanks, I have been looking for this post for a long time.

    What about the domains? Is there any effect on the ranking of the site or not?

  38. truevisiontech01

    I think Bing doesn't have effective technology for resolving duplicate content. They need to register redirects faster and also need to give more weight to canonical tags.

  39. m0rad

    Great article.

  40. novintabligh

    I think redirecting is better than using "canonical".

  41. bonnierseo

    Does Bing support cross-domain canonical tags?