Duplicate content has been discussed and fought over on almost all panels I’ve attended.
SEOs seem to like the subject very much; I have no clue why.
We’ve discussed duplicate content issues here before.
Today Google, Yahoo and MSN have come up with a new “idea” to help webmasters fight duplicate content.
First off: how does duplicate content occur on a website?
- When more than one page has the same content.
- When several pages are similar in content.
- When a page is repeated on the website due to technical glitches.
- When dynamically generated pages repeat the same content across different requests.
In such cases, the search engines, on seeing the same content on different pages, “suspend” the value of those pages and take their time before showing either one of them on the results page for a live search.
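To spot the first case in the list above (identical content under several URLs) on your own site, one rough approach is to hash the text of each page and group the URLs that share a hash. This is only a sketch: the function name and the sample pages are made up for illustration, and it says nothing about how a search engine actually detects duplicates.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(pages):
    """Group URLs whose text is identical after whitespace/case cleanup.

    `pages` is a dict mapping URL -> page text (hypothetical input format).
    Returns a list of URL groups, each containing two or more URLs.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        # Collapse runs of whitespace and ignore case before hashing.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Illustrative data only.
sample_pages = {
    "http://yoursite.com/yourpage": "Hello  world",
    "http://yoursite.com/yourpage?bgcolor=blue": "hello world",
    "http://yoursite.com/other": "Something else",
}
duplicates = find_exact_duplicates(sample_pages)
```

This only catches exact matches; pages that are merely similar would need a fuzzier comparison.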
Example: let’s say we have two pages on a website.
URL 1 – http://yoursite.com/yourpage
URL 2 – http://yoursite.com/yourpage?bgcolor=blue
Let’s assume that the second page is the same as the first page except for the background color, which is controlled dynamically through CSS styles.
Now, when one or more other websites link to the two URLs with similar or identical anchor text, Google will find it difficult to decide which page to show.
In such situations, Google might take its own time to decide which page to show in the search listings for a related search. It’s more like a confused state. (Bots aren’t always smart, you see.)
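One way to see that the two URLs above are really the same page is to strip the parameters that only affect presentation and compare what is left. The sketch below assumes that bgcolor is such a parameter; the PRESENTATION_PARAMS set and the normalize function are hypothetical names, not part of any search engine or library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters change only how the page looks.
PRESENTATION_PARAMS = {"bgcolor"}


def normalize(url):
    """Return the URL with presentation-only parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


url_1 = "http://yoursite.com/yourpage"
url_2 = "http://yoursite.com/yourpage?bgcolor=blue"
print(normalize(url_1) == normalize(url_2))  # True
```

Parameters that do change the content (a page number, say) are left alone, so those URLs stay distinct.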
So that explains why a website should contain as few duplicate entries or as little duplicate content as possible.
It might not be possible to avoid duplicate content on a website completely, but the idea is to keep it to a minimum so that it causes the least confusion.
How to curb duplicate content?
In the above example, there is more than one way for Google to work out that one page is better than the other.
1 – Google runs its own calculations to analyze the content and decide which page makes more sense.
2 – Google can also check external factors such as incoming links, anchor text and the content surrounding those links, and decide which of the two pages is more “popular” or “preferred”.
How do search engines deal with duplicate content?
Search engines take their own time, waiting for evidence that one page is better than the other before they display either in the live search results. Meanwhile, they simply carry on with the other results in the queue and hold back the “possible duplicate content” from the live results.
So what is a canonical tag? How does it help in dealing with duplicate content?
A canonical tag is a simple piece of HTML code (a <link> element with rel="canonical") that you insert into the <head> section of a duplicate page. It lets the search engines know that they are on a duplicate page and points them to the original content.
So let’s pick an example.
Page 1 – http://www.google.com/duplicate-content.html (Original source content)
Page 2 – http://www.google.com/duplicate-content-800x600.html (Duplicate content)
Now, you add the canonical link tag to the duplicate page, Page 2.
<link rel="canonical" href="http://www.google.com/duplicate-content.html" />
So what happens now? As soon as Google’s bot lands on the duplicate page (the page with the canonical tag), it does not give weight to the content on that page; instead, it follows the original URL given in the canonical tag.
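To make the bot’s side of this a little more concrete, here is a minimal sketch of how a crawler-like script could read the canonical URL out of a page. It uses Python’s standard html.parser module; the CanonicalFinder class and find_canonical function are illustrative names, and real search engine bots certainly do far more than this.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags are also routed through this method.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_values and href:
            self.canonical = href.strip()


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


duplicate_page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.google.com/duplicate-content.html" />'
    "</head><body>Same content, smaller layout.</body></html>"
)
print(find_canonical(duplicate_page))  # http://www.google.com/duplicate-content.html
```

A script like this can be handy for checking that your duplicate pages really point where you think they do.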
Which pages should you add a canonical tag to?
Technically, any page that you think repeats the content of a different page.
For example – http://www.yoursite.com/page1.php?sessionid=12&author=ben should be canonically tagged to http://www.yoursite.com/page1.php.
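For URLs like this one, where the whole query string is noise, the canonical tag can be derived by simply dropping everything after the path. The following is a rough sketch under that assumption; canonical_link_tag is a hypothetical helper, and it would be wrong for pages whose query parameters actually change the content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url):
    """Build a canonical <link> tag pointing at the URL without query or fragment."""
    parts = urlsplit(url)
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe to place inside an HTML attribute.
    return '<link rel="canonical" href="%s" />' % escape(clean_url, quote=True)


print(canonical_link_tag("http://www.yoursite.com/page1.php?sessionid=12&author=ben"))
# <link rel="canonical" href="http://www.yoursite.com/page1.php" />
```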
How does the canonical tag help WordPress blogs?
In my opinion, canonical tags should not be automated on WordPress blogs. Although there are several sources of possible duplicate content on WordPress, canonical tags may not work efficiently there because they require a certain amount of manual checking.
For example, on WordPress blogs, tag and archive pages create possible duplicate content, but neither can be controlled effectively with canonical tags. In such situations, meta noindex tags are far more effective.
But in cases like series posts (101-tips-part1.html and 101-tips-part2.html), if the content is strikingly similar, one may insert canonical tags manually and put them to good use.
Otherwise, I’d stay away from automation, at least for now.
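The rule of thumb above can be written down as a small decision helper: noindex for tag and archive pages, and a canonical tag only when a human has picked the target by hand. This is just a sketch of that reasoning, not a WordPress plugin or API; the page type names, the head_tag_for function and the yoursite.com URL are all assumptions made for illustration.

```python
from html import escape

# Assumption: these labels identify listing-style pages on the blog.
LISTING_TYPES = {"tag", "archive"}


def head_tag_for(page_type, manual_canonical=None):
    """Pick the tag to place in <head>, or an empty string for none.

    `manual_canonical` is a URL chosen by a person after checking the content;
    nothing here guesses a canonical target automatically.
    """
    if page_type in LISTING_TYPES:
        return '<meta name="robots" content="noindex" />'
    if manual_canonical:
        return '<link rel="canonical" href="%s" />' % escape(manual_canonical, quote=True)
    return ""


print(head_tag_for("archive"))
print(head_tag_for("post", "http://www.yoursite.com/101-tips-part1.html"))
```

Ordinary posts with no hand-picked canonical get no extra tag at all, which keeps the manual check in the loop.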