
Duplicate Content Detection


Duplicate Content Detection: Duplicate content detection is any method employed by a search engine or web developer to check if the material on a website is identical or highly similar to content on any other page or site.
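Search engines do not publish their exact detection methods, but one widely described way to measure "highly similar" text is word shingling with Jaccard similarity. Each document is broken into overlapping k-word sequences (shingles), and the two shingle sets are compared. The sketch below is illustrative only: it assumes 3-word shingles, and the function names and the 0.8 threshold are hypothetical choices, not any search engine's actual settings.

```python
import re

# Hypothetical cut-off: pairs at or above this similarity count as near-duplicates.
THRESHOLD = 0.8


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word sequences) in text."""
    words = re.findall(r"\w+", text.lower())  # lowercase and drop punctuation
    if len(words) < k:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(doc_a: str, doc_b: str, k: int = 3) -> float:
    """Shingle-based similarity between two documents, from 0.0 to 1.0."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


def is_near_duplicate(doc_a: str, doc_b: str, k: int = 3,
                      threshold: float = THRESHOLD) -> bool:
    """Flag two documents as near-duplicates if similarity meets the threshold."""
    return similarity(doc_a, doc_b, k) >= threshold


original = "Duplicate content detection checks whether a page is identical or highly similar to another page."
copy = "Duplicate content detection checks whether a page is identical or highly similar to another page!"
different = "Search engines prioritize the original source of the content."

print(similarity(original, copy), is_near_duplicate(original, copy))
print(similarity(original, different), is_near_duplicate(original, different))
```

Because the text is lowercased and punctuation is dropped before shingling, the lightly edited copy scores as an exact match, while the unrelated sentence shares no shingles with the original. Exact duplicates could be caught more cheaply by hashing whole pages, but shingling also catches copies with small edits.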

Duplicate content may be overtly plagiarized, but it need not be: search engines dislike duplicated content and may apply duplicate content penalties to any content that is overly similar to another page, whether or not the copying was deliberate.

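Because the appearance of duplicate content is often created by technical misunderstandings, particularly in how URLs are structured, rather than by a deliberate effort to reuse existing content, a number of tools help web developers identify and eliminate duplicate content on their own sites. For example, the same page may be reachable at both http:// and https:// addresses, with and without "www", or with extra tracking parameters, and each variant can look like a separate page carrying identical content. Google provides this ability through Google Webmaster Tools (since renamed Google Search Console), and a number of third-party apps and services help website owners find others who may have deliberately plagiarized their content.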
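To see how URL structure alone can create duplicate content, here is a minimal sketch that collapses common URL variants into one normalized form and reports which URLs end up pointing at the same page. The normalization rules (forcing https, stripping a leading "www.", removing trailing slashes, sorting the query string) and the hypothetical `IGNORED_PARAMS` set are assumptions for illustration, not Google's own canonicalization rules; a real site would choose rules that match how it actually serves pages.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed NOT to change the page's content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants of one page into a single normalized form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]  # assumption: www and bare host serve the same site
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = parts.path.rstrip("/") or "/"  # treat /shoes/ and /shoes alike
    # Drop tracking parameters and sort the rest so parameter order is irrelevant.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in IGNORED_PARAMS
    )
    # Assumption: the site is served over https, so http variants fold into it.
    return urlunsplit(("https", netloc, path, urlencode(params), ""))


def group_duplicate_urls(urls: list[str]) -> dict[str, list[str]]:
    """Map each normalized URL to the original URLs that collapse onto it,
    keeping only groups with more than one member (likely duplicates)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}


crawled = [
    "http://www.example.com/shoes/",
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=newsletter",
    "https://example.com/boots",
]
print(group_duplicate_urls(crawled))
```

Here three different-looking addresses collapse onto one page, which is exactly the kind of accidental duplication a site owner would want to consolidate, while /boots stays separate.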

Google has given some clear guidance on the levels of duplicate content that will negatively impact search rankings. According to Google’s Matt Cutts, the search engine's goal is always to prioritize the original source of the content without penalizing other websites for, say, quoting a paragraph and properly attributing it to its originator. Likewise, he says, reasonable levels of duplicate content arising from a small site's URL structure will not lead to penalties. Google does, however, actively demote sites that do nothing but republish duplicate content.