Wikipedia:Duplication detector

{{notice|This tool lacks an active maintainer. Please see the abandoned tool policy if you're interested in helping out.}}

{{hatnote|Go directly to the Duplication Detector on the Toolforge web site}}

{{Information page}}

{{shortcut|WP:dupdet}}

The duplication detector is a tool that compares any two web pages to identify text that has been copied from one to the other. It can compare two Wikipedia pages, two revisions of the same Wikipedia page, a Wikipedia page (current or old revision) and an external page, or two external pages. The tool locates passages in which the text on the two pages is identical. The minimum number of words in a match is adjustable and defaults to 2.
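The idea of matching passages with a minimum word count can be illustrated with a minimal Python sketch. This is not the tool's actual PHP implementation; the function names `words` and `shared_phrases` and the tokenization rule are assumptions made for illustration only.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_phrases(text_a, text_b, min_words=2):
    """Return the word sequences of length min_words that occur in both texts."""
    def ngrams(tokens):
        # Every run of min_words consecutive words, joined into one string.
        return {
            " ".join(tokens[i:i + min_words])
            for i in range(len(tokens) - min_words + 1)
        }

    return ngrams(words(text_a)) & ngrams(words(text_b))


a = "The quick brown fox jumps over the lazy dog."
b = "A quick brown fox sat beside the lazy cat."
print(sorted(shared_phrases(a, b, min_words=2)))
print(sorted(shared_phrases(a, b, min_words=3)))
```

With a minimum of 2 words, the sketch reports three short matches; raising the minimum to 3 leaves only the longer shared passage, which shows why a higher minimum reduces incidental matches.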

Usage

The tool is frequently used to check for copyright issues on Wikipedia, but it has other uses as well. For example, it can help locate quotes in a biography of a living person that were taken from a large PDF, so that they can be checked for accuracy.

To use the tool, supply the URLs of the two pages to compare (or, in the advanced version, upload either document from your computer). It supports text, HTML, and PDF documents. For other document types, check Google's cache for an HTML version by searching Google for "cache:URL". To make the tool run faster on very large documents, increase the minimum number of words to at least 3. For source documents containing scattered numerals, you may need to check "Remove numbers" to get the best matches. You also have the option of removing quotations from matches.
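To see why scattered numerals can break up matches, consider the following Python sketch of a number-stripping step. How the tool itself implements "Remove numbers" is not described here, so the function `remove_numbers` and its regular expression are assumptions for illustration.

```python
import re


def remove_numbers(text):
    """Drop standalone numerals (such as footnote or page numbers) from the text."""
    # \b\d+\b matches runs of digits that are not part of a longer word.
    without_digits = re.sub(r"\b\d+\b", " ", text)
    # Collapse the whitespace left behind by the removed numerals.
    return " ".join(without_digits.split())


source = "signed 12 the treaty 3 in Paris"
print(remove_numbers(source))
```

After this step, a source with stray footnote markers lines up word for word with a clean copy of the same sentence, so a longer passage can match.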

Duplication detector can see article text hidden by templates like {{tl|copyvio}}, since the text is still in the HTML page source, but it cannot see text that has been removed from the page. To check removed text, use the URL of an old revision that still contains it.
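As a small illustration, the URL of an old revision can be assembled from the page title and the revision ID using the standard MediaWiki `index.php?title=…&oldid=…` form. The helper name `old_revision_url`, the example title, and the revision ID below are hypothetical.

```python
from urllib.parse import urlencode


def old_revision_url(title, oldid, site="https://en.wikipedia.org"):
    """Build a permanent link to one revision of a wiki page."""
    # MediaWiki titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "oldid": oldid})
    return f"{site}/w/index.php?{query}"


# Hypothetical title and revision ID, for illustration only.
print(old_revision_url("Example article", 123456789))
```

The resulting URL can then be supplied to the tool in place of the current page's URL.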

License

The PHP source for Duplication Detector is available under the Simplified BSD License.

See also