User:Bluemoose/DataBaseSearchTool

Image:WikiDataDumpSearch.png

I wrote this program to search the [http://download.wikimedia.org/enwiki/ database dump] (the "Articles, templates, image descriptions, and primary meta-pages" dump, normally about 1100 MB).

It can find plain-text or regex (regular expression) matches in articles. It can also find articles with more or fewer than a given number of characters or links.

It is good for finding common mistakes, such as typos. The lists it produces can be loaded into WP:AWB or pywikibot, or pasted into Wikipedia.
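Since the results are meant to be pasted into Wikipedia or fed to a list-based tool, a small formatter can help. This Python sketch assumes the target accepts a bulleted list of wikilinks; `to_wiki_list` is a hypothetical helper, not part of AWB or pywikibot.

```python
def to_wiki_list(titles):
    """Format page titles as a bulleted wikitext list of links.

    Titles in the Category or Image namespace get a leading colon, so the
    pasted list links to those pages instead of categorising or embedding.
    """
    lines = []
    for title in titles:
        namespace = title.split(":", 1)[0] if ":" in title else ""
        colon = ":" if namespace in ("Category", "Image") else ""
        lines.append(f"* [[{colon}{title}]]")
    return "\n".join(lines)
```

The leading-colon handling matters because pasting a bare `[[Category:...]]` link would add the page to that category rather than list it.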

A typical run over the entire database takes between 1 and 6 minutes on an Athlon XP 2500 CPU.

This software is now built into AWB (it requires Microsoft Windows and the [http://www.microsoft.com/downloads/details.aspx?FamilyID=0856eacb-4362-4b0d-8edd-aab15c5e04f5&DisplayLang=en .NET Framework 2.0]).

If you use this software, please let me know what you think!

Suggested tasks

  • Search for common spelling/grammar errors.
  • Search for categories of people that are used without a sort key, e.g. search for Category:Economists.
  • ...
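For the sort-key task above, one way to express the search is a regular expression that matches the category link only when no `|sort key` follows the name. This Python sketch is an illustration under assumptions: `unsorted_category_re` is a hypothetical helper, spaces and underscores in the name are treated as equivalent, and any sort key set elsewhere on the page is not taken into account.

```python
import re


def unsorted_category_re(category):
    """Compile a regex matching [[Category:<name>]] with no sort key.

    Spaces and underscores in the name are treated as interchangeable and
    the "Category" prefix may start with either case; the name itself is
    matched exactly.
    """
    parts = re.split(r"[ _]+", category.strip())
    name = "[ _]+".join(re.escape(part) for part in parts)
    # "\s*\]\]" right after the name means no "|sort key" can be present.
    return re.compile(r"\[\[\s*[Cc]ategory\s*:\s*" + name + r"\s*\]\]")
```

For example, `unsorted_category_re("Economists")` matches `[[Category:Economists]]` but not `[[Category:Economists|Smith, John]]`.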

Latest version

  • 1.7.0.0, released 4 May 2006

Category:Wikipedia tools