User:Bluemoose/DataBaseSearchTool
I made this program for searching the [http://download.wikimedia.org/enwiki/ database dump] (the "Articles, templates, image descriptions, and primary meta-pages" dump, normally about 1100 MB).
It can find simple or regex (regular expression) text matches in articles. It can also find articles with more or fewer than a given number of characters or links.
It is good for finding common mistakes, such as typos. The lists it produces can be used with WP:AWB or a pywikibot, or pasted into Wikipedia.
A typical run over the entire database takes between 1 and 6 minutes on an Athlon XP 2500 CPU.
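To make the kinds of search described above concrete, here is a minimal Python sketch of the same idea. It is not the tool's own code: the function name `scan_dump`, its parameters, the rough `[[...]]` link pattern, and the tiny sample XML are all assumptions made for illustration. It streams pages from a dump-style XML file and yields the titles that pass a regex, character-count, or link-count filter.

```python
import io
import re
import xml.etree.ElementTree as ET

# Rough pattern for [[wikilinks]]; an assumption, not the tool's own rule.
LINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")


def local_name(tag):
    """Drop the XML namespace prefix: '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream, pattern=None, min_chars=None, max_chars=None,
              min_links=None, max_links=None):
    """Yield titles of pages that pass every filter that is not None."""
    regex = re.compile(pattern) if pattern else None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        n_links = len(LINK_RE.findall(text))
        if ((regex is None or regex.search(text))
                and (min_chars is None or len(text) >= min_chars)
                and (max_chars is None or len(text) <= max_chars)
                and (min_links is None or n_links >= min_links)
                and (max_links is None or n_links <= max_links)):
            yield title
        elem.clear()  # release the page's children once it has been checked


# Tiny made-up sample standing in for the real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
<page><title>A</title><revision><text>He recieved [[an award]].</text></revision></page>
<page><title>B</title><revision><text>Short.</text></revision></page>
</mediawiki>"""

typo_hits = list(scan_dump(io.BytesIO(SAMPLE), pattern=r"\brecieved\b"))
short_pages = list(scan_dump(io.BytesIO(SAMPLE), max_chars=10))
linked_pages = list(scan_dump(io.BytesIO(SAMPLE), min_links=1))
print(typo_hits, short_pages, linked_pages)
```

For a real dump, the `io.BytesIO(SAMPLE)` argument would be replaced by the opened dump file. Streaming with `iterparse` keeps memory use modest because each page is discarded after it has been checked.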
This software is now built into AWB (requires MS Windows and the [http://www.microsoft.com/downloads/details.aspx?FamilyID=0856eacb-4362-4b0d-8edd-aab15c5e04f5&DisplayLang=en .NET Framework 2]).
If you use this software, please let me know what you think!
Suggested tasks
- Search for common spelling/grammar errors.
- Search for people categories that do not have a sort key, e.g. search for Category:Economists.
- ...
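As an illustration of the sort-key task, the following Python sketch builds a regular expression that matches a category tag only when it has no `|sort key` part. The helper name `unsorted_category_pattern` and the sample strings are hypothetical, and it assumes the category appears in the article text as `[[Category:Economists]]`. The pattern uses only basic regex syntax.

```python
import re


def unsorted_category_pattern(category):
    """Match [[Category:<category>]] only when no '|sort key' part follows."""
    return re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*\]\]"
    )


pattern = unsorted_category_pattern("Economists")

# Made-up article snippets: one without a sort key, one with.
no_key = "Text ... [[Category:Economists]]"
with_key = "Text ... [[Category:Economists|Smith, Adam]]"

found_without_key = pattern.search(no_key) is not None
found_with_key = pattern.search(with_key) is not None
print(found_without_key, found_with_key)
```

Because the closing `]]` must follow the category name directly (apart from whitespace), a tag that carries a sort key after `|` does not match, so only the articles that still need a key are listed.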
Latest version
- 1.7.0.0, released 4 May 2006