User:SDZeroBot
{{Bot|SD0001|status=approved|codebase=Node.js|overridebrfa=Special:PrefixIndex/Wikipedia:Bots/Requests for approval/SDZeroBot}}
{{Toolforge bot|account=sdzerobot}}
{{2020 Coolest tool award|SDZeroBot|Newcomer|float=right}}
{{tinc|link=Wikipedia:List of cabals#Bot Cabal}}
__NOTOC__
SDZeroBot runs on Node.js and uses the [https://github.com/siddharthvp/mwn mwn] bot framework, also developed by SD0001. Most tasks are written in JavaScript, while the newer ones are in TypeScript. The source code is available on [https://github.com/siddharthvp/SDZeroBot GitHub].
Category:Wikipedia bots with JavaScript source code published
{{ombox
| type = content
| style = border:1px solid #B22222; display: table-cell;
| image = File:Shutdown button.svg
| text = Please use User:SDZeroBot/Shutoff for shutting off specific bot tasks. If the bot appears to be malfunctioning across multiple tasks, it should be blocked.
}}
Tasks
=Reports=
{{Clear}}
Report | Description | Frequency | Last update | Logs |
---|---|---|---|---|
WP:User scripts/Most imported scripts | List of user scripts by number of users and active users | Every 2 weeks | {{#invoke:User:SDZeroBot|lastupdate|{{#section:WP:User scripts/Most imported scripts|lastupdate}}|1296000}} | {{#invoke:User:SDZeroBot|logs|job-mostimported}} |
WP:AfC sorting +subpages | Classification of pending AfC submissions by topics predicted by ORES | Every 8 hours | {{#invoke:User:SDZeroBot|lastupdate|{{#section:WP:AfC sorting|lastupdate}}|28800}} | {{#invoke:User:SDZeroBot|logs|job-afc}} |
User:SDZeroBot/NPP sorting +subpages | Classification of unreviewed articles by ORES topics | Every 12 hours | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/NPP sorting|lastupdate}}|43200}} | {{#invoke:User:SDZeroBot|logs|job-npp}} |
User:SDZeroBot/PROD sorting | Classification of articles proposed for deletion (PROD) by ORES topics | Every 4 hours | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/PROD sorting|lastupdate}}|14400}} | {{#invoke:User:SDZeroBot|logs|job-prod}} |
User:SDZeroBot/AfD sorting | Classification of articles nominated for deletion at AfD by ORES topics | Every 4 hours | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/AfD sorting|lastupdate}}|14400}} | {{#invoke:User:SDZeroBot|logs|job-afd}} |
User:SDZeroBot/Draftify Watch | Tracks articles being moved to draftspace | Weekly | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Draftify Watch|lastupdate}}|604800}} | {{#invoke:User:SDZeroBot|logs|job-draft}} |
User:SDZeroBot/PROD Watch | Tracks the status of articles proposed for deletion per WP:PROD | Weekly | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/PROD Watch|lastupdate}}|604800}} | {{#invoke:User:SDZeroBot|logs|job-pwatch}} |
User:SDZeroBot/Redirectify Watch | Tracks conversions of articles to redirects | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Redirectify Watch|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-rwatch}} |
User:SDZeroBot/G13 Watch | Records excerpts of drafts that have been deleted per G13 | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/G13 Watch|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-g13watch}} |
User:SDZeroBot/Recent AfC declines | Lists recently declined AfC drafts with excerpts and other data | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Recent AfC declines|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-declined}} |
User:SDZeroBot/G13 soon | Lists drafts that will become G13-eligible within a week | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/G13 soon|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-g131week}} |
User:SDZeroBot/G13 soon sorting | Classifies soon-to-be-G13-eligible drafts by ORES topics | Weekly | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/G13 soon sorting|lastupdate}}|604800}} | {{#invoke:User:SDZeroBot|logs|job-g13-soon}} |
User:SDZeroBot/G13 eligible | Lists G13-eligible drafts with descriptions and excerpts | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/G13 eligible|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-g13-elig}} |
User:SDZeroBot/GAN sorting | Classifies articles awaiting GA review using ORES topics | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/GAN sorting|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-gan}} |
User:SDZeroBot/Peer reviews | Annotated listing of articles for which peer review is requested | Weekly | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Peer reviews|lastupdate}}|604800}} | {{#invoke:User:SDZeroBot|logs|job-peer}} |
User:SDZeroBot/Pending AfC submissions | Lists pending AfC submissions with excerpts and other data | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Pending AfC submissions|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-pafc}} |
User:SDZeroBot/Unreferenced BLPs | Annotated listing of unreferenced BLPs, for Women in Red | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{#section:User:SDZeroBot/Unreferenced BLPs/Women|lastupdate}}|86400}} | {{#invoke:User:SDZeroBot|logs|job-unref}} |
WP:List of Wikipedians by good article nominations | List of users by most GAs | Daily | {{#invoke:User:SDZeroBot|lastupdate|{{REVISIONTIMESTAMP:WP:List of Wikipedians by good article nominations}}|86400}} | {{#invoke:User:SDZeroBot|logs|gans-list}} |
User:SDZeroBot/DYK nomination counts.json | List of users by most DYK nominations | Continuous | {{#invoke:User:SDZeroBot|lastupdate|{{REVISIONTIMESTAMP:User:SDZeroBot/DYK nomination counts.json}}|86400}} | {{#invoke:User:SDZeroBot|logs|SDZeroBot/eventstream-router/dyk-counts}} |
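For the G13 soon report, the selection rule boils down to date arithmetic. This is a minimal sketch, not SDZeroBot code: it assumes eligibility starts six calendar months after a draft's last human edit (the G13 threshold) and leaves out how that edit is identified. `g13EligibleAt` and `isG13Soon` are hypothetical helpers.

```typescript
// Hypothetical helpers for illustration; not taken from the SDZeroBot repository.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

/** Assumed: a draft becomes G13-eligible six calendar months after its last human edit. */
function g13EligibleAt(lastEdit: Date): Date {
  const d = new Date(lastEdit.getTime());
  // setUTCMonth rolls over short months (e.g. 31 August + 6 months lands in early March).
  d.setUTCMonth(d.getUTCMonth() + 6);
  return d;
}

/** True if the draft is not eligible yet but will be within the next seven days. */
function isG13Soon(lastEdit: Date, now: Date = new Date()): boolean {
  const eligible = g13EligibleAt(lastEdit).getTime();
  return eligible > now.getTime() && eligible <= now.getTime() + WEEK_MS;
}

// Usage with a fixed "now" so the result is reproducible:
const now = new Date("2024-07-01T00:00:00Z");
console.log(isG13Soon(new Date("2024-01-04T00:00:00Z"), now)); // true: eligible on 4 July
console.log(isG13Soon(new Date("2024-02-01T00:00:00Z"), now)); // false: eligible on 1 August
```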
=Other continuous tasks=
Internal tracking:
Job | Logs |
---|---|
stream | {{#invoke:User:SDZeroBot|logs|stream}} |
routerlog | {{#invoke:User:SDZeroBot|logs|SDZeroBot/eventstream-router/routerlog}} |
gans | {{#invoke:User:SDZeroBot|logs|SDZeroBot/eventstream-router/gans}} |
db-tabulator | {{#invoke:User:SDZeroBot|logs|SDZeroBot/eventstream-router/db-tabulator-metadata}} |
BRFA | Description | Frequency | Logs |
---|---|---|---|
BRFA | AfD notifier: Notifies users of AfD nominations of articles to which they have significantly contributed. | Daily | {{#invoke:User:SDZeroBot|logs|job-notifier}} |
BRFA | Bot activity monitor: Keeps track of activity of fully automatic bots and reports the ones that are not working. Optionally also notifies the respective operators. | Continuous | {{#invoke:User:SDZeroBot|logs|bot-monitor}} |
BRFA | {{t|Database report}}: Updates tables with the results of specified SQL queries. | Continuous | {{#invoke:User:SDZeroBot|logs|db-tabulator}} |
BRFA | Purges pages linked from User:SDZeroBot/Purge list | Continuous | {{#invoke:User:SDZeroBot|logs|SDZeroBot/eventstream-router/purger}} |
BRFA | Raise edit requests to keep gadgets in sync with upstream sources per User:SDZeroBot/Gadgets-sync-config.json. | Continuous | {{#invoke:User:SDZeroBot|logs|gadgets-sync}} |
— | Track sizes of categories listed on User:SDZeroBot/Category counter. | Continuous | {{#invoke:User:SDZeroBot|logs|cat-count}} |
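The {{t|Database report}} task ends by turning query results into a wikitable. Here is a rough sketch of just that step. It assumes the rows have already been fetched as plain objects. `toWikitable` is a hypothetical name, and the real db-tabulator also handles query execution, configuration and much more.

```typescript
// Hypothetical sketch: formats query result rows as a sortable wikitable.
type Row = Record<string, string | number | null>;

/** Makes a value safe to place inside a wikitable cell. */
function escapeCell(value: string | number | null): string {
  if (value === null) return "";
  return String(value)
    .replace(/\n/g, " ") // a newline would end the row early
    .replace(/\|/g, "&#124;"); // a bare "|" would start a new cell
}

function toWikitable(rows: Row[]): string {
  if (rows.length === 0) return "No results.";
  const columns = Object.keys(rows[0]);
  const lines = ['{| class="wikitable sortable"'];
  lines.push("! " + columns.join(" !! ")); // header row
  for (const row of rows) {
    lines.push("|-");
    lines.push("| " + columns.map((c) => escapeCell(row[c] ?? null)).join(" || "));
  }
  lines.push("|}");
  return lines.join("\n");
}

console.log(toWikitable([{ page: "Foo", edits: 12 }, { page: "A|B", edits: 3 }]));
```

Escaping pipes as `&#124;` keeps values that contain `|` from splitting into extra cells.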
=One-time / on-demand=
BRFA | Description | Frequency |
---|---|---|
BRFA | Consolidate stub tags on pages where possible (replace X-stub and Y-stub with X-Y-stub or Y-X-stub if either exists) | One-time |
BRFA | Re-sort geographical stub articles with more specific stub tags | On demand |
— | Created the lists at User:SDZeroBot/Category cycles identifying cycles in the category tree | One-time |
BRFA | Adding {{t|Drafts moved from mainspace}} to drafts that were moved from mainspace | One-time |
BRFA | Adding {{t|Set category}} to set categories. | On demand |
Tasks which edit only in the userspace don't require a BRFA.
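The stub-consolidation rule from the one-time tasks table can be illustrated with a small sketch. It assumes the page's stub template names have already been extracted. It also assumes `existing` holds the stub templates that exist on the wiki; a real run would look those up via the API. `consolidateStubs` is a hypothetical name, and the greedy pairing order is an assumption, not the bot's actual logic.

```typescript
// Hypothetical sketch of the "X-stub + Y-stub -> X-Y-stub or Y-X-stub" rule.
const SUFFIX = "-stub";

function consolidateStubs(tags: string[], existing: Set<string>): string[] {
  const result = [...tags];
  for (let i = 0; i < result.length; i++) {
    for (let j = i + 1; j < result.length; j++) {
      const a = result[i];
      const b = result[j];
      if (!a.endsWith(SUFFIX) || !b.endsWith(SUFFIX)) continue;
      const x = a.slice(0, -SUFFIX.length);
      const y = b.slice(0, -SUFFIX.length);
      // Use whichever combined template exists, trying X-Y-stub first.
      const merged = [`${x}-${y}${SUFFIX}`, `${y}-${x}${SUFFIX}`].find((t) => existing.has(t));
      if (merged) {
        result[i] = merged; // the combined tag replaces the first one...
        result.splice(j, 1); // ...and the second one is dropped
        j = i; // rescan the remaining tags against the new combined tag
      }
    }
  }
  return result;
}

// Illustrative template names only:
console.log(consolidateStubs(["France-stub", "painter-stub", "1900s-stub"], new Set(["France-painter-stub"])).join(", "));
// France-painter-stub, 1900s-stub
```

Each merge removes one tag, so the loops always terminate.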
How do you generate article excerpts?
Good question. Excerpts of articles used on many of SDZeroBot's classification pages are generated using a combination of regex and some slightly more formal parsing methods. The Node.js source code used can be seen [https://github.com/siddharthvp/SDZeroBot/blob/master/TextExtractor.ts here], which also relies on [https://github.com/siddharthvp/mwn/blob/master/src/wikitext.ts mwn's wikitext class]. This excerpt generator is also available as a web service hosted on Toolforge at https://summary-generator.toolforge.org/, with a horribly bare-minimum UI but a more usable API endpoint. See the [https://github.com/siddharthvp/summary-generator GitHub README] for usage instructions.
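To give a flavour of the regex side of this, here's a heavily simplified sketch. It is not TextExtractor.ts. It removes comments, templates (tracking nesting depth), refs, file and category links and bold/italic markup, then takes the first paragraph that isn't a heading. `roughExcerpt` and its default 250-character limit are made up for illustration.

```typescript
// Simplified, hypothetical excerpt generator; TextExtractor.ts handles far more cases.

/** Removes {{...}} templates, counting nesting depth so nested templates go away whole. */
function stripTemplates(text: string): string {
  let out = "";
  let depth = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.startsWith("{{", i)) { depth++; i++; continue; }
    if (depth > 0 && text.startsWith("}}", i)) { depth--; i++; continue; }
    if (depth === 0) out += text[i];
  }
  return out;
}

function roughExcerpt(wikitext: string, maxChars = 250): string {
  const withoutComments = wikitext.replace(/<!--[\s\S]*?-->/g, "");
  const text = stripTemplates(withoutComments)
    .replace(/<ref[^>]*\/>/gi, "") // self-closing refs
    .replace(/<ref[^>]*>[\s\S]*?<\/ref>/gi, "") // refs with content
    .replace(/\[\[(?:File|Image|Category):[^\]]*\]\]/gi, "") // files/categories (no nested links)
    .replace(/\[\[[^\]|]*\|([^\]]*)\]\]/g, "$1") // [[target|label]] -> label
    .replace(/\[\[([^\]]*)\]\]/g, "$1") // [[target]] -> target
    .replace(/'{2,}/g, ""); // bold/italic quotes
  // First non-empty paragraph that isn't a section heading.
  const para = text.split(/\n\s*\n/).map((p) => p.trim()).find((p) => p && !p.startsWith("=")) ?? "";
  const flat = para.replace(/\s+/g, " ");
  if (flat.length <= maxChars) return flat;
  // Cut at the limit, then drop any partial trailing word.
  return flat.slice(0, maxChars).replace(/\s+\S*$/, "") + " ...";
}

console.log(roughExcerpt(
  "{{Infobox person|name=Jane {{nowrap|Doe}}}}\n'''Jane Doe''' (born 1970) is a [[Botany|botanist]] from [[Kenya]].<ref>Source</ref>\n\n== Career ==\nShe worked at a museum.",
)); // Jane Doe (born 1970) is a botanist from Kenya.
```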
I initially considered using the code from popups, but it was so messy and so entangled with other popups code that I couldn't get it to work standalone.
All excerpts are short enough that attribution and copyright concerns are avoided.
Source code
All source code that drives SDZeroBot is publicly available in the [https://github.com/siddharthvp/SDZeroBot GitHub repository], as well as in the /data/project/sdzerobot directory on Toolforge. Even the logs (*.out and *.err files) are publicly visible, which is not the default on Toolforge. The jobs.yml file used to schedule the tasks can also be viewed there.
== To do==
{{see also|User:SD0001/Scripts#To-do}}
If you're interested in helping out with these tasks, please contact me.
- Split certain sortlists into subtopics.
- For the sports list (1500+ pages), use machine learning to section this list by sport.
- Split the biography list by professions - this can be done just by looking at the other topics the bios have been classified with.
- For the STEM lists, make sections for core articles, STEM biographies, STEM media, STEM companies, ... (discussion)
- Automatically produce short descriptions for articles and drafts.
- Drafts mostly don't have short descriptions at all. They'd be very useful on AfC sorting. Also useful for the AfD, NPP, and PROD lists.
- Explore using machine learning for this; failing that, other methods such as Trialpears' bio shortdesc generator.
- If that also doesn't give the desired level of accuracy (esp. for non-bio articles), don't actually add the shortdescs to the articles, but show them on the sortlists.
- Consider creating a Toolforge-hosted web UI for the AfD sorting list, so that more columns can be added (whose visibility can be toggled using JavaScript), based on the ideas here.
- Automate delsort tagging of AfDs using ORES. Works only for select delsort lists which have a corresponding ORES topic, or
- Automate delsort taggings using some custom machine learning model. The model can be trained on basis of delsort tagging done so far by humans. Difficulty here is that unbiased training of the model requires access to content of deleted articles as well. Simply training it on articles that were kept at AfD would not give good results.
- Big one: identify promising AFC drafts using ML.
- Probably using TemplateStyles, improve the appearance of the tables on very small and very wide screens.
- Create unified lists for PROD and AFD which include both deletion rationale and lead text. {{aye}} PROD
- Don't duplicate nomination text on AFD sorting report for multi-article nominations.
- User:SDZeroBot/Declined AFCs, G13 soon: figure out ways to better identify good and bad drafts.
- Integrate unreliable source detection using User:Headbomb/unreliable.js.
- Create articlesearch tool – shows excerpts of articles from CirrusSearch queries – use ReactJS
Tips and tricks for bot operators
=Monitoring failures=
For each SDZeroBot task, most of the code is in an async function [https://github.com/siddharthvp/SDZeroBot/blob/e0f2606586d9005d8dc956f8dc9598c6c455dffc/reports/g13-watch/g13-watch.ts#L314 with a catch] that traps errors and formats them as an email sent to the tool account, which lands in my inbox. For good measure, there's also a [https://github.com/siddharthvp/SDZeroBot/blob/master/botbase.ts#L37 process-level uncaughtException handler].
The only kinds of errors the above wouldn't handle are those that occur before the JavaScript code even starts executing (such as the file accidentally losing its executable permission) and out-of-memory (OOM) kills. Both are handled by using <code>--emails onfailure</code> with the Toolforge Jobs framework.
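Here is a stripped-down sketch of the catch-and-email pattern described above. `notify` stands in for whatever actually sends the email. `runTask`, `formatError` and `installCrashHandler` are hypothetical names; this is not the code in botbase.ts.

```typescript
// Hypothetical sketch; the real mailer and task bodies are not shown.
type Notifier = (subject: string, body: string) => Promise<void> | void;

function formatError(taskName: string, err: unknown): { subject: string; body: string } {
  const body = err instanceof Error ? (err.stack ?? err.message) : String(err);
  return { subject: `${taskName} failed`, body };
}

/** Runs a task; any error it throws is reported via `notify` instead of being lost. */
async function runTask(taskName: string, task: () => Promise<void>, notify: Notifier): Promise<boolean> {
  try {
    await task();
    return true;
  } catch (err) {
    const { subject, body } = formatError(taskName, err);
    await notify(subject, body);
    return false;
  }
}

/** Safety net for errors thrown outside any awaited promise chain. */
function installCrashHandler(taskName: string, notify: Notifier): void {
  process.on("uncaughtException", (err) => {
    const { subject, body } = formatError(taskName, err);
    // Report first, then exit non-zero so the job is still marked as failed.
    Promise.resolve(notify(subject, body)).finally(() => process.exit(1));
  });
}

// Usage (hypothetical task body and mailer):
// installCrashHandler("g13-watch", sendMail);
// runTask("g13-watch", main, sendMail);
```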
In addition, this user page lists the report pages above, along with their last-updated timestamps. Each timestamp, together with the report's expected update frequency, is fed into a Lua module that prints the timestamp in bold red if the update is overdue.
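The staleness check itself lives in a Lua module. Here is the same idea sketched in TypeScript. It assumes "delayed" simply means more time has passed than the expected interval (the seconds figure in each table invocation). It also assumes the timestamp uses MediaWiki's 14-digit format, as returned by {{REVISIONTIMESTAMP}}. The real module may use a grace period or different markup, and the function names are hypothetical.

```typescript
// Hypothetical TypeScript rendering of the "is this report overdue?" check.

/** Parses a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC). */
function parseMwTimestamp(ts: string): Date {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) throw new Error(`Unrecognised timestamp: ${ts}`);
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s));
}

/** True when strictly more than `intervalSeconds` have passed since the last update. */
function isOverdue(lastUpdate: Date, intervalSeconds: number, now: Date = new Date()): boolean {
  return now.getTime() - lastUpdate.getTime() > intervalSeconds * 1000;
}

/** Wikitext for the "Last update" cell: bold red when overdue, plain otherwise. */
function renderLastUpdate(ts: string, intervalSeconds: number, now: Date = new Date()): string {
  const last = parseMwTimestamp(ts);
  const text = last.toISOString().slice(0, 16).replace("T", " "); // e.g. "2024-07-01 12:00"
  return isOverdue(last, intervalSeconds, now)
    ? `<span style="color:red; font-weight:bold;">${text}</span>`
    : text;
}
```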
There's also WP:BAM, which, although maintained by SDZeroBot, is not used to monitor SDZeroBot itself.
A good combination of failure monitoring techniques is essential for operating bots that reliably perform a number of tasks without requiring you to spend time and energy on making sure everything is running.
=Handling blacklisted links=
If SDZeroBot is unable to save an edit because it would introduce a spam-blacklisted link (which of course isn't the bot's fault, since it likely just picked up the text from elsewhere), it identifies the problematic link from the API response, removes the protocol ("http:" or "https:") from the link, and then attempts to save the page again. This does mean that a link that was supposed to look like [https://google.com Link label] ends up looking like
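A sketch of that retry loop follows. It assumes the save function throws an error carrying the API error code `spamblacklist` plus the list of matched blacklist entries. That error shape and the names `SaveError`, `stripProtocols` and `saveWithBlacklistRetry` are hypothetical, so check what your framework actually returns. As in the description above, only the `http:`/`https:` part is removed.

```typescript
// Hypothetical error shape: API error code plus the blacklist entries that matched.
class SaveError extends Error {
  code: string;
  matches: string[];
  constructor(code: string, matches: string[] = []) {
    super(code);
    this.code = code;
    this.matches = matches;
  }
}

/** Removes "http:" or "https:" from every URL containing one of the matched strings. */
function stripProtocols(text: string, matches: string[]): string {
  return text.replace(/\bhttps?:(\/\/[^\s\]<>|]+)/gi, (url: string, rest: string) =>
    matches.some((m) => url.includes(m)) ? rest : url,
  );
}

/** Saves `text`, retrying with de-linked URLs on spam-blacklist errors. Returns the saved text. */
async function saveWithBlacklistRetry(
  text: string,
  save: (text: string) => Promise<void>,
  maxAttempts = 3,
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    try {
      await save(text);
      return text;
    } catch (err) {
      if (!(err instanceof SaveError) || err.code !== "spamblacklist" || attempt >= maxAttempts) throw err;
      const fixed = stripProtocols(text, err.matches);
      if (fixed === text) throw err; // nothing left to strip: give up instead of looping
      text = fixed;
    }
  }
}
```

The `fixed === text` check means a retry that changes nothing surfaces as an error instead of burning through attempts.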
=Use OAuth=
Always use OAuth instead of BotPasswords. It has several advantages:
- Faster: BotPasswords requires at least three API calls just to get the bot started: one to fetch a login token, another to actually log in, and usually a third to fetch editing tokens. Since OAuth doesn't require any API call to begin authentication, you need just one API call, to fetch the tokens.
- Fewer errors: Session loss often occurs with cookie-based authentication. Good bot frameworks handle this automatically by logging in again on getting an assertbotfailed or assertuserfailed API response, but if yours doesn't, you can avoid these problems by using OAuth. OAuth tokens don't expire.
- No need to cache cookies: If your bot task runs frequently (say every 10 minutes), you're likely to have a high login rate unless you cache the login cookies and reuse them across bot runs. High login rates are [https://phabricator.wikimedia.org/T256533 frowned upon by server admins]. With OAuth, you don't have to worry about this.
- More secure: OAuth is more secure.
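Part of why OAuth skips the login round trip is that every request is signed locally. Here is a minimal sketch of OAuth 1.0a HMAC-SHA1 signing (RFC 5849) using only Node's built-in `crypto`. It assumes an owner-only consumer whose four credentials you already have, and a normalised base URL with no query string. `oauthHeader`, `baseString` and `percentEncode` are hypothetical names. In practice, let your bot framework do this; mwn, for instance, can be configured with OAuth credentials.

```typescript
import { createHmac, randomBytes } from "node:crypto";

interface OAuthCredentials {
  consumerKey: string;
  consumerSecret: string;
  accessToken: string;
  accessSecret: string;
}

/** RFC 3986 percent-encoding (encodeURIComponent leaves !'()* unescaped, OAuth does not). */
function percentEncode(s: string): string {
  return encodeURIComponent(s).replace(/[!'()*]/g, (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

/** Signature base string: method, encoded URL, and all parameters sorted by encoded key. */
function baseString(method: string, url: string, allParams: Record<string, string>): string {
  const normalized = Object.entries(allParams)
    .map(([k, v]) => [percentEncode(k), percentEncode(v)] as const)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${k}=${v}`)
    .join("&");
  return [method.toUpperCase(), percentEncode(url), percentEncode(normalized)].join("&");
}

/** Builds the Authorization header; `params` must include all query and form-body parameters. */
function oauthHeader(
  method: string,
  url: string,
  params: Record<string, string>,
  creds: OAuthCredentials,
  nonce = randomBytes(16).toString("hex"),
  timestamp = Math.floor(Date.now() / 1000).toString(),
): string {
  const oauthParams: Record<string, string> = {
    oauth_consumer_key: creds.consumerKey,
    oauth_nonce: nonce,
    oauth_signature_method: "HMAC-SHA1",
    oauth_timestamp: timestamp,
    oauth_token: creds.accessToken,
    oauth_version: "1.0",
  };
  const signingKey = `${percentEncode(creds.consumerSecret)}&${percentEncode(creds.accessSecret)}`;
  const signature = createHmac("sha1", signingKey)
    .update(baseString(method, url, { ...params, ...oauthParams }))
    .digest("base64");
  const headerParams = { ...oauthParams, oauth_signature: signature };
  return "OAuth " + Object.entries(headerParams)
    .map(([k, v]) => `${percentEncode(k)}="${percentEncode(v)}"`)
    .join(", ");
}
```

No request is sent to the wiki to obtain this header, which is why the first API call can go straight to fetching tokens.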