User:Monkbot/Task 6: CS1 language support
Monkbot task 6 was created to modify CS1 citations that have {{para|title}} parameters containing non-Latin script so that they use the new CS1 parameter {{para|script-title}}.
__TOC__
A recent change to Module:Citation/CS1 (the engine underlying the {{cs1}} templates) created a new parameter {{para|script-title}}. The new parameter is intended to be used when a citation's title is written in a script that is not a Latin-based alphabet. Usually these scripts should not be italicized (Chinese, Japanese, etc.) and/or may be written right-to-left (Hebrew, Persian, etc.). {{para|script-title}} is supported by all citation templates that use Module:Citation/CS1 except {{tlx|cite encyclopedia}}. As of revision b, task 6 does not modify {{tld|cite encyclopedia}} templates.
The purpose of the {{tld|xx icon}} templates is to identify for readers that certain links are to sources that are not English language sources. Each of these {{tld|xx icon}} templates adds the page to the appropriate subcategory of {{cl|Articles with non-English-language external links}}. Prior to the 11 October 2014 update to Module:Citation/CS1, CS1 templates with {{para|language}} parameters also added pages to the individual subcategories in Category:Articles with non-English-language external links. Because CS1 citations do not always provide links to external sources, citations that used {{para|language}} to identify the language in which the source is written were improperly categorizing the article. Module:Citation/CS1 now uses {{cl|CS1 foreign language sources}}. Task 6 locates CS1 citation templates that are adjacent to {{tld|xx icon}} templates, adds a {{para|language}} parameter with the language code from the {{tld|xx icon}} template to the CS1 citation and then deletes the {{tld|xx icon}} template.
Task 6 was initially created to work on pages listed in certain subcategories of Category:Articles with non-English-language external links. The criteria are: subcategories that contain 1,000 or more articles; or subcategories for languages that have an ISO639-1 two-character language code and that are listed at right-to-left. The first criterion was an arbitrary cutoff; the second was not.
Task 6 begins by changing {{tld|xx icon}} redirects to the standard form. For example, {{tlx|Da}}, {{tlx|Da li}}, {{tlx|Da-icon}}, and {{tlx|Dk icon}} are all redirects to {{tlx|da icon}} and so are changed to that form. The purpose of the standardization is to simplify later rules in the script.
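As a rough illustration of the standardization step, the sketch below applies the Danish redirect pattern from the script to a short string. The helper name <code>NormalizeDanishIcons</code> and the sample text are assumptions made for this example, and the code is written with C# top-level statements (C# 9 or later) rather than as an AWB module.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse the Danish redirects onto {{da icon}}.
// The pattern is the one the script uses for Danish.
string NormalizeDanishIcons(string wikitext)
{
    return Regex.Replace(wikitext,
        @"\{\{(?:[Dd]a|[Dd]a li|[Dd]a[ \-]icon|[Dd]k icon)\}\}",
        "{{da icon}}");
}

// Three Danish redirects are rewritten; the German icon is left alone.
Console.WriteLine(NormalizeDanishIcons("{{Da}} {{Dk icon}} {{Da-icon}} {{de icon}}"));
// {{da icon}} {{da icon}} {{da icon}} {{de icon}}
```

Because every later rule only has to recognize {{tlx|da icon}}, none of them needs to repeat the list of redirects.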
After {{tld|xx icon}} standardization, task 6:
- protects certain {{tld|xx icon}} templates from further edits;
- moves {{tld|xx icon}} templates that are inside a CS1 citation template to a position ahead of the CS1 template for processing by later rules;
- removes empty {{para|language}} parameters from CS1 citations so that the citation doesn't end up with duplicate {{para|language}} parameters at the end of the task;
- removes wikilink markup from {{para|language}} parameter values so that Module:Citation/CS1 can properly categorize the citation;
- removes {{para|language|English}}, {{para|language|British English}}, {{para|language|en}}, or {{para|language|en-GB}} from CS1 citations that use them (discontinued at task 6n);
- from task 6n: modifies {{para|language|English language}} and {{para|language|British English}} to {{para|language|English}}; modifies {{para|language|en-GB}} to {{para|language|en}}.
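Two of the cleanup steps in the list above, removing an empty {{para|language}} and removing wikilink markup from its value, can be sketched as follows. The patterns follow the script's rules, but <code>Cs1</code> is a shortened stand-in for the script's <code>IS_CS1E</code> pattern and the helper names are assumptions made for this example (C# top-level statements, C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: a shortened stand-in for the script's IS_CS1E template-name pattern.
const string Cs1 = @"(?:[Cc]ite (?:book|journal|news|web)|[Cc]itation)";

// Drop an empty |language= so that a later rule cannot create a duplicate.
string RemoveEmptyLanguage(string text)
{
    return Regex.Replace(text,
        @"(\{\{\s*" + Cs1 + @"[^\}]*)\|\s*language\s*=\s*([\|\}])", "$1$2");
}

// Strip wikilink markup from the |language= value.
string UnlinkLanguage(string text)
{
    // [[Text]] becomes Text
    text = Regex.Replace(text,
        @"(\{\{\s*" + Cs1 + @"[^\}]*\|\s*language\s*=\s*)\[\[([A-Za-z\s]+)\]\]", "$1$2");
    // [[Link|Text]] becomes Text
    return Regex.Replace(text,
        @"(\{\{\s*" + Cs1 + @"[^\}]*\|\s*language\s*=\s*)\[\[[A-Za-z\s]+\|([A-Za-z\s]+)\]\]", "$1$2");
}

Console.WriteLine(RemoveEmptyLanguage("{{cite web |title=Foo |language= |url=x}}"));
// {{cite web |title=Foo |url=x}}
Console.WriteLine(UnlinkLanguage("{{cite book |title=Foo |language=[[German language|German]]}}"));
// {{cite book |title=Foo |language=German}}
```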
Some citations have {{para|language}} parameters that contain RFC1766-style language codes (code-subcode, where code is an ISO639-1 language code and subcode is an ISO3166 country code). When task 6 was written, CS1 did not support this style of language parameter, so task 6 truncated these codes to just the ISO639-1 portion; from revision p the truncation is disabled because cs1|2 now accepts IETF-like language tags. Chinese is written in both simplified and traditional forms. Where {{para|language|simplified Chinese}} or {{para|language|traditional Chinese}} parameters occur, task 6 removes the qualifier. Where {{para|language}} contains a language name followed by the word language ({{para|language|German language}}), task 6 removes the word language.
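The qualifier removal can be sketched as below. The two patterns follow the script's CHINESE and OTHERS rules; <code>Cs1</code> is a shortened stand-in for the script's <code>IS_CS1E</code> pattern and <code>StripQualifiers</code> is a name invented for this example (C# top-level statements, C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: a shortened stand-in for the script's IS_CS1E template-name pattern.
const string Cs1 = @"(?:[Cc]ite (?:book|journal|news|web)|[Cc]itation)";

string StripQualifiers(string text)
{
    // simplified / standard / traditional Chinese becomes Chinese
    text = Regex.Replace(text,
        @"(\{\{\s*" + Cs1 + @"[^}]+\|\s*language\s*=\s*)(?:[Ss]implified|[Ss]tandard|[Tt]raditional)\s*Chinese(\s*[\|\}])",
        "$1Chinese$2");
    // "<name> language" becomes "<name>"
    return Regex.Replace(text,
        @"(\{\{\s*" + Cs1 + @"[^}]+\|\s*language\s*=\s*)([a-zA-Z]+)\s*[Ll]anguage(\s*[\|\}])",
        "$1$2$3");
}

Console.WriteLine(StripQualifiers("{{cite web |title=Foo |language=traditional Chinese}}"));
// {{cite web |title=Foo |language=Chinese}}
Console.WriteLine(StripQualifiers("{{cite web |title=Foo |language=German language |url=x}}"));
// {{cite web |title=Foo |language=German |url=x}}
```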
In a CS1 citation, {{para|language}} may either precede or follow {{para|title}}, with or without intervening parameters. Properly evaluating each citation would then require a rule for each case. Alternatively, multiple rules are not needed if each citation is modified to a standard format. Editors generally place {{para|language}} somewhere after {{para|title}}, so task 6 modifies those citation templates where {{para|language}} precedes {{para|title}} by moving {{para|language}} to the end of the citation (the same place it puts {{para|language}} parameters that are created from {{tld|xx icon}} templates).
Certain citations shouldn't be edited. Task 6 employs a multilevel protection scheme. Edits to protected elements are prevented by the insertion of a special text string that makes the template unrecognizable to subsequent rules. Elements that include either of the special text strings __PROTECTED__ or __PROTECTED2__ are never edited by task 6 except to remove the protection string at the task's completion. Reasons for this level of protection are:
- a citation with leading or trailing {{tld|xx icon}} templates contains a {{para|language}} parameter where the {{tld|xx icon}} code (xx) or the code's equivalent language name does not match the language name or code in {{para|language}}; where there is a match, {{tld|xx icon}} is removed;
- the citation includes another template, especially templates like {{tlx|nihongo}}, which can confuse the later rules;
- groups of two or more {{tld|xx icon}} or {{tld|xxx icon}} templates: the first and last are protected to prevent later rules from taking one of them as a value for a citation's {{para|language}} parameter;
- {{tlx|en icon}} when amongst other {{tld|xx icon}} or {{tld|xxx icon}} templates; it is presumed that such use indicates a multilingual source.
The second level of protection is applied only after the first level protection rules have been applied. This level identifies CS1 citations that have {{para|title}} values containing one or more Latin characters. The script is not smart enough to know if these characters are part of the original writing system, are a transliteration, or are a translation. Under certain circumstances described later, task 6 may edit those citations marked with __PROTECTED1__.
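The marker mechanism can be sketched with the script's rule for an icon pair joined by "and". The <code>ProtectIconPair</code> pattern is taken from the script; <code>HasPlainIcon</code> is a hypothetical stand-in for any later rule that looks for a plain icon template, and <code>Unprotect</code> is an assumed form of the final cleanup step (C# top-level statements, C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Mark the first and last icon of an "{{xx icon}} and {{yy icon}}" pair.
string ProtectIconPair(string text)
{
    return Regex.Replace(text,
        @"(\{\{)([a-z]{2,3}\s*icon\}\}\s*and\s*\{\{[a-z]{2,3}[\s-]icon)(\}\})",
        "$1__PROTECTED__$2__PROTECTED__$3");
}

// Hypothetical stand-in for a later rule that looks for a plain icon template.
bool HasPlainIcon(string text)
{
    return Regex.IsMatch(text, @"\{\{[a-z]{2,3} icon\}\}");
}

// Assumed final step: strip the marker once all rules have run.
string Unprotect(string text)
{
    return text.Replace("__PROTECTED__", "");
}

string original = "{{de icon}} and {{fr icon}}";
string guarded = ProtectIconPair(original);
Console.WriteLine(guarded);                        // {{__PROTECTED__de icon}} and {{fr icon__PROTECTED__}}
Console.WriteLine(HasPlainIcon(guarded));          // False
Console.WriteLine(Unprotect(guarded) == original); // True
```

The marker sits inside the braces, so the protected icons no longer look like {{tld|xx icon}} templates to any rule that expects the standard form.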
Unprotected {{tlx|en icon}} templates are then deleted.
For each of the rtl languages, the CJK languages, other non-Latin scripts (Greek, Hebrew, Cyrillic), and in keeping with MOS:Foreign terms, special rules require that the content of {{para|title}} must match the language identified in {{tld|xx icon}} or {{para|language}}. For example, the rule for Arabic requires an {{tlx|ar icon}} or {{para|language|ar}} or {{para|language|Arabic}} and that {{para|title}} contain only punctuation, digits (0–9), and Arabic script. When these conditions are met, task 6 replaces {{para|title|...}} with {{para|script-title|ar:...}}, adds {{para|language|ar}} (if appropriate) and deletes the adjacent {{tld|ar icon}} template (if present).
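A minimal sketch of the Arabic case follows. It is not the script's own rule: it handles only a flat citation (no nested templates) that already carries {{para|language|ar}} or {{para|language|Arabic}}, it also allows whitespace in the title, and the names <code>IsArabicOnly</code> and <code>ToScriptTitle</code> are invented for this example (C# top-level statements, C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// True when the title holds only whitespace, digits, punctuation and Arabic-block
// characters, and at least one Arabic-block character.
bool IsArabicOnly(string title)
{
    return Regex.IsMatch(title, @"^[\s\d\p{P}\p{IsArabic}]+$")
        && Regex.IsMatch(title, @"\p{IsArabic}");
}

// Hypothetical rule: rename |title= to |script-title=ar: when the citation is
// marked as Arabic and the title is written entirely in Arabic script.
string ToScriptTitle(string citation)
{
    Match m = Regex.Match(citation, @"\|\s*title\s*=\s*([^|{}]*?)(?=\s*[|}])");
    if (!m.Success) return citation;

    string title = m.Groups[1].Value;
    bool isArabic = Regex.IsMatch(citation, @"\|\s*language\s*=\s*(?:ar|[Aa]rabic)\s*[|}]");
    if (!isArabic || !IsArabicOnly(title)) return citation;

    return citation.Substring(0, m.Index)
        + "|script-title=ar:" + title
        + citation.Substring(m.Index + m.Length);
}

Console.WriteLine(ToScriptTitle("{{cite web |title=الجزيرة |language=ar}}"));
Console.WriteLine(ToScriptTitle("{{cite web |title=Al Jazeera |language=ar}}")); // unchanged: Latin title
```

Adding {{para|language|ar}} from an adjacent {{tlx|ar icon}} and deleting that icon are left out of the sketch.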
Languages for which task 6 supports {{para|script-title}} are:
{{columns-list |colwidth=15em|
- Arabic (ar)
- Armenian (hy)
- Bosnian (bs)
- Chinese (zh)
- Greek (el)
- Hebrew (he)
- Japanese (ja)
- Korean (ko)
- Kurdish (ku)
- Maldivian (dv){{dagger}}
- Pashto (ps)
- Persian (fa)
- Russian (ru)
- Serbian (sr)
- Sindhi (sd)
- Thai (th)
- Ukrainian (uk)
- Uyghur (ug)
- Yiddish (yi)
}}
{{dagger}} {{small|when {{para|language|divehi}}, {{para|language|dhivehi}}, {{para|language|maldivian}}, {{para|language|dv}}; when citation has adjacent {{tlx|dv icon}}, {{para|language}} parameter must be {{para|language|Maldivian}} or {{para|language|dv}};}}
For those languages that use Latin or Latin-variant alphabets, task 6 simply adds {{para|language|xx}} and deletes the adjacent {{tld|xx icon}} template.
For those CS1 citations with Latin characters in {{para|title}}, which now contain __PROTECTED1__, task 6 deletes the icon and adds {{para|language|xx}} to the citation.
As a final step, wherever task 6 added __PROTECTED__, __PROTECTED1__, or __PROTECTED2__, that text is removed.
Since 18 April 2015, Module:Citation/CS1 has supported a comma-delimited list of language names. From Rev. o, task 6 will locate cs1|2 templates followed by two to five {{tld|xx icon}} templates and add the codes from those templates to a {{para|language}} parameter.
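The multi-icon step might look something like the sketch below. It is an illustration only: it assumes a flat <code>cite</code> template with no existing {{para|language}} parameter and icons already in the standard lower-case form, and <code>MergeIcons</code> is a name invented for this example (C# top-level statements, C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical rule: fold two to five trailing {{xx icon}} templates into a
// comma-delimited |language= parameter at the end of the citation.
string MergeIcons(string text)
{
    return Regex.Replace(text,
        @"(\{\{\s*[Cc]ite \w+[^{}]*?)\s*\}\}((?:\s*\{\{[a-z]{2,3} icon\}\}){2,5})",
        m =>
        {
            var codes = new List<string>();
            foreach (Match icon in Regex.Matches(m.Groups[2].Value, @"\{\{([a-z]{2,3}) icon\}\}"))
                codes.Add(icon.Groups[1].Value);
            return m.Groups[1].Value + " |language=" + string.Join(", ", codes) + "}}";
        });
}

Console.WriteLine(MergeIcons("{{cite web |title=Foo |url=x}} {{de icon}} {{fr icon}}"));
// {{cite web |title=Foo |url=x |language=de, fr}}
```

A citation followed by a single icon is left for the ordinary single-icon rules.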
Hidden under the hood at Module:Citation/CS1 is the process that takes {{para|title|transcription}}, {{para|script-title|xx:original writing system title}}, and {{para|trans-title|translated title}} and puts them all together with {{tag|bdi|params=lang="xx"}} which both isolates the content for rtl languages and helps the browser to correctly display the script.
If, at the end of all of this, only casing has been changed ({{tld|XX icon}} to {{tld|xx icon}}) then the change is not saved.
Article pages that contain {{tlx|bots|Monkbot 6}} or that do not contain Module:Citation/CS1-supported templates will not be edited by this task.
Ancillary tasks
This script also:
To do list
Script
// REVISIONS:
// 2014-11-13: Rev a:
// Detect and remove |language=British English
// |language=divehi or |language=dhivehi or |language=maldivian or |language=dv
// 2014-11-14: Rev b:
// remove support for cite encyclopedia; parameter remapping in Module:Citation/CS1 doesn't work because no |script-chapter
// 2014-11-14: Rev c:
// add support for Armenian (hy);
// 2014-11-15: Rev d:
// Mandarin and Cantonese dialects to Chinese; standard Chinese to Chinese;
// 2014-11-16: Rev e:
// Revise protection rule so CS1 templates with embedded templates are more correctly ignored;
// 2014-11-17: Rev f:
// Modify |language=Nynorsk to |language=Norwegian Nynorsk;
// 2014-11-17: Rev g:
// Add rule to remove empty |script-title= already in a citation;
// 2014-11-18: Rev h:
// Modify |language=Bokmål to |language=Norwegian Bokmål;
// 2014-11-18: Rev i:
// Modify |language=Português to |language=Portuguese;
// 2014-11-18: Rev j:
// Remove |language=English language;
// 2014-11-18: Rev k:
// Add rule to search previously edited pages for erroneous edits that may have placed |language=xx at the end of an embedded template; Use Category:CS1 uses foreign language script;
// 2015-04-26: Rev l:
// expand the number of rules that can use IS_CS1E; add cite arxiv, cite map, cite episode, cite serial;
// 2015-04-27: Rev m:
// remove support for cite episode; parameter remapping in Module:Citation/CS1 doesn't work because no |script-chapter
// 2015-08-26: Rev n:
// change variants of |language=english because the module now simply hides english annotation;
// 2015-08-28: Rev o:
// add multi-icon to language parameter; enable newsgroup and newspaper;
// 2019-06-10: Rev p:
// allow IETF-like language tags because cs1|2 accepts them
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
Skip = true;
// Summary = "add |script-title=; replace {{xx icon}} with |language= in CS1 citations; clean up language icons;";
Summary = "Task 6p: add |script-title=; replace {{xx icon}} with |language= in CS1 citations; normalize language icons;";
string pattern; // local variable to hold regex pattern for reuse
string IS_CS1 = @"(?:[Cc]ite[_ ](?=(?:(?:AV|av) [Mm]edia(?: notes)?)|article|ar[Xx]iv|blog|book|conference|document|(?:DVD|dvd)(?: notes)?|interview|journal|letter|[Mm]agazine|map|news|news(?:group|paper)|paper|podcast|press release|serial|sign|speech|techreport|thesis|video|web)|[Cc]itation|[Cc]ite(?=\s*\|))";
string IS_CS1E = @"(?:[Cc]ite[_ ](?=(?:(?:AV|av) [Mm]edia(?: notes)?)|article|ar[Xx]iv|blog|book|conference|document|(?:DVD|dvd)(?: notes)?|encyclopa?edia|episode|interview|journal|letter|[Mm]agazine|map|news|news(?:group|paper)|paper|podcast|press release|serial|sign|speech|techreport|thesis|video|web)|[Cc]itation|[Cc]ite(?=\s*\|))";
string IS_CJK = @"\p{IsHangulSyllables}\p{IsCJKUnifiedIdeographs}\p{IsHalfwidthandFullwidthForms}\p{IsCJKSymbolsandPunctuation}\p{IsHiragana}\p{IsKatakana}";
string IS_DIGITS_AND_SYMBOLS = @"\d\p{P}~\$\^\+`\=\|\<\>";
string IS_ARABIC_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsArabic}]+"; // Arabic, Pashto, Uyghur
string IS_ARMENIAN_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsArmenian}]+"; // Armenian
string IS_CJK_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[" + IS_CJK + @"]+"; // Chinese, Japanese, Korean
string IS_CYRILLIC_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsCyrillic}\p{IsCyrillicSupplement}]+"; // Bosnian, Russian, Serbian, Ukrainian
string IS_GREEK_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsGreek}]+"; // Greek
string IS_HEBREW_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsHebrew}]+"; // Hebrew, Yiddish
string IS_PERSIAN_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsArabic}\p{IsHebrew}\p{IsCyrillic}\p{IsCyrillicSupplement}]+"; // Persian
string IS_SINDHI_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsArabic}\p{IsDevanagari}]+"; // Sindhi
string IS_THAANA_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsThaana}]+"; // Maldivian
string IS_THAI_SCRIPT = @"[" + IS_DIGITS_AND_SYMBOLS + @"]*[\p{IsThai}]+"; // Thai
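Dictionary<string, string> language_map = new Dictionary<string, string>(); // ISO639-1 code => lower-case language name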
language_map.Add("ar", "arabic"); // Arabic
language_map.Add("bs", "bosnian"); // Cyrillic
language_map.Add("ca", "catalan");
language_map.Add("cs", "czech");
language_map.Add("da", "danish");
language_map.Add("de", "german");
language_map.Add("dv", "maldivian"); // TODO: do special case for this? mediawiki doesn't recognize maldivian nor dhivehi but does recognize divehi
language_map.Add("el", "greek"); // Greek
language_map.Add("es", "spanish");
language_map.Add("fa", "persian"); // Arabic, Cyrillic, Hebrew
language_map.Add("fi", "finnish");
language_map.Add("fr", "french");
language_map.Add("he", "hebrew");
language_map.Add("hr", "croatian");
language_map.Add("hu", "hungarian");
language_map.Add("hy", "armenian");
language_map.Add("id", "indonesian");
language_map.Add("it", "italian");
language_map.Add("ja", "japanese");
language_map.Add("ko", "korean");
language_map.Add("ku", "kurdish");
language_map.Add("lt", "lithuanian");
language_map.Add("nl", "dutch");
language_map.Add("no", "norwegian");
language_map.Add("pl", "polish");
language_map.Add("ps", "pashto"); // Arabic*
language_map.Add("pt", "portuguese");
language_map.Add("ro", "romanian");
language_map.Add("ru", "russian"); // Cyrillic*
language_map.Add("sd", "sindhi"); // Arabic, Devanagari
language_map.Add("sk", "slovak");
language_map.Add("sl", "slovenian");
language_map.Add("sr", "serbian"); // Cyrillic
language_map.Add("sv", "swedish");
language_map.Add("th", "thai");
language_map.Add("tr", "turkish");
language_map.Add("ug", "uyghur"); // Arabic
language_map.Add("uk", "ukrainian"); // Cyrillic
language_map.Add("yi", "yiddish"); // Hebrew
language_map.Add("zh", "chinese");
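Dictionary<string, string> spelling_map = new Dictionary<string, string>(); // misspelled or variant name => corrected name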
spelling_map.Add("Belorussian", "Belarusian");
spelling_map.Add("Castilan", "Spanish");
spelling_map.Add("Germaan", "German");
spelling_map.Add("Norwegain", "Norwegian");
spelling_map.Add("Portuguese (Brazil)", "Portuguese");
//---------------------------< R E P L A C E R E D I R E C T S >--------------------------------------------
// ARABIC: Replace {{AR}}, {{AR icon}} with {{ar icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:AR|(?:AR|[Aa]r) icon)\}\}", "{{ar icon}}");
// CATALAN: Replace {{Ca}}, {{Ca li}} with {{ca icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Cc]a|[Cc]a li|Ca icon)\}\}", "{{ca icon}}");
// CHINESE: Replace {{cn icon}}, {{zh-icon}} with {{zh icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Cc]n icon|[Zz]h[ \-]icon)\}\}", "{{zh icon}}");
// CROATIAN: Replace {{Hr li}} with {{hr icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Hh]r li|Hr icon)\}\}", "{{hr icon}}");
// CZECH: Replace {{Cs li}}, {{Cz}}, {{Cz icon}} with {{cs icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Cc]s li|[Cc]z|[Cc]z icon|Cs icon)\}\}", "{{cs icon}}");
// DANISH: Replace {{Da}}, {{Da li}}, {{Da-icon}}, {{Dk icon}} with {{da icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Dd]a|[Dd]a li|[Dd]a[ \-]icon|[Dd]k icon)\}\}", "{{da icon}}");
// ENGLISH: Replace {{En li}}, {{En-icon}}, {{Ref-en}}
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ee]n icon|[Ee]n li|[Ee]n\-icon|[Rr]ef-en)\}\}", "{{en icon}}");
// FINNISH: Replace {{Fi}}, {{Fi li}} with {{fi icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ff]i|[Ff]i li|Fi icon)\}\}", "{{fi icon}}");
// FRENCH: Replace {{Fr icon}}, {{Fr}}, {{fr}}, {{French icon}}, {{FR-icon}}, {{Fr li}}, {{Fr-icon}}, {{Ref-fr}} with {{fr icon}}. {{FR}} is a redirect to {{FRA}}, a flag template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ff]r icon|[Ff]r|[Ff]rench icon|FR-icon|[Ff]r li|[Rr]ef-fr)\}\}", "{{fr icon}}");
// GERMAN: Replace {{De li}}, {{De-icon}}, {{Ger}}, {{ger}}, {{Icon de}}, {{Ref-de}} with {{de icon}}. {{GER}} is a redirect to {{DEU}}, a flag template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Dd]e li|[Dd]e[ \-]icon|[Gg]er|[Ii]con de|[Rr]ef\-de)\}\}", "{{de icon}}");
// GREEK: Replace {{El}}, {{el}}, {{El icon}}, {{Gr icon}}, {{Gre icon}} with {{el icon}}. {{EL}} is a redirect to {{External links}}, a maintenance template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ee]l|[Ee]l icon|[Gg]r icon|[Gg]re icon)\}\}", "{{el icon}}");
// HUNGARIAN: Replace {{Hu}}, {{Hu li}}, {{Ref-hu}} with {{hu icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Hh]u|[Hh]u li|[Rr]ef\-hu|Hu icon)\}\}", "{{hu icon}}");
// INDONESIAN: Replace {{Id}}, {{Id li}}, {{Indonesian}}, {{Indonesian icon}} with {{id icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ii]d|[Ii]d li|[Ii]ndonesian|[Ii]ndonesian icon|Id icon)\}\}", "{{id icon}}");
// ITALIAN: Replace {{It li}}, {{It}} with {{it icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ii]t li|[Ii]t|It icon)\}\}", "{{it icon}}");
// JAPANESE: Replace {{Jp-icon}}, {{Ja}}, {{Ja li}}, {{Ja-icon}}, {{Jp icon}}, {{Jp language}} with {{ja icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Jj]a icon|[Jj]p\-icon|[Jj]a|[Jj]a li|[Jj]a\-icon|[Jj]p icon|[Jj]p language)\}\}", "{{ja icon}}");
// KOREAN: Replace {{Ko}} with {{ko icon}}. {{KO}} is used for something else
ArticleText = Regex.Replace (ArticleText, @"\{\{[Kk]o(?: icon)?\}\}", "{{ko icon}}");
// LITHUANIAN: Replace {{Lt li}}, {{Lticon}} with {{lt icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ll]t li|[Ll]ticon|Lt icon)\}\}", "{{lt icon}}");
// DUTCH (NETHERLANDS): Replace {{Du icon}}, {{Nl}}, {{Nl li}}, {{Nl-icon}} with {{nl icon}}. {{NL}} is used as a flag template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Dd]u icon|[Nn]l|[Nn]l li|[Nn]l[ \-]icon)\}\}", "{{nl icon}}");
// NORWEGIAN: Replace {{No-icon}} with {{no icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{[Nn]o[ \-]icon\}\}", "{{no icon}}");
// PERSIAN: Replace {{fa}} and {{pr icon}} with {{fa icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ff]a|[Pp]r icon|Fa icon)\}\}", "{{fa icon}}");
// POLISH: Replace {{Pl}}, {{pl}}, {{Pl li}}, {{Pl-icon}} with {{pl icon}}. {{PL}} is a redirect to Plainlist
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Pp]l|[Pp]l li|[Pp]l[ \-]icon)\}\}", "{{pl icon}}");
// PORTUGUESE: Replace {{Pt}}, {{Pt li}} with {{pt icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Pp]t|[Pp]t li|Pt icon)\}\}", "{{pt icon}}");
// ROMANIAN: Replace {{Ref-ro}}, {{Ro}}, {{Ro li}}, {{Ro-icon}} with {{ro icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Rr]ef-ro|[Rr]o|[Rr]o li|[Rr]o[ \-]icon)\}\}", "{{ro icon}}");
// RUSSIAN: Replace {{Ru li}}, {{Icon ru}}, {{Ref-ru}}, {{Ru Icon}}, {{Ru language}}, {{Ru-icon}} with {{ru icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Rr]u li|[Ii]con ru|[Rr]ef-ru|[Rr]u Icon|[Rr]u language|[Rr]u-icon)\}\}", "{{ru icon}}");
// SERBIAN: Replace {{SR icon}}, {{Sr li}} with {{sr icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:(?:[Ss]r|SR) icon|[Ss]r li)\}\}", "{{sr icon}}");
// SINDHI: Replace {{Sd}} with {{sd icon}}. {{SD}} is a speedy delete template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ss]d|Sd icon)\}\}", "{{sd icon}}");
// SLOVAK: Replace {{Sk}} with {{sk icon}}. {{SK}} is a flag template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ss]k|Sk icon)\}\}", "{{sk icon}}");
// SLOVENIAN: Replace {{Sl}}, {{sl}}, {{Sl li}}, {{Slovene}} with {{sl icon}}. {{SL}} is a redirect to Subscription or libraries template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ss]l|[Ss]l li|[Ss]lovene|Sl icon)\}\}", "{{sl icon}}");
// SPANISH: Replace {{Es-icon}}, {{Sp icon}}, {{Es}}, {{Es li}} with {{es icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ee]s[ \-]icon|[Ss]p icon|[Ee]s|[Ee]s li)\}\}", "{{es icon}}");
// SWEDISH: Replace {{Sv}}, {{sv}}, {{Svenska}}, {{Svicon}}, {{Swe icon}} with {{sv icon}}. {{SV}} is a ship prefix template
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Ss]v|[Ss]venska|[Ss]vicon|[Ss]we icon|Sv icon)\}\}", "{{sv icon}}");
// THAI: Replace {{Th icon}} with {{th icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{Th icon\}\}", "{{th icon}}");
// TURKISH: Replace {{TR}}, {{Tr}}, {{Tr li}} with {{tr icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:TR|[Tt]r|Tr icon|Tr li)\}\}", "{{tr icon}}");
// UKRAINIAN: Replace {{Uk li}}, {{Ref-uk}}, {{Ua icon}} with {{uk icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Uu]k li|[Rr]ef-uk|[Uu]a icon|Uk icon)\}\}", "{{uk icon}}");
// YIDDISH: Replace {{Yi}} with {{yi icon}}.
ArticleText = Regex.Replace (ArticleText, @"\{\{(?:[Yy]i|Yi icon)\}\}", "{{yi icon}}");
// OTHERS: Replace these redirects for completeness: {{Ref-az}}, {{Ref-be}}, {{Ref-hy}}, {{Ref-uz}}
ArticleText = Regex.Replace (ArticleText, @"\{\{[Rr]ef-((?:az|be|hy|uz))\}\}", "{{$1 icon}}");
// OTHERS: Replace these redirects for completeness:
// {{Af li}}, {{Ba li}}, {{Be li}}, {{Bg li}}, {{Br li}}, {{Et li}}, {{Eu li}}, {{Ga li}}, {{Gd li}},
// {{Gn li}},{{Is li}}, {{Ka li}}, {{Ln li}}, {{Mg li}}, {{Ms li}}, {{Qu li}}, {{Tl li}}, {{Vi li}}
ArticleText = Regex.Replace (ArticleText, @"\{\{((?:[Aa]f|[Bb]a|[Bb]e|[Bb]g|[Bb]r|[Ee]t|[Ee]u|[Gg]a|[Gg]d|[Gg]n|[Ii]s|[Kk]a|[Ll]n|[Mm]g|[Mm]s|[Qq]u|[Tt]l|[Vv]i)) li\}\}",
delegate(Match match)
{
return @"{{" + match.Groups[1].Value.ToLower() + @" icon}}"; // set language code portion to lower case
});
// OTHERS: set mixed and upper case codes in {{xx icon}} templates to lower case for completeness also remove hyphens: {{Xx-icon}} and {{XX-icon}} to {{xx icon}}
ArticleText = Regex.Replace (ArticleText, @"\{\{([a-zA-Z]{2})[\s-]icon\}\}",
delegate(Match match)
{
return @"{{" + match.Groups[1].Value.ToLower() + @" icon}}"; // set language code portion to lower case
});
//---------------------------< P R O T E C T I C O N S >----------------------------------------------------
// these rules support ISO639-2, 3, etc three-character codes: {{xxx icon}}
// ICON GROUPS: Protect {{xx icons}} when there are two of them separated by ' and ': {{xx icon}} and {{xx icon}} is changed to:
// {{__PROTECTED__xx icon}} and {{xx icon__PROTECTED__}}
// This rule prevents later rules from moving the first or last of an icon group into |language=
ArticleText = Regex.Replace(ArticleText, @"(\{\{)([a-z]{2,3}\s*icon\}\}\s*and\s*\{\{[a-z]{2,3}[\s-]icon)(\}\})", "$1__PROTECTED__$2__PROTECTED__$3");
// ICON GROUPS: Protect {{xx icons}} when there are multiples of them: {{xx icon}} {{xx icon}} {{xx icon}} is changed to:
// {{__PROTECTED__xx icon}} {{xx icon}} {{xx icon__PROTECTED__}}
// This rule prevents later rules from moving the first or last of an icon group into |language=
ArticleText = Regex.Replace(ArticleText, @"(\{\{)([a-z]{2,3}\s*icon\}\}(?:\s*[,;/–-]?\s*&?\s*\{\{[a-z]{2,3}[\s-]icon\}\})*\s*[,;/–-]?\s*&?\s*\{\{[a-z]{2,3}[\s-]icon)(\}\})", "$1__PROTECTED__$2__PROTECTED__$3");
// ENGLISH ICON: Protect {{en icon}} when it is in a group of icons but is not one of the end icons
// This rule prevents the delete {{en icon}} rule from deleting {{en icon}} when it is a member of a group of icons. When in a group,
// if {{en icon}} is not one of the end icons, it always follows another so a rule for an {{en icon}} preceding {{xx icon}} is not necessary.
ArticleText = Regex.Replace(ArticleText, @"([a-z]{2,3}\s*icon\}\}\s*[,;/–-]?\s*\{\{)(en\s*icon\}\})", "$1__PROTECTED__$2");
//---------------------------< R E M O V A L S >--------------------------------------------------------------
// INSIDE ICONS: Find {{xx icon}} templates inside a CS1 citation template. Move {{xx icon}} ahead of the citation so it can be processed by later rules
// Doesn't find inside {{xx icon}} templates if the citation also has other templates ahead of {{xx icon}}
pattern = @"(\{\{\s*" + IS_CS1E + @"[^\{\}]+)(\{\{\w{2,2}\s*icon\s*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2$1");
Skip = false;
}
// LANGUAGE MAGIC WORDS: Find {{#language:xx|xx}} magic words inside a CS1 citation template. Remove all but language code. Assume associated with |language=
// Doesn't find inside {{#language:xx}} if the citation also has other templates ahead of {{#language:xx}}
pattern = @"(\{\{\s*" + IS_CS1E + @"[^\{\}]*\|\s*language\s*=\s*)\{\{#language:([a-zA-Z]{2})[^\}]*\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
Skip = false;
}
// EMPTY PARAMETERS: Remove empty |language= parameters so we don't end up with two. This rule follows the INSIDE ICONS rule so that newly emptied |language= is removed.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^\}]*)\|\s*language\s*=\s*([\|\}])", "$1$2");
// EMPTY PARAMETERS: Remove empty |script-title= parameters so we don't end up with two.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)\|\s*script-title\s*=\s*([\|\}])", "$1$2");
// WIKILINKS: Remove simple wikilinks from |language parameters because they prevent proper categorization
// Replace [[Text]] with Text
pattern = @"(\{\{\s*" +IS_CS1E + @"[^\}]*\|\s*language\s*=\s*)\[\[([A-Za-z\s]+)\]\]";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
Skip = false;
}
// WIKILINKS: Remove complex wikilinks from |language parameters because they prevent proper categorization
// Replace [[Link|Text]] with Text
pattern = @"(\{\{\s*" +IS_CS1E + @"[^\}]*\|\s*language\s*=\s*)\[\[[A-Za-z\s]+\|([A-Za-z\s]+)\]\]";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
Skip = false;
}
// this rule disabled and replaced by the next rules because the module simply hides english annotation
// ENGLISH: Remove |language=English, |language=en, |language=Eng, and |language=en-GB |language=British English parameters.
// pattern = @"({{\s*" + IS_CS1E + @"[^}]+)\|\s*language\s*=\s*(?:[Ee]nglish|[Bb]ritish [Ee]nglish|en\-[a-zA-Z]+|EN|[Ee]ng?)\s*([\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// Skip = false;
// }
// ENGLISH: Replace |language=en-XX with en
// disabled 2019-06-10 because cs1|2 ignores everything after the language code in IETF-like tags
// pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*en)\-[a-zA-Z]+\s*([\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// Skip = false;
// }
// ENGLISH: Replace |language=Eng with en
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Ee]ng\.?(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1en$2");
// Skip = false; // not sufficient change to save an article
}
// ENGLISH: Replace |language=British English with English.
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Bb]ritish [Ee]nglish(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1English$2");
Skip = false;
}
// this rule disabled and replaced with next rule because the module simply hides english
// ENGLISH: Remove |language=English language
// pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Ee]nglish\s*[Ll]anguage(\s*[\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// Skip = false;
// }
// ENGLISH: Replace |language=English language with English.
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*[Ee]nglish)\s*[Ll]anguage\s*([\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2"); // the pattern has only two capture groups; .NET would emit an unmatched "$3" literally
Skip = false;
}
//---------------------------< M I S C M O D I F I C A T I O N S >------------------------------------------
// SUBCODES: Change |language=xx-XX (language code - subcode pairs) to |language=xx
// disabled 2019-06-10 because cs1|2 ignores everything after the language code in IETF-like tags
// pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)([a-zA-Z]{2})\-[a-zA-Z]+(\s*[\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2$3");
// Skip = false;
// }
// CHINESE: Change |language=simplified (or standard or traditional) Chinese to |language=Chinese
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)(?:[Ss]implified|[Ss]tandard|[Tt]raditional)\s*Chinese(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Chinese$2");
Skip = false;
}
// CHINESE: Change |language=traditional Chinese to |language=Chinese
// pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Tt]raditional\s*Chinese(\s*[\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1Chinese$2");
// Skip = false;
// }
// CHINESE: Change |language=Mandarin and |language=Cantonese (dialects) to |language=Chinese
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)(?:[Cc]antonese|[Mm]andarin)(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Chinese$2");
Skip = false;
}
// JAPANESE: Change |language=Japan to |language=Japanese
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Jj]apan(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Japanese$2");
Skip = false;
}
// JAPANESE: Change |language=Japanese – Shift-JIS (or other extraneous text) to |language=Japanese
// pattern = @"({{\s*" + IS_CS1 + @"[^}]+\|\s*language\s*=\s*)[Jj]apanese[^\|\}]*(\s*[\|\}])";
// if (Regex.Match (ArticleText, pattern).Success)
// {
// ArticleText = Regex.Replace(ArticleText, pattern, "$1Japanese$2");
// Skip = false;
// }
// NEDERLANDS: Change |language=Nederlands to |language=Dutch
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)(?:[Nn]ederlands|NL)(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Dutch$2");
Skip = false;
}
// NORWEGIAN BOKMÅL: Change |language=Bokmål to |language=Norwegian Bokmål
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Bb]okmål(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Norwegian Bokmål$2");
Skip = false;
}
// NORWEGIAN NYNORSK: Change |language=Nynorsk to |language=Norwegian Nynorsk
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Nn]ynorsk(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Norwegian Nynorsk$2");
Skip = false;
}
// PORTUGUÊS: Change |language=Português, |language=Portugeas to |language=Portuguese
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)(?:[Pp]ortuguês|[Pp]ortugeas)(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Portuguese$2");
Skip = false;
}
// SLOVENE: Change |language=Slovene to |language=Slovenian
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)[Ss]lovene(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1Slovenian$2");
Skip = false;
}
// OTHERS: Strip a trailing "language" from the value of |language=, e.g. change |language=French language to |language=French
pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)([a-zA-Z]+)\s*[Ll]anguage(\s*[\|\}])";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2$3");
Skip = false;
}
// MISSPELLINGS: Fix misspellings in |language=
/* pattern = @"({{\s*" + IS_CS1E + @"[^}]+\|\s*language\s*=\s*)([^\|\}]*)";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string new_spelling;
string return_string = @"RAW_MATCH " + match.Groups[0].Value; // no misspelling, return the raw string
try // get icon code's language name from dictionary
{
new_spelling = spelling_map[match.Groups[2].Value]; // will throw an exception if misspelled language
}
catch (KeyNotFoundException) // trap the exception
{
return return_string; // return the raw string
}
return @"FIXED " + match.Groups[1].Value + new_spelling;
});
Skip = false;
}
*/
/* this worked
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string new_spelling;
string return_string = @"RAW_MATCH " + match.Groups[0].Value; // no misspelling, return the raw string
try // get icon code's language name from dictionary
{
new_spelling = spelling_map[match.Groups[2].Value]; // will throw an exception if misspelled language
}
catch (KeyNotFoundException) // trap the exception
{
return return_string; // return the raw string
}
return @"FIXED " + match.Groups[1].Value + new_spelling;
});
Skip = false;
}
*/
//---------------------------< P R O T E C T E D 2 >----------------------------------------------------------
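// Here we protect {{xx icon}} when it is paired with a citation having |language= and the two disagree.
// When the language code or language name in {{xx icon}} matches the value of |language=, the icon is removed
// as superfluous. Otherwise, we can't know which, {{xx icon}} or |language=, is correct, so we protect the
// {{xx icon}}. The delegate functions compare the icon language code to the |language= value (code to code)
// and the icon's language name to the |language= value (name to name).
// TODO: do special case for Maldivian, Dhivehi, Divehi? MediaWiki doesn't recognize Maldivian nor Dhivehi but does recognize Divehi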
// LANGUAGE PARAMETER: Protect icons that follow citations having |language=
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^}]*\|\s*language\s*=\s*)([^\|\}\s]*)([^\}]*\}\}\s*)\{\{([a-zA-Z]{2})(\s+icon\}\})",
delegate(Match match)
{
string icon_lang;
string return_string = match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value+ @"{{__PROTECTED2__" + match.Groups[4].Value + match.Groups[5].Value;
try // get icon code's language name from dictionary
{
icon_lang = language_map[match.Groups[4].Value]; // will throw an exception if icon code (key) is not found in dictionary
}
catch (KeyNotFoundException) // trap the exception
{
return return_string; // return the citation followed by a protected icon
}
// case insensitive string compare; compare code to code and name to name
if ((0 == String.Compare (match.Groups[2].Value, match.Groups[4].Value, true)) || (0 == String.Compare (icon_lang, match.Groups[2].Value, true)))
return match.Groups[1].Value + match.Groups[2].Value + match.Groups[3].Value; // matched so remove the icon
else
return return_string; // no match, protect the icon
});
// LANGUAGE PARAMETER: Protect icons that precede citations having |language=
ArticleText = Regex.Replace(ArticleText, @"\{\{([a-zA-Z]{2})\s+icon\}\}\s*(\{\{\s*" + IS_CS1E + @"[^}]*\|\s*language\s*=\s*)([^\|\}\s]*)",
delegate(Match match)
{
string icon_lang;
string return_string = @"{{__PROTECTED2__" + match.Groups[1].Value + @" icon}}" + match.Groups[2].Value + match.Groups[3].Value;
try // get icon code's language name from dictionary
{
icon_lang = language_map[match.Groups[1].Value]; // will throw an exception if icon code (key) is not found in dictionary
}
catch (KeyNotFoundException) // trap the exception
{
return return_string; // return a protected icon followed by the citation
}
// case insensitive string compare; compare code to code and name to name
if ((0 == String.Compare (match.Groups[1].Value, match.Groups[3].Value, true)) || (0 == String.Compare (icon_lang, match.Groups[3].Value, true)))
return match.Groups[2].Value + match.Groups[3].Value; // matched so remove the icon
else
return return_string; // no match, protect the icon
});
//---------------------------< P R O T E C T I O N >----------------------------------------------------------
// INCLUDED TEMPLATES: Protect any citations that contain other templates except {{xx icon}} templates. Matches any embedded template.
// By the time we get here, embedded {{xx icon}} templates that could be removed have been removed by the INSIDE ICONS rule.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*)(" + IS_CS1E + @"[^\{\}]*\{\{[^\}]*\}\})", "$1__PROTECTED__$2");
//---------------------------< P R O T E C T E D 1 >----------------------------------------------------------
//
// This is a semi protection. There are later rules that edit citations with __PROTECTED1__
// This rule protects citations that contain Latin characters in |title=. Titles with Latin characters might be a mix of
// some script and English which might represent original writing system plus translation and/or transliteration. Such titles
// are too complicated for simple regex fixes so are protected. Some of these |title= parameters are wrapped in
// reason why isn't clear.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*)(" + IS_CS1E + @"[^}]*\|\s*title\s*=(?:\s*
//---------------------------< I C O N D E L E T I O N >----------------------------------------------------
//
// These rules delete unprotected English icons
//
// DELETE: ENGLISH: Remove {{en icon}} when not protected. This version applies when the icon is NOT at the end of a line; it also removes trailing space characters
ArticleText = Regex.Replace(ArticleText, @"\{\{(?:en icon|En li|En-icon|Ref-en)\}\} *([^\n])", "$1");
// DELETE: ENGLISH: Remove {{en icon}} when not protected. This version applies when the icon is at the end of a line; it also removes leading and trailing space characters
ArticleText = Regex.Replace(ArticleText, @" *\{\{(?:en icon|En li|En-icon|Ref-en)\}\} *(\n)", "$1");
//---------------------------< P A R A M E T E R R E P O S I T I O N >--------------------------------------
// LANGUAGE: |language= may occur ahead of |title=; when it does, move it to the end of the citation before the closing }}
// This rule saves us the trouble of creating and maintaining duplicates of some of the following rules.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1 + @"[^\}]*)(\|\s*language\s*=\s*[^\|\}]*)([^\}]*)(\|\s*title\s*=[^\}]*)(\}\})", "$1$3$4$2$5");
//---------------------------< S C R I P T - T I T L E S >----------------------------------------------------
//
// These rules replace |title= with an appropriate |script-title=, add the correct |language= parameter, and delete the adjacent {{xx icon}} template.
// They apply to all CS1 templates except {{cite encyclopedia}}, which will require Module:Citation/CS1 support for |script-chapter=
//
// ARABIC and KURDISH, PASHTO, UYGHUR when written in Arabic. Find citations where |title= is in Arabic
// and the citation is followed by an {{ar icon}}, {{ku icon}}, {{ps icon}}, or {{ug icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*)(\}\})\s*\{\{((?:ar|ku|ps|ug)) icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=$5:$2$3|language=$5$4");
Skip = false;
}
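// Illustrative example of the rule above (a sketch, not taken from a real article). It assumes that IS_CS1 matches
// "cite web" and that IS_ARABIC_SCRIPT matches the whole title; TITLE stands for any Arabic-script title.
//	before: {{cite web |url=http://example.com |title=TITLE}} {{ar icon}}
//	after:  {{cite web |url=http://example.com |script-title=ar:TITLE|language=ar}}
// Here $1 is everything up to and including the pipe before title, $2 is TITLE, $3 is empty, $4 is the closing }}, and $5 is the icon code.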
// ARABIC and KURDISH, PASHTO, UYGHUR when written in Arabic. Find citations where |title= is in Arabic
// and the citation is preceded by an {{ar icon}}, {{ku icon}}, {{ps icon}}, or {{ug icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"\{\{((?:ar|ku|ps|ug)) icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2script-title=$1:$3$4|language=$1$5");
Skip = false;
}
// ARABIC, KURDISH, PASHTO, UYGHUR: Find citations where |title= is in Arabic and the citation contains |language=ar, ku, ps, or ug.
// Replace |title= with |script-title=xx:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*((?:ar|ku|ps|ug)))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=$4:$2$3$5");
Skip = false;
}
// ARABIC: Find citations where |title= is in Arabic and the citation contains |language=Arabic. Replace |title= with |script-title=ar:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*[Aa]rabic)([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ar:$2$3$4");
Skip = false;
}
// KURDISH: Find citations where |title= is in Arabic and the citation contains |language=Kurdish. Replace |title= with |script-title=ku:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*[Kk]urdish)([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ku:$2$3$4");
Skip = false;
}
// PASHTO: Find citations where |title= is in Arabic and the citation contains |language=Pashto. Replace |title= with |script-title=ps:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*[Pp]ashto)([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ps:$2$3$4");
Skip = false;
}
// UYGHUR: Find citations where |title= is in Arabic and the citation contains |language=Uyghur. Replace |title= with |script-title=ug:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARABIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*[Uu]yghur)([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ug:$2$3$4");
Skip = false;
}
// ARMENIAN: Find citations where |title= is in Armenian and the citation is followed by {{hy icon}} template.
// Replace |title= with |script-title=hy:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARMENIAN_SCRIPT + @")([^\}]*)(\}\})\s*\{\{hy icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=hy:$2$3|language=hy$4");
Skip = false;
}
// ARMENIAN: Find citations where |title= is in Armenian and the citation is preceded by {{hy icon}} template.
// Replace |title= with |script-title=hy:
pattern = @"\{\{hy icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARMENIAN_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=hy:$2$3|language=hy$4");
Skip = false;
}
// ARMENIAN: Find citations where |title= is in Armenian and the citation contains |language=hy or |language=Armenian
// Replace |title= with |script-title=hy:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_ARMENIAN_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Aa]rmenian|[Hh]y))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=hy:$2$3$4");
Skip = false;
}
// CHINESE, JAPANESE, and KOREAN: Find citations where |title= is in CJK and the citation is followed by {{ja icon}}, {{ko icon}}, or {{zh icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CJK_SCRIPT + @")([^\}]*)(\}\})\s*\{\{((?:ja|ko|zh)) icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=$5:$2$3|language=$5$4");
Skip = false;
}
// CHINESE, JAPANESE, and KOREAN: Find citations where |title= is in CJK and the citation is preceded by {{ja icon}}, {{ko icon}}, or {{zh icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"\{\{((?:ja|ko|zh)) icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CJK_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2script-title=$1:$3$4|language=$1$5");
Skip = false;
}
// CHINESE: Find citations where |title= is in CJK and the citation contains |language=zh or |language=Chinese. Replace |title= with |script-title=zh:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CJK_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Zz]h|[Cc]hinese))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=zh:$2$3$4");
Skip = false;
}
// JAPANESE: Find citations where |title= is in CJK and the citation contains |language=ja or |language=Japanese. Replace |title= with |script-title=ja:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CJK_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Jj]a|[Jj]apanese))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ja:$2$3$4");
Skip = false;
}
// KOREAN: Find citations where |title= is in CJK and the citation contains |language=ko or |language=Korean. Replace |title= with |script-title=ko:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CJK_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Kk]o|[Kk]orean))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ko:$2$3$4");
Skip = false;
}
// GREEK: Find citations where |title= is in Greek and the citation is followed by {{el icon}} template.
// Replace |title= with |script-title=el:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_GREEK_SCRIPT + @")([^\}]*)(\}\})\s*\{\{el icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=el:$2$3|language=el$4");
Skip = false;
}
// GREEK: Find citations where |title= is in Greek and the citation is preceded by {{el icon}} template.
// Replace |title= with |script-title=el:
pattern = @"\{\{el icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_GREEK_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=el:$2$3|language=el$4");
Skip = false;
}
// GREEK: Find citations where |title= is in Greek and the citation contains |language=el or |language=Greek, optionally
// with Greek preceded by Ancient, Byzantine, or Mycenaean.
// Replace |title= with |script-title=el:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_GREEK_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:(?:[Aa]ncient |[Bb]yzantine |[Mm]ycenaean )?[Gg]reek|[Ee]l))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=el:$2$3$4");
Skip = false;
}
// HEBREW and YIDDISH: Find citations where |title= is in Hebrew or Yiddish and the citation is followed by an {{he icon}} or {{yi icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_HEBREW_SCRIPT + @")([^\}]*)(\}\})\s*\{\{((?:he|yi)) icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=$5:$2$3|language=$5$4");
Skip = false;
}
// HEBREW and YIDDISH: Find citations where |title= is in Hebrew or Yiddish and the citation is preceded by an {{he icon}} or {{yi icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"\{\{((?:he|yi)) icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_HEBREW_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2script-title=$1:$3$4|language=$1$5");
Skip = false;
}
// HEBREW: Find citations where |title= is in Hebrew and the citation contains |language=he or |language=Hebrew.
// Replace |title= with |script-title=he:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_HEBREW_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Hh]ebrew|[Hh]e))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=he:$2$3$4");
Skip = false;
}
// YIDDISH: Find citations where |title= is in Hebrew script and the citation contains |language=yi or |language=Yiddish.
// Replace |title= with |script-title=yi:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_HEBREW_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Yy]iddish|[Yy]i))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=yi:$2$3$4");
Skip = false;
}
// MALDIVIAN: Find citations where |title= is in Maldivian (Thaana) and the citation is followed by {{dv icon}} template.
// Replace |title= with |script-title=dv:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAANA_SCRIPT + @")([^\}]*)(\}\})\s*\{\{dv icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=dv:$2$3|language=dv$4");
Skip = false;
}
// MALDIVIAN: Find citations where |title= is in Maldivian (Thaana) and the citation is preceded by {{dv icon}} template.
// Replace |title= with |script-title=dv:
pattern = @"\{\{dv icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAANA_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=dv:$2$3|language=dv$4");
Skip = false;
}
// MALDIVIAN: Find citations where |title= is in Maldivian (Thaana) and the citation contains |language=dv, |language=Maldivian, |language=Divehi, or |language=Dhivehi.
// Replace |title= with |script-title=dv:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAANA_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Mm]aldivian|[Dd]v|[Dd]h?ivehi))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=dv:$2$3$4");
Skip = false;
}
// PERSIAN: Find citations where |title= is in Arabic, Cyrillic, and/or Hebrew and the citation is followed by an {{fa icon}} template.
// Replace |title= with |script-title=fa:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_PERSIAN_SCRIPT + @")([^\}]*)(\}\})\s*\{\{fa icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=fa:$2$3|language=fa$4");
Skip = false;
}
// PERSIAN: Find citations where |title= is in Arabic, Cyrillic, and/or Hebrew and the citation is preceded by an {{fa icon}} template.
// Replace |title= with |script-title=fa:
pattern = @"\{\{fa icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_PERSIAN_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=fa:$2$3|language=fa$4");
Skip = false;
}
// PERSIAN: Find citations where |title= is in Arabic, Cyrillic, and/or Hebrew and the citation contains |language=fa or |language=Persian.
// Replace |title= with |script-title=fa:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_PERSIAN_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Pp]ersian|[Ff]a))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=fa:$2$3$4");
Skip = false;
}
// RUSSIAN, BOSNIAN, SERBIAN, UKRAINIAN: Find citations where |title= is in Cyrillic and the citation is followed by an {{ru icon}}, {{bs icon}}, {{sr icon}}, or {{uk icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*)(\}\})\s*\{\{((?:ru|bs|sr|uk)) icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=$5:$2$3|language=$5$4");
Skip = false;
}
// RUSSIAN, BOSNIAN, SERBIAN, UKRAINIAN: Find citations where |title= is in Cyrillic and the citation is preceded by an {{ru icon}}, {{bs icon}}, {{sr icon}}, or {{uk icon}} template.
// Replace |title= with |script-title=xx:
pattern = @"\{\{((?:ru|bs|sr|uk)) icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2script-title=$1:$3$4|language=$1$5");
Skip = false;
}
// RUSSIAN: Find citations where |title= is in Cyrillic and the citation contains |language=ru or |language=Russian.
// Replace |title= with |script-title=ru:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Rr]ussian|[Rr]u))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=ru:$2$3$4");
Skip = false;
}
// BOSNIAN: Find citations where |title= is in Cyrillic and the citation contains |language=bs or |language=Bosnian.
// Replace |title= with |script-title=bs:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Bb]osnian|[Bb]s))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=bs:$2$3$4");
Skip = false;
}
// SERBIAN: Find citations where |title= is in Cyrillic and the citation contains |language=sr or |language=Serbian.
// Replace |title= with |script-title=sr:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Ss]erbian|[Ss]r))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=sr:$2$3$4");
Skip = false;
}
// UKRAINIAN: Find citations where |title= is in Cyrillic and the citation contains |language=uk or |language=Ukrainian.
// Replace |title= with |script-title=uk:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_CYRILLIC_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Uu]krainian|[Uu]k))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=uk:$2$3$4");
Skip = false;
}
// SINDHI: Find citations where |title= is in Arabic or Devanagari and the citation is followed by an {{sd icon}} template.
// Replace |title= with |script-title=sd:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_SINDHI_SCRIPT + @")([^\}]*)(\}\})\s*\{\{sd icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=sd:$2$3|language=sd$4");
Skip = false;
}
// SINDHI: Find citations where |title= is in Arabic or Devanagari and the citation is preceded by an {{sd icon}} template.
// Replace |title= with |script-title=sd:
pattern = @"\{\{sd icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_SINDHI_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=sd:$2$3|language=sd$4");
Skip = false;
}
// SINDHI: Find citations where |title= is in Arabic or Devanagari and the citation contains |language=sd or |language=Sindhi. Replace |title= with |script-title=sd:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_SINDHI_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Ss]indhi|[Ss]d))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=sd:$2$3$4");
Skip = false;
}
// THAI: Find citations where |title= is in Thai and the citation is followed by {{th icon}} template.
// Replace |title= with |script-title=th:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAI_SCRIPT + @")([^\}]*)(\}\})\s*\{\{th icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=th:$2$3|language=th$4");
Skip = false;
}
// THAI: Find citations where |title= is in Thai and the citation is preceded by {{th icon}} template.
// Replace |title= with |script-title=th:
pattern = @"\{\{th icon\}\}\s*(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAI_SCRIPT + @")([^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=th:$2$3|language=th$4");
Skip = false;
}
// THAI: Find citations where |title= is in Thai and the citation contains |language=th or |language=Thai.
// Replace |title= with |script-title=th:
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\|\s*)title\s*=\s*(" + IS_THAI_SCRIPT + @")([^\}]*\|\s*language\s*=\s*(?:[Tt]hai|[Tt]h))([^\}]*\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1script-title=th:$2$3$4");
Skip = false;
}
// OTHERS: Find {{xx icon}} templates that follow a CS1 citation template. Remove {{xx icon}} and add |language=xx
// __PROTECTED1__ citations were protected because of a mix of script and Latin so it is OK to move {{xx icon}} to |language=xx
pattern = @"(\{\{(?:__PROTECTED1__)?" + IS_CS1E + @"[^\}]*)(\}\})\s*\{\{([A-Za-z][A-Za-z]) icon\}\}";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1|language=$3$2");
Skip = false;
}
// OTHERS: Find {{xx icon}} templates that precede a CS1 citation template. Remove {{xx icon}} and add |language=xx
// __PROTECTED1__ citations were protected because of a mix of script and Latin so it is OK to move {{xx icon}} to |language=xx
pattern = @"\{\{([a-z]{2,2}) icon\}\}\s*(\{\{(?:__PROTECTED1__)?" + IS_CS1E + @"[^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$2|language=$1$3");
Skip = false;
}
//---------------------------< U N P R O T E C T >------------------------------------------------------------
// UNPROTECT: This is the last step of the conversion process. Once all of the other rules have run, if we protected any citations
// or icons by adding __PROTECTED__, __PROTECTED1__, or __PROTECTED2__ to them, search for those strings and replace them with nothing.
ArticleText = Regex.Replace(ArticleText, @"__PROTECTED\d?__", "");
// ArticleText = Regex.Replace(ArticleText, @"__PROTECTED1?__", "");
//---------------------------< M U L T I - I C O N T O L A N G U A G E >----------------------------------
// In this section we attempt to place multiple (2–5) {{xx icon}} template language names into a comma separated value for |language=
// LANGUAGE PARAMETER: Protect cs1|2 templates that have a value assigned to |language=.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*)(" + IS_CS1E + @"[^}]*\|\s*language\s*=\s*[^\|\}]+)", "$1__PROTECTED__$2");
// INCLUDED TEMPLATES: Protect any citations that contain other templates. Matches any embedded template.
// By the time we get here, embedded {{xx icon}} templates that could be removed have been removed by the INSIDE ICONS rule.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*)(" + IS_CS1E + @"[^\{\}]*\{\{[^\}]*\}\})", "$1__PROTECTED__$2");
// five {{xx icon}} templates separated by ' and ', '&', '/' or space or nothing.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^\}]*)(\}\})\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}", "$1 |language=$3, $4, $5, $6, $7$2");
// four {{xx icon}} templates separated by ' and ', '&', '/' or space or nothing.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^\}]*)(\}\})\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}", "$1 |language=$3, $4, $5, $6$2");
// three {{xx icon}} templates separated by ' and ', '&', '/' or space or nothing.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^\}]*)(\}\})\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}", "$1 |language=$3, $4, $5$2");
// two {{xx icon}} templates separated by ' and ', '&', '/' or space or nothing.
ArticleText = Regex.Replace(ArticleText, @"(\{\{\s*" + IS_CS1E + @"[^\}]*)(\}\})\s*\{\{([a-z]{2})\s*icon\}\}\s*(?:and|&|/)?\s*\{\{([a-z]{2})\s*icon\}\}", "$1 |language=$3, $4$2");
// UNPROTECT: This is the last step of the multi-icon process
ArticleText = Regex.Replace(ArticleText, @"__PROTECTED__", "");
// CLEANUP: Find citations where Monkbot task 6 didn't properly ignore citations with embedded templates (pre-rev e)
// Move a |language= parameter that was placed inside the embedded template to the end of the enclosing citation, before its closing }}
pattern = @"(\{\{" + IS_CS1 + @"[^\}]*\{\{[^\}]*)(\|\s*language\s*=[^\|\}]*)(\}\}[^\}]*)(\}\})";
if (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1$3$2$4");
Skip = false;
}
return ArticleText;
}
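// ILLUSTRATIVE SKETCH (not part of the original module): most rules above repeat the same three steps:
// test the pattern, run the replacement, and set Skip to false. A hypothetical helper such as ApplyRule() below
// could express those steps once. The name ApplyRule and its signature are assumptions made for this example.
// It is written as a class-level method, assuming the closing brace above ends the enclosing method, and it
// relies on the System.Text.RegularExpressions namespace that the rules above already use.
private static string ApplyRule(string text, string pattern, string replacement, ref bool skip)
{
if (!Regex.IsMatch(text, pattern)) // no match: leave both the text and the skip flag untouched
return text;
skip = false; // mirror the rules above, which set Skip = false after a match
return Regex.Replace(text, pattern, replacement);
}
// Possible usage inside a rule, assuming Skip has already been assigned before the call:
//	ArticleText = ApplyRule(ArticleText, pattern, "$1Chinese$2", ref Skip);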