Tags (Unicode block)

{{Infobox Unicode block

|blockname = Tags

|rangestart = E0000

|rangeend = E007F

|script1 = Common

|deprecated = 1

|3_1 = 97

|note = {{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}

}}

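Tags is a Unicode block containing formatting tag characters. The block is designed to mirror ASCII: each character in U+E0020–U+E007E corresponds to the printable ASCII character whose code point is exactly 0xE0000 lower (U+0020–U+007E). It was originally intended for language tags, but has since been repurposed for emoji modifiers, specifically for the flags of subnational regions.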
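Because the mirroring is a fixed offset, converting between printable ASCII and tag characters is simple arithmetic. For example, TAG LATIN SMALL LETTER A (U+E0061) mirrors "a" (U+0061). The following is a minimal Python sketch of that mapping. The function names `to_tag` and `from_tag` are illustrative only and are not part of any standard API.

```python
TAG_OFFSET = 0xE0000  # tag character = printable ASCII code point + 0xE0000


def to_tag(text: str) -> str:
    """Map printable ASCII (U+0020..U+007E) to the mirrored tag characters."""
    out = []
    for ch in text:
        cp = ord(ch)
        if not 0x20 <= cp <= 0x7E:
            raise ValueError(f"{ch!r} has no tag-character counterpart")
        out.append(chr(cp + TAG_OFFSET))
    return "".join(out)


def from_tag(tags: str) -> str:
    """Inverse mapping: tag characters U+E0020..U+E007E back to ASCII."""
    out = []
    for ch in tags:
        cp = ord(ch)
        if not 0xE0020 <= cp <= 0xE007E:
            raise ValueError(f"U+{cp:04X} is not a printable tag character")
        out.append(chr(cp - TAG_OFFSET))
    return "".join(out)


print([f"U+{ord(c):04X}" for c in to_tag("gb")])  # ['U+E0067', 'U+E0062']
```

U+E0001 and U+E007F fall outside the mirrored range. They have no printable ASCII counterpart, so the sketch rejects them.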

Legacy use

U+E0001 and U+E0020–U+E007F were originally intended for invisibly tagging text by language,{{cite journal|url=http://tools.ietf.org/html/rfc2482|title=RFC2482: Language Tagging in Unicode Plain Text | publisher=Network Working Group|date=January 1999|doi=10.17487/RFC2482 |last1=Whistler |first1=K. |last2=Adams |first2=G. }} but that use is no longer recommended.{{cite journal|url=http://tools.ietf.org/html/rfc6082|title=RFC6082: Deprecating Unicode Language Tag Characters: RFC 2482 is Historic | publisher=Internet Engineering Task Force (IETF)|date=November 2010|doi=10.17487/RFC6082 |last1=Whistler |first1=K. |last2=Adams |first2=G. |last3=Duerst |first3=M. |last4=Klensin |first4=J. |editor-first1=R. |editor-last1=Presuhn |doi-access=free }}

All of those characters were deprecated in Unicode 5.1.

With the release of Unicode 8.0, U+E0020–U+E007E were no longer deprecated.

The change was made "to clear the way for the potential future use of tag characters for a purpose other than to represent language tags".{{cite web|url=https://unicode.org/versions/Unicode8.0.0/#Migration|title=Unicode 8.0.0, Implications for Migration | publisher=Unicode Consortium}}

Unicode states that "the use of tag characters to represent language tags in a plain text stream is still a deprecated mechanism for conveying language information about text".
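For illustration only, the following Python sketch shows the general shape of the deprecated mechanism as described in RFC 2482. It uses U+E0001 LANGUAGE TAG, followed by the tag-character spelling of a language tag such as "en-gb", followed by the tagged text. It then ends with U+E007F CANCEL TAG, which cancels all tags in effect. The helper name `legacy_language_tagged` is hypothetical. Unicode recommends against producing such text.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (still deprecated)
CANCEL_TAG = "\U000E007F"    # U+E007F CANCEL TAG (cancels all tags in effect)


def legacy_language_tagged(text: str, lang: str) -> str:
    """Wrap text in an RFC 2482-style invisible language tag (deprecated usage).

    The tag is LANGUAGE TAG followed by the tag-character spelling of `lang`;
    a trailing CANCEL TAG ends the tagged span.
    """
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be printable ASCII")
    tag_chars = "".join(chr(0xE0000 + ord(c)) for c in lang)
    return LANGUAGE_TAG + tag_chars + text + CANCEL_TAG


tagged = legacy_language_tagged("colour", "en-gb")
print(" ".join(f"U+{ord(c):04X}" for c in tagged[:6]))
# U+E0001 U+E0065 U+E006E U+E002D U+E0067 U+E0062
```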

Current use

With the release of Unicode 9.0, U+E007F CANCEL TAG was no longer deprecated, although U+E0001 LANGUAGE TAG remains deprecated. Emoji 5.0, released in May 2017,{{Cite web |title=Emoji Version 5.0 List|url=https://emojipedia.org/emoji-5.0/|publisher=Emojipedia| accessdate=24 July 2021 }} treats these characters as emoji components, for use as modifiers in special sequences.

The only usage specified is for representing the flags of regions, alongside the use of Regional Indicator Symbols for national flags.{{Cite web|url=https://unicode.org/reports/tr51/| date=2017-05-18 |title=UTR #51: Unicode Emoji|publisher=Unicode Consortium}} These sequences consist of {{unichar|1F3F4|WAVING BLACK FLAG}} followed by a sequence of tags corresponding to the region as coded in the CLDR, then {{unichar|E007F|CANCEL TAG}}. For example, using the tags for "gbeng" (🏴󠁧󠁢󠁥󠁮󠁧󠁿) will cause some systems to display the flag of England, those for "gbsct" (🏴󠁧󠁢󠁳󠁣󠁴󠁿) the flag of Scotland, and those for "gbwls" (🏴󠁧󠁢󠁷󠁬󠁳󠁿) the flag of Wales.
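The sequence structure described above can be sketched in a few lines of Python. The sketch assumes the input is a CLDR subdivision code such as "gbeng", and it only accepts ASCII letters and digits, lowercased. The function name `subdivision_flag` is hypothetical. Whether a given system actually renders the result as a flag depends on its font and emoji support.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG (base of the sequence)
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG (terminates the sequence)


def subdivision_flag(code: str) -> str:
    """Build a flag emoji tag sequence from a CLDR subdivision code like 'gbeng'."""
    code = code.lower()
    if not code.isascii() or not code.isalnum():
        raise ValueError("expected an alphanumeric CLDR subdivision code")
    # Each ASCII letter/digit maps to the tag character 0xE0000 above it.
    tags = "".join(chr(0xE0000 + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


for code in ("gbeng", "gbsct", "gbwls"):
    seq = subdivision_flag(code)
    print(code, " ".join(f"U+{ord(c):04X}" for c in seq))
# gbeng U+1F3F4 U+E0067 U+E0062 U+E0065 U+E006E U+E0067 U+E007F
```

Systems that do not support a given sequence typically show a plain black flag, because the tag characters themselves are invisible.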

The tag sequences are derived from ISO 3166-2 subdivision codes, so sequences representing other subnational flags (for example, those of US states) can also be formed using this mechanism. However, as of Unicode version 12.0, only the three flag sequences listed above are "Recommended for General Interchange" (RGI) by the Unicode Consortium, meaning they are "most likely to be widely supported across multiple platforms".{{Cite web | title=emoji-sequences.txt | url=https://unicode.org/Public/emoji/latest/emoji-sequences.txt| date=2023-06-05 | publisher=Unicode Consortium | accessdate=5 March 2019 }}
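The distinction between well-formed and recommended sequences can be checked in software. The following Python sketch scans text for flag tag sequences, decodes each subdivision code, and reports whether it is one of the three RGI flags. The RGI set is hard-coded from the sequences named above; a real implementation would presumably load it from emoji-sequences.txt instead. The names `find_subdivision_flags` and `flag_sequence` are hypothetical.

```python
import re
from typing import List, Tuple

# WAVING BLACK FLAG, one or more printable tag characters, then CANCEL TAG.
FLAG_TAG_SEQ = re.compile("\U0001F3F4([\U000E0020-\U000E007E]+)\U000E007F")

# Assumption: hard-coded from the three RGI subdivision flags named in the text.
RGI_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls"}


def find_subdivision_flags(text: str) -> List[Tuple[str, bool]]:
    """Return (subdivision code, is_rgi) for every flag tag sequence in text."""
    results = []
    for match in FLAG_TAG_SEQ.finditer(text):
        code = "".join(chr(ord(c) - 0xE0000) for c in match.group(1))
        results.append((code, code in RGI_SUBDIVISIONS))
    return results


def flag_sequence(code: str) -> str:
    """Hypothetical helper used only to build test input."""
    return "\U0001F3F4" + "".join(chr(0xE0000 + ord(c)) for c in code) + "\U000E007F"


sample = "Flags: " + flag_sequence("gbsct") + " and " + flag_sequence("usca")
print(find_subdivision_flags(sample))  # [('gbsct', True), ('usca', False)]
```

Here "usca" (California) is well formed but not RGI, so many platforms are likely to show it as a plain black flag.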

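Tag characters have also been used to hide prompt-injection instructions for large language models (LLMs) in text that looks harmless to human readers, a technique sometimes called "ASCII smuggling".[https://embracethered.com/blog/posts/2024/m365-copilot-prompt-injection-tool-invocation-and-data-exfil-using-ascii-smuggling/]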
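Because tag characters mirror ASCII, a payload hidden this way can be revealed by subtracting the 0xE0000 offset, or neutralized by removing the characters before text reaches a model. The following Python sketch shows both. It is an illustrative filter under simple assumptions, not a complete defense. The function names are hypothetical.

```python
import re

# Every code point in the Tags block, U+E0000..U+E007F.
TAGS_BLOCK = re.compile("[\U000E0000-\U000E007F]")


def reveal_hidden_ascii(text: str) -> str:
    """Decode any printable tag characters (U+E0020..U+E007E) back to ASCII."""
    return "".join(
        chr(ord(c) - 0xE0000) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Remove every Tags-block character (this also breaks subdivision flags)."""
    return TAGS_BLOCK.sub("", text)


hidden = "".join(chr(0xE0000 + ord(c)) for c in "ignore previous instructions")
prompt = "Summarize this document." + hidden
print(repr(reveal_hidden_ascii(prompt)))  # 'ignore previous instructions'
print(strip_tags(prompt))  # Summarize this document.
```

Stripping the whole block also turns a subdivision flag into a bare WAVING BLACK FLAG. A more careful filter might keep well-formed flag tag sequences and remove only the remaining tag characters.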

Unicode block

{{Unicode chart Tags}}

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Tags block:

{{sticky header}}

{| class="wikitable collapsible sticky-header"
|-
! Version !! {{nobr|Final code points}} !! Count !! L2 ID !! WG2 ID !! Document
|-
| rowspan="21" | 3.1 || rowspan="11" | U+E0001 || rowspan="11" | 1 || {{nobr|L2/97-203}} || || {{Citation|title=Plane 14 characters for generic tags|date=1997-08-05|first1=Ken|last1=Whistler|first2=Glenn|last2=Adams|ref=none}}
|-
| {{nobr|L2/97-171R2}} || || {{Citation|title=Plane 14 Characters for Generic Tags|date=1997-09-18|first=Ken|last=Whistler|ref=none}}
|-
| {{nobr|L2/97-256}} || || {{Citation|title=Comments on Plane 14 Position Paper|date=1997-10-20|first=Mati|last=Allouche|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L1997/97255r.pdf L2/97-255R]}} || || {{Citation|title=Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997|date=1997-12-03|first=Joan|last=Aliprand|ref=none|section=3.B. Lightweight language tagging}}
|-
| {{nobr|L2/98-027}} || [https://web.archive.org/web/20200215052615/http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1670w97.doc N1670] || {{Citation|title=Plane 14 characters for language tags|date=1997-12-12|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L1998/98039.pdf L2/98-039]}} || || {{Citation|title=Preliminary Minutes - UTC #74 & L2 #171, Mountain View, CA - December 5, 1997|date=1998-02-24|first1=Joan|last1=Aliprand|first2=Arnold|last2=Winkler|ref=none|section=2.C REVISED PROPOSALS}}
|-
| {{nobr|L2/98-286}} || [https://web.archive.org/web/20200215052615/http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1703w97.doc N1703] || {{Citation|title=Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20|date=1998-07-02|first1=V. S.|last1=Umamaheswaran|first2=Mike|last2=Ksar|ref=none|section=7.4}}
|-
| {{nobr|[https://www.unicode.org/L2/L1998/98281r.pdf L2/98-281R (pdf],}} [https://www.unicode.org/L2/L1998/98281R.htm html]) || || {{Citation|title=Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998|date=1998-07-31|first=Joan|last=Aliprand|ref=none|section=IETF and W3C Issues (VI)}}
|-
| {{nobr|[https://www.unicode.org/L2/L2000/00010-n2103.pdf L2/00-010]}} || [https://www.unicode.org/wg2/docs/n2103.pdf N2103] || {{Citation|title=Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13—16|date=2000-01-05|first=V. S.|last=Umamaheswaran|ref=none|section=9.1}}
|-
| {{nobr|[https://www.unicode.org/L2/L2001/01301-deprecation.txt L2/01-301]}} || || {{Citation|title=Analysis of Character Deprecation in the Unicode Standard|date=2001-08-01|first=Ken|last=Whistler|ref=none|section=Tag Characters}}
|-
| {{nobr|[https://www.unicode.org/consortium/utc-minutes/UTC-091-200205.html L2/02-166R2]}} || || {{Citation|title=UTC #91 Minutes|date=2002-08-09|first=Lisa|last=Moore|ref=none|section=Character Deprecation}}
|-
| rowspan="10" | U+E0020..E007F || rowspan="10" | 96 || {{nobr|[https://www.unicode.org/L2/L2016/16042-emoji-clarifications.txt L2/16-042]}} || || {{Citation|title=Clarifications Requested for "Full Emoji Data" and Emoji Flags|date=2015-01-26|first1=Agustin|last1=Fonts|first2=Roozbeh|last2=Pournader|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2015/15145r-add-regional-ind.pdf L2/15-145R]}} || || {{Citation|title=Proposal for additional regional indicator symbols|date=2015-05-07|first=Peter|last=Edberg|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2015/15107.htm L2/15-107]}} || || {{Citation|title=UTC #143 Minutes|date=2015-05-12|first=Lisa|last=Moore|ref=none|section=E.1.3 Proposal for additional regional indicator symbols}}
|-
| {{nobr|[https://www.unicode.org/L2/L2015/15190-pri299-additional-flags-bkgnd.html L2/15-190]}} || || {{Citation|title=PRI #299 Background: Representing Additional Types of Flags|date=2015-06-29|first=Peter|last=Edberg|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2015/15206-region-subdiv.pdf L2/15-206]}} || || {{Citation|title=Region / Subdivision validity for flags|date=2015-07-25|first=Mark|last=Davis|author-link=Mark Davis (Unicode)|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2016/16180r-eng-scot-wales-flags.pdf L2/16-180R]}} || || {{Citation|title=Proposal to include Emoji Flags for England, Scotland and Wales|date=2016-07-07|first1=Jeremy|last1=Burge|author-link1=Jeremy Burge|first2=Owen|last2=Williams|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17016.htm L2/17-016]}} || || {{Citation|title=UTC #150 Minutes|date=2017-02-08|first=Lisa|last=Moore|ref=none|section=Action item 150-A59|quote=Add the three sequences for flags documented in L2/16-180R to emoji-sequences.txt for emoji 5.0.}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17048-pri343-fdbk.pdf L2/17-048]}} || || {{Citation|title=Feedback on PRI 343 (Unicode Emoji 5.0)|date=2017-01-24|first=Roozbeh|last=Pournader|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17086-vs16-keycaps-emoji.pdf L2/17-086]}} || || {{Citation|title=Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component|date=2017-03-27|first1=Jeremy|last1=Burge|display-authors=etal|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2017/17103.htm L2/17-103]}} || || {{Citation|title=UTC #151 Minutes|date=2017-05-18|first=Lisa|last=Moore|ref=none|section=E.1.7 Add ZWJ, VS-16, Keycaps & Tags to Emoji_Component}}
|- class="sortbottom"
| colspan="6" | {{reflist|group=lower-alpha|refs=Proposed code points and characters names may differ from final code points and names}}
|}

References