CJK Unified Ideographs Extension I

{{for|a list of all CJK characters encoded in Unicode|CJK Unified Ideographs}}

{{Infobox Unicode block

|blockname = CJK Unified Ideographs Extension I

|rangestart = 2EBF0

|rangeend = 2EE5F

|script1 = Han

|15_1 = 622

|note = {{Cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|access-date=2023-09-12}}{{Cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|access-date=2023-09-12}}

}}

__FORCETOC__

CJK Unified Ideographs Extension I is a Unicode block comprising 622 CJK unified ideographs that appeared in drafts of an amendment to China's GB 18030 standard, circulated in 2022 and 2023. The characters were fast-tracked into Unicode version 15.1 in September 2023.

Background

{{further|GB 18030}}

Unlike most other sets of CJK unified ideographs, Extension I was not prepared and submitted by the Ideographic Research Group (IRG).{{cite web |url=https://www.unicode.org/L2/L2023/23250-irgn2620-recs.pdf#page=4 |title=Recommendation IRG M61.12: Issue of Extension I to Other CJK Source Characters (IRGN2635 & Feedback, IRGN2622) |work=IRG Meeting #61 Recommendations and Action Items |id=ISO/IEC JTC1/SC2 N4885, WG2 N5243, IRG N2620; UTC L2/23-250 |date=2023-10-20 |author=Ideographic Research Group}}

GB 18030 is a mandatory national standard of the People's Republic of China (PRC). It defines a Unicode Transformation Format which retains compatibility with existing data in the earlier GBK and EUC-CN character encodings, and specifies particular Unicode characters which devices sold in China must support.{{cite web |url=https://archives.miloush.net/michkap/archive/2013/03/28/10405914.html |title=You call it GB18030, I call it UTF-GBK... |last=Kaplan |first=Michael S |date=2013-03-28 |work=Sorting it all out}} Its 2022 edition, {{nobr|GB 18030-2022}}, changed a number of required characters to map to standard Unicode code points, rather than to private use area code points.
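The compatibility described above can be illustrated with a minimal sketch using Python's built-in codecs. It assumes that the `gb2312` codec stands in for EUC-CN, and that Python's `gb18030` codec (which may follow an older edition of the standard than GB 18030-2022) is adequate for the two characters chosen here; neither assumption comes from the cited sources.

```python
# Illustration only: Python's codecs are used as stand-ins for the standards.
bmp_char = "\u4e2d"        # U+4E2D, a common ideograph present in all three encodings
ext_i_char = "\U0002EBF0"  # U+2EBF0, the first code point of Extension I

# A character that already existed in EUC-CN and GBK keeps the same bytes.
euc_cn_bytes = bmp_char.encode("gb2312")
gbk_bytes = bmp_char.encode("gbk")
gb18030_bytes = bmp_char.encode("gb18030")

# A supplementary-plane character is still representable in GB 18030,
# here as a four-byte sequence that decodes back to the same code point.
ext_i_bytes = ext_i_char.encode("gb18030")

# GBK has no representation for the same character.
try:
    ext_i_char.encode("gbk")
    gbk_can_encode = True
except UnicodeEncodeError:
    gbk_can_encode = False

print(euc_cn_bytes.hex(), gbk_bytes.hex(), gb18030_bytes.hex())
print(len(ext_i_bytes), gbk_can_encode)
```

The sketch only shows that GB 18030 behaves as a transformation format covering all of Unicode; it says nothing about which characters the standard requires devices to support.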

In late 2022, the PRC made a draft of a further amendment to GB 18030 available for public consultation. This draft would have placed 897 new sinographic characters in Plane 10 (hexadecimal: 0A), an as-yet unnamed astral Unicode plane. This was motivated by a "strong need of citizen real-name certification in China".{{cite web |url=https://www.unicode.org/L2/L2023/23240-irgn2623-china-ar.pdf |id=ISO/IEC JTC1/SC2/WG2/IRG N2623; UTC L2/23-240 |title=IRG #61 Activity Report |author=China National Body |date=2023-10-13}} Since it would impact ISO/IEC 10646 (the Universal Coded Character Set, the ISO standard synchronised with Unicode), the draft was circulated in ISO/IEC JTC 1/SC 2, the ISO subcommittee responsible for ISO 10646. The Chinese national body maintained that "ISO/IEC 10646 do not specify the purpose of the 0A plane", which ISO 10646 denotes as "reserved for future standardization", and that this use was therefore "not inappropriate".

However, the intent of ISO 10646 was for Plane 10 to be reserved for future allocation by ISO 10646 and Unicode via their usual ballot process, not for it to be allocated unilaterally by national standards bodies. Experts and other national bodies therefore criticised the proposed move as one which would "destabilize the synchronization" between GB 18030 and ISO/IEC 10646 (and thus Unicode), and which would make it impossible to conform to both with a single implementation, effectively forking Unicode. At its meeting in March 2023, the IRG emphasised the importance of providing any subsequent GB 18030 amendment drafts to IRG experts in a timely manner, and of not "using the ISO/IEC 10646 standard inappropriately".{{cite web |url=https://www.unicode.org/L2/L2023/23087-irgn2600-irg60-recs.pdf#page=3 |title=Recommendation IRG M60.7: Draft GB18030-2022 Amendment Feedback (IRGN2591, IRGN2605) |work=IRG Meeting #60 Recommendations and Action Items |id=ISO/IEC JTC1/SC2 N4840, WG2 N5205, IRG N2600; UTC L2/23-087 |date=2023-03-24 |author=Ideographic Research Group}}

As an alternative, the repertoire (eventually reduced to 622 characters after expert review) was fast-tracked into Unicode version 15.1 in September 2023, as the CJK Unified Ideographs Extension I block.{{Cite web | url=https://www.unicode.org/wg2/docs/n5222_USNB-Comments-on-Draft-2-of-GB%2018030-2022-Amendment-1.pdf | title=USNB Comments on Draft 2 of GB 18030-2022 Amendment 1 and recommendation for ISO/IEC 10646:2020 Amendment 2 | date=2023-05-01 |author=United States National Body |id=ISO/IEC JTC1/SC2 N4852, WG2 N5222; UTC L2/23-115}} The characters constitute the "GIDC23" Unihan source,{{cite web |url=https://www.unicode.org/charts/PDF/U2EBF0.pdf |title=CJK Unified Ideographs Extension I |work=The Unicode Standard, Version 15.1 |publisher=Unicode Consortium |date=2023}} defined as sourced from the "ID system of the Ministry of Public Security of China, 2023".{{cite web |url=https://www.unicode.org/reports/tr38/#kIRG_GSource |date=2023-09-01 |title=kIRG_GSource |work=Unicode Han Database (Unihan) |id=UAX #38 |version=Unicode 15.1.0 |editor-last1=Lunde |editor-first1=Ken |editor-link1=Ken Lunde |editor-last2=Cook |editor-first2=Richard}} The CJK Unified Ideographs Extension D block was cited as a precedent, since it comprised a repertoire of urgently needed characters (UNCs) from IRG member bodies, whereas the IRG working-set initially slated to become Extension D would instead become Extension E.{{cite web |url=https://www.unicode.org/L2/L2023/23082-cjk-unihan-group-utc175.pdf#page=3 |title=03) L2/23-100: GB 18030-2022 Amendment, Draft 2 + Disposition of Comments, Draft 1 |work=CJK & Unihan Group Recommendations for UTC #175 Meeting |last=Lunde |first=Ken |author-link=Ken Lunde |date=2023-04-22 |id=UTC L2/23-082}} For compactness, the block was allocated to the available space in the Supplementary Ideographic Plane after CJK Unified Ideographs Extension F, as opposed to on the Tertiary Ideographic Plane after CJK Unified Ideographs Extension H; this means that the CJK extension blocks are no longer in alphabetical order by extension letter.{{citation|mode=cs1 |quotation=To keep the CJK block ranges as compact as possible, Extension I has been added to Plane 2, instead of directly after Extension H on Plane 3. Implementers should also check that their code does not assume that CJK extensions all occur in alphabetic order by the extension letter. |section=CJK/Unihan Changes |title=Unicode 15.1.0 |publisher=Unicode Consortium |date=2023-09-12 |url=https://www.unicode.org/versions/Unicode15.1.0/}} Following this, the draft GB 18030 amendment was modified to use the Extension I code points.
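The implementer note quoted above, about not assuming alphabetical order, can be made concrete with a short sketch. Only the Extension I range and the count of 622 assigned characters (U+2EBF0..2EE5D) come from this article; the other block ranges are assumed from the Unicode block list, and the names `CJK_EXTENSIONS` and `extension_of` are hypothetical.

```python
# Inclusive block ranges keyed by extension letter.
# Assumption: ranges other than Extension I are taken from the Unicode block
# list, not from this article.
CJK_EXTENSIONS = {
    "A": (0x3400, 0x4DBF),
    "B": (0x20000, 0x2A6DF),
    "C": (0x2A700, 0x2B73F),
    "D": (0x2B740, 0x2B81F),
    "E": (0x2B820, 0x2CEAF),
    "F": (0x2CEB0, 0x2EBEF),
    "G": (0x30000, 0x3134F),
    "H": (0x31350, 0x323AF),
    "I": (0x2EBF0, 0x2EE5F),
}


def extension_of(code_point):
    """Return the extension letter whose block contains code_point, else None."""
    for letter, (start, end) in CJK_EXTENSIONS.items():
        if start <= code_point <= end:
            return letter
    return None


# Order the letters by where their blocks start in the code space.
letters_by_code_point = [
    letter
    for letter, _ in sorted(CJK_EXTENSIONS.items(), key=lambda item: item[1][0])
]

# Assigned characters run from U+2EBF0 to U+2EE5D; the block itself ends at U+2EE5F.
assigned_in_ext_i = 0x2EE5D - 0x2EBF0 + 1
block_size_ext_i = CJK_EXTENSIONS["I"][1] - CJK_EXTENSIONS["I"][0] + 1

print(letters_by_code_point)
print(assigned_in_ext_i, block_size_ext_i)
print(extension_of(0x2ED9D))
```

Looking up blocks by explicit range, as `extension_of` does, avoids any dependence on the letter order: Extension I sorts between F and G by code point.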

At its next meeting in October 2023, the IRG expressed concerns about bypassing the IRG for large collections of CJK characters, and noted that two of the characters in Extension I had, for the purposes of other regions' character sources, previously been unified with existing characters under IRG unification rules:{{cite web |url=https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg61/IRGN2635_HorizontalExtension.pdf#page=3 |title=2. Newly introduced half-duplicated characters |date=2023-05-17 |work=Application for Horizontal Extensions of Multiple Sources in CJK-ExtI |last=Sim |first=Cheon-hyeong |id=ISO/IEC JTC1/SC2/WG2/IRG N2635 |pages=3–5}} (Note: the referenced document refers to an earlier draft of Extension I with code points that differ from those in the final version accepted into Unicode. U+2ED90 in the referenced document corresponds to {{unichar|2ED9D}} in the final version, while U+2EDD1 in the referenced document corresponds to {{unichar|2EDE0}} in the final version.)

  • Allowing for interchangeable forms of the grass radical, {{unichar|2ED9D}} corresponds to the pre-existing T-source (Taiwan) glyph for {{unichar|8286}} (referenced from CNS 11643),{{cite web |url=https://unicode.org/charts/PDF/Unicode-15.0/U150-4E00.pdf#page=342 |title=CJK Unified Ideographs |work=The Unicode Standard, Version 15.0 |institution=Unicode Consortium |page=823}} as well as to a proposed J-source (Japan) glyph for the same.{{citation|mode=cs1 |section-url=https://www.unicode.org/wg2/docs/n5221-Proposed%20Horizontal%20Extension.pdf#page=414 |section=WG2 n5221 data file: Proposed Horizontal Extension |date=2023-04-24 |url=https://www.unicode.org/L2/L2023/23144-n5221-japan.pdf |title=Request for Horizontal Extension in the J-column of ISO/IEC 10646 |author=Japan National Body |id=ISO/IEC JTC1/SC2/WG2 N5221; UTC L2/23-144 |page=414}} A character corresponding to the other (G-source, i.e. Mainland China) glyph of U+8286 does exist elsewhere in more recent editions of CNS 11643, so the addition of U+2ED9D affects the existing correspondences between CNS 11643 and Unicode. However, because neither character is in CNS 11643 planes 1 or 2, there are no implications for the Unicode mapping of Big5.
  • {{unichar|2EDE0}} corresponds to a proposed J-source (Japan) glyph for {{unichar|8FF3}}.{{citation|mode=cs1 |section-url=https://www.unicode.org/wg2/docs/n5221-Proposed%20Horizontal%20Extension.pdf#page=458 |section=WG2 n5221 data file: Proposed Horizontal Extension |date=2023-04-24 |url=https://www.unicode.org/L2/L2023/23144-n5221-japan.pdf |title=Request for Horizontal Extension in the J-column of ISO/IEC 10646 |author=Japan National Body |id=ISO/IEC JTC1/SC2/WG2 N5221; UTC L2/23-144 |page=458}} It had previously been proposed as a new character twice (once with reference to CNS 11643, and once by Japan), but rejected on the basis that it was unifiable with U+8FF3. The proposed glyph was later moved to the new {{unichar|2EDE0}} code point, per a request by the Japanese national body.{{cite web |url=https://www.unicode.org/L2/L2024/24016-n5245-pdam20-3-doc.pdf#page=5 |title=Disposition of comments on CDAM2.3 to ISO/IEC 10646 6th edition |date=2024-01-03 |editor-first=Michel |editor-last=Suignard |id=ISO/IEC JTC1/SC2/WG2 N5245, UTC L2/24-016}}

In response, the IRG recommended that, in future, submitters of proposed CJK characters be required to provide information about the impact on other CJK character sources of any disunifications proposed by the submission, and that the IRG be given time to review all large submissions of CJK characters. The IRG encouraged the Chinese body to propose solutions to the issues caused by the addition of these two characters at the next IRG meeting.

Block

{{Unicode chart CJK Unified Ideographs Extension I}}

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Unified Ideographs Extension I block:

{{sticky header}}

{| class="wikitable collapsible sticky-header"
|-
! Version !! {{nobr|Final code points}} !! Count !! L2 ID !! WG2 ID !! IRG ID !! Document
|-
| rowspan="11" | 15.1
| rowspan="11" | U+2EBF0..2EE5D
| rowspan="11" | 622
| {{nobr|[https://www.unicode.org/L2/L2023/23011-cjk-unihan-group-utc174.pdf L2/23-011]}}
|
|
| {{Citation|title=CJK & Unihan Group Recommendations for UTC #174 Meeting|date=2023-01-11|first=Ken|last=Lunde|author-link=Ken Lunde|ref=none|section=18) GB 18030-2022 Amendment}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23057-gb18030-amendment-feedback.pdf L2/23-057]}}
| [https://www.unicode.org/wg2/docs/n5201-23057-gb18030-amendment-feedback.pdf N5201]
| N2591
| {{Citation|title=Draft GB 18030-2022 Amendment Feedback & Recommendations|date=2023-02-03|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23100-gb18030-2022-amd-draft2.pdf L2/23-100]}}
|
|
| {{Citation|title=GB 18030-2022 Amendment, Draft 2 + Disposition of Comments, Draft 1|date=2023-04-10|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23082-cjk-unihan-group-utc175.pdf L2/23-082]}}
|
|
| {{Citation|title=CJK & Unihan Group Recommendations for UTC #175 Meeting|date=2023-04-22|first=Ken|last=Lunde|ref=none|section=02 and 03}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23106-unc-extension-i.pdf L2/23-106]}}
| [https://www.unicode.org/wg2/docs/n5214-23106-unc-extension-i.pdf N5214]
|
| {{Citation|title=Proposal to provisionally assign or accept 603 urgently-needed ideographs|date=2023-04-24|first=Ken|last=Lunde|ref=none|section=The Alternate Proposal—Unicode Version 15.1}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23076.htm L2/23-076]}}
|
|
| {{Citation|title=UTC #175 Minutes|date=2023-05-01|first=Peter|last=Constable|ref=none|section=E.4.2 Proposal to provisionally assign or accept 603 urgently-needed ideographs}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23114r-unc-extension-i.pdf L2/23-114R]}}
| [https://www.unicode.org/wg2/docs/n5214R2-23114-cjk-unc-extension.pdf N5214R2]
|
| {{Citation|title=Proposal to encode 622 urgently needed ideographs in UCS|date=2023-07-05|first=Ken|last=Lunde|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23115-gb18030-2022-cmts.pdf L2/23-115]}}
|
|
| {{Citation|title=USNB Comments on Draft 2 of GB 18030-2022 Amendment 1 and recommendation for ISO/IEC 10646:2020 Amendment 2|date=2023-05-01|first=Peter|last=Constable|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23154-n5238-unc-china.pdf L2/23-154]}}
| [https://www.unicode.org/wg2/docs/n5238-Revision%20of%20622%20UNCs%20of%20China.pdf N5238]
|
| {{Citation|title=Revision of 622 UNCs of China (Feedback on WG2 N5214)|date=2023-06-30|ref=none}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23163-cjk-unihan-group-utc176.pdf L2/23-163]}}
|
|
| {{Citation|title=CJK & Unihan Group Recommendations for UTC #176 Meeting|date=2023-07-11|first=Ken|last=Lunde|ref=none|section=01}}
|-
| {{nobr|[https://www.unicode.org/L2/L2023/23157.htm L2/23-157]}}
|
|
| {{Citation|title=UTC #176 Minutes|date=2023-07-31|first=Peter|last=Constable|ref=none|section=E.1 Section 1 and E.1 Section 9 [Affects U+2EDE3]}}
|- class="sortbottom"
| colspan="7" | {{reflist|group=lower-alpha|refs=Proposed code points and character names may differ from final code points and names}}
|}

References

{{reflist}}

Further reading

  • {{Cite web|url=https://ken-lunde.medium.com/the-first-amendment-fe064d9d7d8|first=Ken|last=Lunde|author-link=Ken Lunde|title=The First Amendment|date=2023-07-15}} This article details how the CJK Unified Ideographs Extension I block became standardized, and its relationship with two drafts of the GB 18030-2022 amendment.

{{Unicode CJK Unified Ideographs}}

Category:Unicode blocks