Mojikyō

{{short description|Character encoding scheme}}

{{Infobox software

| name = {{Transliteration|ja|Mojikyō}}
{{small|{{Transliteration|ja|Konjaku Mojikyō}}}}
{{small|{{lang|ja|今昔文字鏡}}}}

| logo = 今昔文字鏡.gif

| screenshot = 文字鏡:台湾語仮名.png

| caption = The {{Transliteration|ja|Mojikyō}} character map highlighting the Taiwanese kana {{lang|ja|セ}} (the character does not yet have a Unicode encoding, so it is approximated here with CSS and {{unichar|30BB|KATAKANA LETTER SE}}).

| author =

| developer = Tadahisa Ishikawa
({{lang|ja|石川忠久}})
Tokio Furuya
({{lang|ja|古家時雄}})
Mojikyō Institute
({{lang|ja|文字鏡研究会}})

| released = 1.0 / {{Start date and age|df=yes|1997|07}}

| discontinued = y

| latest release version = 4.0

| latest release date = {{Start date and age|df=yes|2018|12|15}}

| operating system = Microsoft Windows

| size = 51MB

| language = Japanese

| genre = Character set bundled with fonts and a character map

| license = Proprietary

| website = {{URL|mojikyo.org}}

}}

{{italic title}}{{use dmy dates|date=July 2020}}{{Transliteration|ja|Hepburn|Mojikyō}} ({{langx|ja|文字鏡}}), also known by its full name {{Nihongo3|{{literally|(the) past and present character mirror}}|今昔文字鏡|Konjaku Mojikyō}}, is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, Vietnamese Chữ Nôm and other historical Chinese logographic writing systems. The {{Nihongo|Mojikyō Institute|文字鏡研究会|Mojikyō Kenkyūkai}}, which published the character set, also published computer software and TrueType computer fonts to accompany it. The Mojikyō Institute, chaired by {{Nihongo|Tadahisa Ishikawa|石川忠久}},{{Cite web|title=今昔文字鏡について|trans-title=About Mojikyō|url=http://www.mojikyo.com/about.html|url-status=dead|archive-url=https://web.archive.org/web/20010203164100/http://www.mojikyo.com/about.html|archive-date=2001-02-03|access-date=2020-07-06|website=Mojikyō Institute|language=ja}} originally had its character set and related software and data redistributed on CD-ROMs sold in Kinokuniya stores.{{cite web|script-title=ja:ようこそ、今昔文字鏡の世界へ!|trans-title=Welcome to the world of {{Transliteration|ja|Hepburn|Mojikyō}}!|url=http://mojikyo.com|language=ja|url-status=dead|archive-date=2005-03-04|archive-url=https://web.archive.org/web/20050304082244/http://www.mojikyo.com/|publisher=Kinokuniya KK|access-date=2020-07-05}}

Conceptualized in 1996,{{Cite web|last=Ishikawa|first=Tadahisa|date=August 2015|title=古家時雄君を悼む|trans-title=Tokio Furuya, we grieve your death|url=http://mojikyo.org/#MemorialWriting|access-date=2020-07-08|website=Mojikyō Institute|language=ja}} the first version of the CD-ROM was released in July 1997.{{Citation|title=Konjaku Mojikyō|date=July 1997|url=https://tsutaya.tsite.jp/item/book/PTA0000H5P25|language=ja|isbn=9784314900034|script-title=ja:今昔文字鏡}} For a time, the Mojikyō Institute also offered a web subscription, termed "{{Transliteration|ja|Hepburn|Mojikyō}} WEB" ({{Lang|ja|文字鏡WEB}}), which had more up-to-date characters.

{{As of|2006|9}}, Mojikyō encoded 174,975 characters.{{cite web|archive-url=https://web.archive.org/web/20050205150655/http://mojikyo.com/info/about/index.html|url=http://mojikyo.com/info/about/index.html|access-date=2020-07-05|archive-date=2005-02-05|publisher=Kinokuniya KK|language=ja|script-title=ja:今昔文字鏡とは|trans-title=What is {{Transliteration|ja|Hepburn|Mojikyō}}?}} Among those, 150,366 characters (≈86%) then belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV) family (for Korean, this means Hanja; for Vietnamese, Chữ Nôm).{{cite web |url=http://www.mojikyo.com/info/about/index.html |script-title=ja:今昔文字鏡とは |language=ja|trans-title=About Mojikyo |publisher=Kinokuniya KK |archive-url=https://web.archive.org/web/20100427030433/http://www.mojikyo.com/info/about/index.html |archive-date=2010-04-27 |url-status=dead |access-date=2020-07-05}} Many of Mojikyō's characters are considered obsolete or obscure, and are not encoded by any other character set, including the most widely used international text encoding standard, Unicode.

Mojikyō was originally a paid proprietary software product. In 2015, the Mojikyō Institute began uploading its latest releases to the Internet Archive as freeware,{{Cite web|title=Search: creator:"MOJIKYO Institute"|url=https://archive.org/search.php?query=creator:%22MOJIKYO+Institute%22|access-date=2020-07-06|website=Internet Archive|language=en}} as a memorial to one of its developers, {{Nihongo|Tokio Furuya|古家時雄}}, who died that year. On 15 December 2018, version 4.0 was released. The next day, Ishikawa announced that, without Furuya, this would be the final release of Mojikyō.

Premise

The {{Transliteration|ja|Hepburn|Mojikyō}} encoding was created to provide a complete index of characters used in the Chinese, Japanese, Korean writing systems and Vietnamese Chữ Nôm logographic scripts. It also encodes a large number of characters in ancient scripts, such as the oracle bone script, the seal script, and Sanskrit (Siddhaṃ). For many characters, it is the only character encoding to encode them, and its data is often used as a starting point for Unicode proposals.{{cite book|title=Proposal for hentaigana|publisher=Information Processing Society of Japan|via=Unicode Consortium|id=L2/15-239|last1=Takada|first1=Tomokazu|last2=Yada|first2=Tsutomu|last3=Saito|first3=Tatsuya|translator-last=Kobayashi|translator-first=Tatsuo|translator-link=Tatsuo Kobayashi|translator-last2=Kobayashi|translator-first2=Daniel|url=https://www.unicode.org/L2/L2015/15239-hentaigana.pdf|date=2015-09-18|access-date=2020-07-05}}{{Cite book|title=Ideograph Variation Selector and Variation Collection Identifier|id=L2/03-413|publisher=Open Internationalization Initiative|last1=Hiura|first1=Hideki|last2=Kobayashi|first2=Tatsuo|author2-link=Tatsuo Kobayashi|last3=Kida|first3=Yasuo|display-authors=2|via=Unicode Consortium|access-date=2020-07-05|date=2003-10-31|url=https://www.unicode.org/L2/L2003/03413-varsel.html}} However, {{Transliteration|ja|Hepburn|Mojikyō}} has much looser standards than Unicode for encoding, which leads {{Transliteration|ja|Hepburn|Mojikyō}} to have many encoded glyphs of dubious, or even unintentionally fictional, origin.{{Cite web|last1=Takada|first1=Tomokazu [{{lang|ja|高田智和}}]|last2=Oda|first2=Tetsuji [{{lang|ja|織田哲治}}]|last3=Konishi|first3=Satoshi|display-authors=2|date=2013-08-26|script-title=ja:平成25年度第3回文字情報検討サブワーキンググループ議事録|trans-title=Meeting Minutes of the Third Character Information Examination Sub-Working Group of 2013 (Heisei 25)|url=https://mojikiban.ipa.go.jp/contents/pdf/2013/20130826_3k_g.pdf|access-date=2020-07-06|website=Information Technology 
Promotion Agency, Government of Japan|page=2|language=ja|quote=文字鏡研究会の関係者にヒアリングしたところ、オランダから提案されたWG2 N36981には文字鏡のフォントが使用されているが、文字鏡研究会は関与しておらず、提案内容についても疑問があるとのことであった。[According to an interview with a representative of the Mojikyō Institute, a Mojikyō font is used in WG2 N36981 proposed by the Netherlands, but the Mojikyō Institute itself is not involved with the proposal; it furthermore has doubts about some of the content of that proposal.]}}{{Cite journal|last=Suzuki|first=Toshiya [{{lang|ja|鈴木俊哉}}]|date=2009-07-30|script-title=ja:統合漢字に申請された「殷周金文集成引得」図形文字の調査|trans-title=Investigation on Glyphs collected from "Index to Collection of Inscriptions of the Yin-Zhou Period" to submit to CJK Unified Ideographs|url=https://web.archive.org/web/20200320094733if_/https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=62547&item_no=1&attribute_id=1&file_no=1|journal=IPSJ SIG Technical Report|language=ja|publisher=Information Processing Society of Japan|volume=2009-DD-72|issue=7|pages=2|quote=しかし、拡張Cの標準化作業が8年の長期にわたり、また事後的に用例が必須とされたため、正式に公布された拡張C漢字の典拠は当初の典拠とはかなり異なるものとなっている。たとえば日本では当初は文字鏡研究会によって選定された1000文字程度の漢字を申請していた[。] [...] 典拠用例確認は文字鏡とは独立に行なわれたため、字形が文字鏡漢字から変更されたものも多い。[As the standardization effort for CJK Unified Ideographs Extension C has been eight long years in the making and examples of kanji have been requested after their encoding, the officially promulgated Extension C kanji standard is quite different from the original standard. For example, we, the Government of Japan, initially applied for about 1,000 kanji selected by the Mojikyō Institute[.] [...] 
Since the verification of the kanji was performed independently of the Mojikyō Institute, the character shapes were often changed from Mojikyō's version of that same codepoint.]|via=Internet Archive}} As such, while many non-Unicode {{Transliteration|ja|Hepburn|Mojikyō}} characters are suitable for addition to Unicode, not all can become Unicode characters, due to the differing standards of evidence required by each.

Composition

The {{Transliteration|ja|Hepburn|Mojikyō}} fonts ({{Lang|ja|文字鏡フォント}}) are TrueType fonts that come in a ZIP file and are each around 2{{en dash}}5 megabytes; the different fonts contain different numbers of characters (to inspect them, download the file {{Mono|MojikyoCmap400ALL49TTF.7z}} from [http://mojikyo.org the official website]). Also included is a Windows executable that implements a graphical character map, the "{{Transliteration|ja|Hepburn|Mojikyō}} Character Map" ({{Lang|ja|文字鏡MAP}}), {{mono|MOCHRMAP.EXE}}, also called the "Mojikyō Cmap" (the English name comes from the title of the window produced by running the executable; the Japanese name comes from the executable's icon). {{mono|MOCHRMAP.EXE}} allows users to browse through the {{Transliteration|ja|Hepburn|Mojikyō}} fonts and to copy and paste characters instead of typing them on the keyboard. Unlike the regular Windows character map or KCharSelect, both of which support TrueType fonts, {{mono|MOCHRMAP.EXE}} displays the numbered encoding slot of the requested character, as the screenshots on [http://mojikyo.org the official website] show.{{Cite web|last=Ishikawa|first=Tadahisa|date=1999-05-25|title=パソコン悠悠漢字術 今昔文字鏡徹底活用|trans-title=Kanji on your PC, Made Easy{{em dash}}The Complete Mojikyō Manual|url=https://www.est.co.jp/ks/dish/jepax/samples/ykanji/ykanji-sample.html|access-date=2020-07-06|publisher=Mojikyō Institute}} For {{mono|MOCHRMAP.EXE}} to work, all {{Transliteration|ja|Hepburn|Mojikyō}} fonts must be installed into the system fonts directory, {{Mono|C:\Windows\Fonts}}.

Encoding

When referring to a character encoded in {{Transliteration|ja|Hepburn|Mojikyō}}, the format MXXXXXX is often used,{{cite web |url=https://www.unicode.org/L2/L2018/18193-shuishu-n4956.pdf#page=21 |title=Table 5: Comparative Table of Shuishu Characters from All Sources |work=Analysis of Shuishu character repertoire |id=ISO/IEC JTC1/SC2/WG2 N4956; UTC L2/18-193 |pages=21–212 |last1=West |first1=Andrew |author-link1=Andrew West (linguist) |last2=Chan |first2=Eiso |date=2018-06-01}} similar to the U+XXXX format used for Unicode. A difference, however, is that {{Transliteration|ja|Hepburn|Mojikyō}} encodings displayed this way are decimal, while Unicode's U+ encoding is hexadecimal.
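The difference in radix can be illustrated with a short Python sketch. The helper names `parse_mojikyo` and `parse_unicode` are hypothetical, and the assumption that an M-number is simply the letter M followed by decimal digits is taken from the description above rather than from any official Mojikyō specification.

```python
import re

# Assumed notation: "M" followed by decimal digits (e.g. "M012345").
_MOJIKYO_RE = re.compile(r"M(\d+)")
# Unicode notation: "U+" followed by 4 to 6 hexadecimal digits.
_UNICODE_RE = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_mojikyo(ref: str) -> int:
    """Return the number in an M-prefixed reference, read as decimal."""
    match = _MOJIKYO_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a Mojikyo reference: {ref!r}")
    return int(match.group(1), 10)


def parse_unicode(ref: str) -> int:
    """Return the code point in a U+ reference, read as hexadecimal."""
    match = _UNICODE_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a Unicode reference: {ref!r}")
    return int(match.group(1), 16)
```

With these helpers, the same digit string yields different integers in the two notations: `parse_mojikyo("M012345")` returns 12345, whereas `parse_unicode("U+012345")` returns 74565.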

From the earliest days of Unicode, {{Transliteration|ja|Hepburn|Mojikyō}} has both influenced the standard and been influenced by it. Glyphs originating from {{Transliteration|ja|Hepburn|Mojikyō}} first appeared in a proposal to the Ideographic Rapporteur Group (IRG; renamed the Ideographic Research Group in 2019), which is responsible for maintaining all CJK blocks in Unicode,{{Cite web|title=Unicode Standard Annex #45: U-source Ideographs|url=https://www.unicode.org/reports/tr45/|work=The Unicode Standard|publisher=Unicode Consortium}}{{Cite web|date=March 2020|title=Appendix E: Han Unification History|url=https://www.unicode.org/versions/Unicode13.0.0/appE.pdf|work=The Unicode Standard|publisher=Unicode Consortium}} on 18 April 2002.{{Cite web|title=CJK Extension C1 From Japan|url=https://appsrv.cse.cuhk.edu.hk/~irg/n0801-0900.html|publisher=Ideographic Rapporteur Group|via=The Chinese University of Hong Kong's Department of Computer Science and Engineering|id=IRG#19 N895|quote={{mono|N895-Japan_C1}}}} In May 2007, {{Transliteration|ja|Hepburn|Mojikyō}} played a minor role in an eventually successful series of proposals to encode the Tangut script in Unicode;{{Cite book|last=Cook|first=Richard|url=http://unicode.org/wg2/docs/n3297.pdf|title=Proposal to encode Tangut characters in UCS Plane 1|date=2007-05-09|publisher=UC Berkeley Script Encoding Initiative|pages=4|id=L2/07-143|via=Unicode Consortium}} {{Transliteration|ja|Hepburn|Mojikyō}} had already encoded 6,000 Tangut characters by October 2002. (The history of the encoding of the Tangut script is complicated; see {{Format linkr|Tangut_(Unicode_block)#History}} for a full listing of the related proposals and a timeline.)

The Unicode Standard's Unihan Database refers to {{Transliteration|ja|Hepburn|Mojikyō}} as the "Japanese KOKUJI Collection" ({{Lang|ja|日本国字集}}),{{Citation |title=Unicode Standard Annex #38 |date=2020-03-05 |url=https://www.unicode.org/reports/tr38/index.html |editor-last=Jenkins |editor-first=John H. |chapter=kIRG JSource |chapter-url=https://www.unicode.org/reports/tr38/index.html#kIRG_JSource |publisher=Unicode Consortium |ref={{harvid|UAX38}} |editor2-last=Cook |editor2-first=Richard |editor3-last=Lunde |editor3-first=Ken}} abbreviated "JK".{{Cite web |last=Kobayashi |first=Tatsuo |author-link=Tatsuo Kobayashi |date=2001-12-03 |title=List of Japanese Ideographs which may be proposed in Extension-C |url=https://appsrv.cse.cuhk.edu.hk/~irg/irg/irg18/IRG18.htm |id=ISO/IEC JTC1/SC2/WG2/IRG N853}} For example, {{Unichar|2b679}} (Ideographic Description Sequence: {{Lang|zh|⿰魚嵐}}), an ideograph read in Japanese as {{Nihongo3|{{literally|blizzard}}|ブリザード|burizādo}}, has a J-Source equal to JK-66038. ("J-Source" is a column in the Unihan database, where ⟨J⟩ is short for "Japanese glyph source"; the full name of the column is {{Code|kIRG_JSource}}. Under Han unification, there are nine such sources; see §3.1 of Unicode Standard Annex #38 for a complete list and more information.) All Unicode characters with a JK-prefixed J-Source originate from {{Transliteration|ja|Hepburn|Mojikyō}}.{{Cite twitter|number=1280291832593670145|user=ken_lunde|title=JK-prefixed J-Source ideographs came from 今昔文字鏡, which are in Extensions C and E (the mention of Extension D was simply that what became Extension E was originally targeted to become Extension D).|author=Ken Lunde|date=2020-07-06|access-date=2020-07-06|author-link=Ken Lunde|url-status=live|archive-url=https://web.archive.org/web/20200707001005/https://twitter.com/ken_lunde/status/1280291832593670145|archive-date=7 July 2020}} Other J-Source prefixes exist, such as J4, which means that the character originates from JIS X 0213:2004.
According to Ken Lunde, a subject matter expert in character encodings and East Asian languages, as of Unicode 13.0, 782 ideographs in Unicode originate from {{Transliteration|ja|Hepburn|Mojikyō}}, split somewhat evenly between two blocks: CJK Unified Ideographs Extension C, with 367, and CJK Unified Ideographs Extension E, with 415.{{Cite twitter|number=1280285758385827842|user=ken_lunde|title=In particular, all 782 JK-prefixed ideographs are indeed from 今昔文字鏡 per IRG N862. Most were encoded in #ExtensionC, and the stragglers were encoded in #ExtensionE..|author=Ken Lunde|date=2020-07-06|author-link=Ken Lunde|access-date=2020-07-06}}{{Cite twitter|number=1280314296799424519|user=ken_lunde|title=367 JK-prefixed ideographs are in Extension C, and the remaining 415 are in Extension E..|author=Ken Lunde|date=2020-07-06|author-link=Ken Lunde|access-date=2020-07-06}} Not all Unicode characters with {{Transliteration|ja|Hepburn|Mojikyō}} origins (JK-prefixed J-Sources) have the same representative glyph in the code chart as in the {{Transliteration|ja|Hepburn|Mojikyō}} font (that is, a glyph made up of the same radicals in the same positions); some characters had their shapes changed before final encoding, as investigation showed the shapes assigned by the Mojikyō Institute were wrong. Errors in large collections of ideographs are not uncommon. Such errors occur even in well-funded, government-produced collections, such as the famous kanji from unknown sources in the Japanese Industrial Standards Committee's JIS X 0208 double-byte character encoding standard. All of these JIS X 0208 error kanji (ghost characters, {{lang|ja|幽霊文字}}) have made their way into Unicode despite not being "real" kanji.
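The JK-prefixed entries described above can be located programmatically. The following Python sketch assumes input in the tab-separated layout used by the Unihan data files (code point, field name, value), as found in `Unihan_IRGSources.txt`; the function names `jk_sources` and `count_by_block` are hypothetical, and the block ranges are those of the Unicode code charts for Extensions C and E.

```python
from collections import Counter

# Block ranges for the two extensions named in the text.
EXTENSION_C = range(0x2A700, 0x2B740)  # U+2A700..U+2B73F
EXTENSION_E = range(0x2B820, 0x2CEB0)  # U+2B820..U+2CEAF


def jk_sources(lines):
    """Yield (code_point, value) for kIRG_JSource values starting with 'JK-'."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or not fields[0].startswith("U+"):
            continue  # not a "U+XXXX<tab>field<tab>value" record
        code, field, value = fields
        if field == "kIRG_JSource" and value.startswith("JK-"):
            yield int(code[2:], 16), value


def count_by_block(lines):
    """Count JK-prefixed entries per block (Extension C, Extension E, other)."""
    counts = Counter()
    for code_point, _ in jk_sources(lines):
        if code_point in EXTENSION_C:
            counts["Extension C"] += 1
        elif code_point in EXTENSION_E:
            counts["Extension E"] += 1
        else:
            counts["other"] += 1
    return counts
```

Given the single record `"U+2B679\tkIRG_JSource\tJK-66038"`, which corresponds to the example in the text, `jk_sources` yields the pair `(0x2B679, "JK-66038")`, and `count_by_block` assigns it to Extension C. Run over a full copy of the data file, the per-block counts could be compared with the figures quoted from Lunde.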

= Blocks =

{{As of|2006|9}}, {{Transliteration|ja|Hepburn|Mojikyō}} encoded 174,975 characters. Among those, 150,366 characters then belonged to the extended CJKV family. Many of the encoded characters are considered obsolete or otherwise obscure, and are not encoded by any other character set, including the international standard, Unicode. Each {{Transliteration|ja|Hepburn|Mojikyō}} character has a unique number, and the characters are organized into blocks.

{{Transliteration|ja|Hepburn|Mojikyō}} puts CJKV characters in different blocks according to their traditional Kangxi radical. Common radicals containing an especially high number of characters, such as Radicals 9 ({{Lang|ja|人}}) and 162 ({{Lang|ja|⻌}}), are split further by stroke order (see the list in the Mojikyō Character Map, {{Mono|MOCHRMAP.EXE}}).

= No unification =

Unlike Unicode, {{Transliteration|ja|Hepburn|Mojikyō}} purposely avoids Han unification; no attempt at compactness of the encoding is made, nor is there an attempt to keep all common characters below U+FFFF as there is in Unicode.{{citation needed|date=October 2024}}

Unicode, on the other hand, sorts its CJK characters into blocks based on how common they are: the most common are generally put into the Basic Multilingual Plane, while those that are rare or obscure are put into the Supplementary Planes.{{citation needed|date=October 2024}}
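The U+FFFF boundary corresponds to the end of the Basic Multilingual Plane (plane 0); every code point above it lies in a supplementary plane. A minimal Python sketch of that distinction follows. The function names `plane` and `in_bmp` are hypothetical and exist only for illustration.

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane spans 0x10000 code points, so the plane is the value
    # of the bits above the low 16.
    return code_point >> 16


def in_bmp(code_point: int) -> bool:
    """Return True if the code point is in the Basic Multilingual Plane."""
    return plane(code_point) == 0
```

For example, `plane(ord("人"))` returns 0, because U+4EBA lies in the Basic Multilingual Plane, while `plane(0x2B679)` returns 2, because the Extension C ideograph U+2B679 lies in a supplementary plane.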

License

{{Transliteration|ja|Hepburn|Mojikyō}} is proprietary software under a restrictive license. Originally, the Mojikyō Institute tried to prevent its character data from being used, and threatened those who published conversion tables to and from its character set. In July 2010, the Mojikyō Institute abandoned its legal efforts to stop at least one Japanese user from publishing conversion tables or converting characters encoded in {{Transliteration|ja|Hepburn|Mojikyō}} to Unicode or other character sets.{{Cite web|date=2010-07-21|title=終戦宣言|trans-title=Announcement: The War is Over|url=https://www.seiwatei.net/info/endwar.htm|access-date=2020-07-07|language=ja|edition=28 January 2016|script-website=ja:青蛙亭漢語塾|trans-website=Seiwatei's Kanji Cram School}} Mere data, sometimes including the shapes of letters, are considered in many jurisdictions to be common property as they do not meet the threshold of originality (see also: fictitious entry; trap street).

Due to this legacy, however, {{ill|GlyphWiki|ja}} disallowed {{Transliteration|ja|Hepburn|Mojikyō}} data as of 2020.{{Cite web|title=データ・記事のライセンス|trans-title=License of our data and articles|url=http://glyphwiki.org/wiki/GlyphWiki:%E3%83%87%E3%83%BC%E3%82%BF%E3%83%BB%E8%A8%98%E4%BA%8B%E3%81%AE%E3%83%A9%E3%82%A4%E3%82%BB%E3%83%B3%E3%82%B9|access-date=2020-07-06|website=GlyphWiki|quote=今昔文字鏡およびその関連製品、データは、そのライセンス上グリフウィキには用いることができません。文字鏡番号(独自部分)および文字鏡のフォントに収録されているグリフそのもの、およびそれを参照、利用して作成していると判断できる情報は、グリフウィキに登録する際の典拠とすることはできませんので、ご協力をお願いいたします。 [Konjaku Mojikyō and related products and associated data are licensed in such a way that they are incompatible with our above GlyphWiki license. Neither the number of the Mojikyō encoding slot, nor the appearance of the glyph itself in Mojikyō{{'}}s fonts, nor any information whatsoever that can be judged to have been gathered by referring to a Mojikyō product, can be used when entering data into GlyphWiki. We absolutely cannot accept Mojikyō data. Please cooperate with us.]|edition=9 June 2010}}

Collected writing systems

= Living =

= Dead or obsolete =

See also

References

{{reflist}}

= Notes =

{{reflist|group=note}}