CEDICT

{{Short description|Chinese–English dictionary}}

The CEDICT project was started by Paul Denisowski in 1997 and is now maintained by a team on mdbg.net under the name CC-CEDICT. Its aim is to provide a complete Chinese-to-English dictionary with pinyin pronunciations for the Chinese characters.{{Cite book |last=Lunde |first=Ken |url=https://archive.org/details/cjkvinformationp00lund/mode/2up?q=CEDICT+Denisowski |title=CJKV information processing |date=1999 |publisher=O'Reilly |isbn=978-1-56592-224-2}}https://cui.unige.ch/isi/reports/chinese.pdf{{Cite web |title=Chinese Retrieval System Using Hangeul Pronunciation of Chinese Language - ProQuest |url=https://www.proquest.com/openview/344ed09b2061ea9be79410b4e1d965a9/1?cbl=936334&pq-origsite=gscholar |access-date=2025-05-16 |website=www.proquest.com |language=en}}{{Cite journal |last=Peng |first=Gang |last2=Minett |first2=James W. |last3=Wang |first3=William S.-Y. |date=2008-08-01 |title=The networks of syllables and characters in Chinese∗ |url=https://www.tandfonline.com/doi/abs/10.1080/09296170802159488 |journal=Journal of Quantitative Linguistics |volume=15 |issue=3 |pages=243–255 |doi=10.1080/09296170802159488 |issn=0929-6174|url-access=subscription }}{{Cite book |last=Applied Natural Language Processing Conference (6th : 2000 : Seattle |first=Wash ) |url=https://archive.org/details/proceedingsofcon0000appl/mode/2up?q=CEDICT+Denisowski |title=Proceedings of the conferences and proceedings of the ANLP-NAACL 2000 student research workshop : 6th Applied Natural Language Processing Conference [and] 1st Meeting of the North American Chapter of the Association for Computational Linguistics : April 29 - May 4, Seattle, Washington, USA |date=2000 |publisher=[New Brunswick, N.J.] : Association for Computational Linguistics ; San Francisco : Distributed by Morgan Kaufman Publishers |others=Internet Archive |isbn=978-1-55860-704-0}}

Content

CEDICT is distributed as a plain text file; a separate program (a text editor such as Notepad, or a search tool such as egrep) is needed to search and display it. The data is used by several other Chinese–English projects. The Unihan Database uses CEDICT data for most of its information about character compounds, but this information is auxiliary and is explicitly not part of the main Unicode database.{{Cite web|url=http://unicode.org/charts/unihan.html|title=Unihan Database Lookup|website=unicode.org}}

Features:

  • Traditional Chinese and Simplified Chinese
  • Pinyin (multiple pronunciations where a word has more than one)
  • American English (multiple equivalents per entry)
  • {{As of|2024|1|22}}, it had 122,444 entries in UTF-8.{{Cite web|url=https://www.mdbg.net/chinese/dictionary?page=cc-cedict|title=MDBG English to Chinese dictionary|website=www.mdbg.net}}

The basic format of a CEDICT entry is:

Traditional Simplified [pin1 yin1] /American English equivalent 1/equivalent 2/

漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/
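
The entry layout above lends itself to simple line-by-line parsing. The following is a minimal Python sketch, not part of the CEDICT project itself: the names `ENTRY_RE`, `Entry` and `parse_line` are hypothetical, and the code assumes that fields are separated by single spaces and that lines beginning with `#` are comments to be skipped.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout of one entry line:
#   Traditional Simplified [pin1 yin1] /equivalent 1/equivalent 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: List[str]        # one syllable per item, tone number attached
    definitions: List[str]   # the slash-separated English equivalents


def parse_line(line: str) -> Optional[Entry]:
    """Parse one entry line; return None for blank, comment or malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):  # assumption: '#' marks a comment line
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, definitions = match.groups()
    return Entry(traditional, simplified, pinyin.split(), definitions.split("/"))


entry = parse_line("漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/")
print(entry)
```

Splitting the definitions on `/` keeps annotations such as the classifier note `CL:個|个` as ordinary list items; a fuller parser might treat them separately.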

Example of a simple egrep search:

$ egrep -i 有勇無謀 cedict.txt

有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/
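
A comparable lookup can be sketched in Python for environments without egrep. This is an illustrative sketch only: the function name `search` and the in-memory `sample` list are hypothetical, and the function does plain case-insensitive substring matching rather than the regular-expression matching that egrep performs.

```python
from typing import Iterable, Iterator


def search(lines: Iterable[str], term: str) -> Iterator[str]:
    """Yield every line containing term, ignoring case (substring match only)."""
    needle = term.casefold()
    for line in lines:
        if needle in line.casefold():
            yield line.rstrip("\n")


# Sample lines standing in for the contents of the dictionary file.
sample = [
    "漢字 汉字 [han4 zi4] /Chinese character/CL:個|个/\n",
    "有勇無謀 有勇无谋 [you3 yong3 wu2 mou2] /bold but not very astute/\n",
]

hits = list(search(sample, "有勇無謀"))
print(hits)

# Against a real file (assuming it is saved as cedict.txt in UTF-8):
# with open("cedict.txt", encoding="utf-8") as f:
#     for hit in search(f, "有勇無謀"):
#         print(hit)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so the file is read line by line rather than loaded whole.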

History

{| class="wikitable"
! Year
! Event
|-
| 1991
| EDICT Japanese dictionary project was started by Jim Breen.
|-
| 1997
| CEDICT project started by Paul Denisowski, on the model of EDICT. Continued by Erik Peterson.
|-
| 2007
| MDBG started a new project called [http://cc-cedict.org/ CC-CEDICT], which continues the CEDICT project under a new license, the [https://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share Alike 3.0 License], allowing more projects to use it. The [https://github.com/zdict/dictionaries/blob/master/stardict-cedict-big5-2.4.2/LICENSE.CEDICT original CEDICT license] was for non-commercial use only and did not allow entries to be added without permission. Additionally, a workflow [http://cc-cedict.org/editor/] has been set up to streamline the process of submitting, reviewing and processing new entries.
|}

Related projects

CEDICT has served as a model for several other projects:

  • HanDeDict (~156,000 Chinese entries) for German
  • CFDICT (~44,000 entries) for French
  • Some older CEDICT data is also found in the Adsotrans dictionary.
  • February 2012: [https://web.archive.org/web/20130501132107/http://cc-chedicc.wikispaces.com/ ChE-DICC], a free Spanish–Chinese dictionary, was started (in beta)
  • May 2017: CHDICT (11,000 entries) for Hungarian
  • CC-Canto is Pleco Software's addition of Cantonese language readings in Jyutping transcription to CC-CEDICT{{Cite web|url=https://cantonese.org/download.html|title=CC-Canto - A Cantonese dictionary for everyone|website=cantonese.org}}
  • Cantonese CEDICT features Cantonese language readings in Yale transcription and has Cantonese-specific words, many of which were taken from "A Dictionary of Cantonese Slang", in possible copyright infringement.[http://writecantonese8.wordpress.com/2012/02/04/cantonese-cedict-project/ "Later, I was guided to merge data from Cantonese Stardict, which is an electronic version of 'A Dictionary of Cantonese Slang', into Cantonese CEDICT"]{{cite web|url=http://stardict.sourceforge.net/|accessdate=18 November 2011|title=StarDict|publisher=Stardict.sourceforge.net}}

References