Google Neural Machine Translation
{{short description|System developed by Google to increase fluency and accuracy in Google Translate}}
{{Use mdy dates|date=July 2023}}
{{Translation sidebar}}
Google Neural Machine Translation (GNMT) was a neural machine translation (NMT) system developed by Google and introduced in November 2016 that used an artificial neural network to increase fluency and accuracy in Google Translate.{{citation|title=Found in translation: More accurate, fluent sentences in Google Translate|author=Barak Turovsky|url=https://blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/|work=Google Blog|date=November 15, 2016|accessdate=January 11, 2017}}{{citation|work=Google Research Blog|title=Zero-Shot Translation with Google's Multilingual Neural Machine Translation System|url=https://research.googleblog.com/2016/11/zero-shot-translation-with-googles.html|date=November 22, 2016|accessdate=January 11, 2017|author1=Mike Schuster |author2=Melvin Johnson |author3=Nikhil Thorat }}{{citation|url=https://medium.freecodecamp.com/the-mind-blowing-ai-announcement-from-google-that-you-probably-missed-2ffd31334805#.msj1mdvbh|title=The mind-blowing AI announcement from Google that you probably missed |author=Gil Fewster |date=January 5, 2017 |accessdate=January 11, 2017 |work=freeCodeCamp}}{{cite journal|title=Google's neural machine translation system: Bridging the gap between human and machine translation|first1=Yonghui|last1=Wu|first2=Mike|last2=Schuster|first3=Zhifeng|last3=Chen|first4=Quoc V.|last4=Le|first5=Mohammad|last5=Norouzi|arxiv=1609.08144|year=2016|bibcode=2016arXiv160908144W}} The neural network consisted of two main blocks, an encoder and a decoder, both of LSTM architecture with eight 1024-wide layers each and a simple 1-layer 1024-wide feedforward attention mechanism connecting them.{{cite web | url=https://smerity.com/articles/2016/google_nmt_arch.html | title=Peeking into the neural network architecture used for Google's Neural Machine Translation }} The total number of parameters has been variously described as over 160 million,{{cite arXiv | eprint=2112.10930 | last1=Qin | first1=Minghai | last2=Zhang | first2=Tianyun | last3=Sun | first3=Fei | last4=Chen | first4=Yen-Kuang | last5=Fardad | first5=Makan | last6=Wang | first6=Yanzhi | last7=Xie | first7=Yuan | title=Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting | year=2021 | class=cs.NE }} approximately 210 million,{{cite web | url=https://intellabs.github.io/nlp-architect/sparse_gnmt.html | title=Compression of Google Neural Machine Translation Model – NLP Architect by Intel® AI Lab 0.5.5 documentation }} 278 million{{cite arXiv | eprint=2104.02233 | last1=Langroudi | first1=Hamed F. | last2=Karia | first2=Vedant | last3=Pandit | first3=Tej | last4=Kudithipudi | first4=Dhireesha | title=TENT: Efficient Quantization of Neural Networks on the tiny Edge with Tapered FixEd PoiNT | year=2021 | class=cs.LG }} or 380 million.{{cite web | url=https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2 | title=Data Augmentation | How to use Deep Learning when you have Limited Data | date=May 19, 2021 }} It used a WordPiece tokenizer and a beam search decoding strategy, and it ran on Tensor Processing Units.
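The following is a minimal, illustrative sketch (in PyTorch, not Google's published code) of the encoder–decoder layout described above: two stacks of 1024-wide LSTM layers joined by a one-layer feedforward attention network. The vocabulary size and all identifiers are assumptions made for illustration; the actual GNMT model additionally used a bidirectional bottom encoder layer and residual connections between layers.

```python
# Illustrative GNMT-style layout: stacked LSTM encoder and decoder joined by
# a single-layer feedforward (additive) attention network. Hyperparameters
# below (including the WordPiece vocabulary size) are assumptions.
import torch
import torch.nn as nn

VOCAB, WIDTH, LAYERS = 32_000, 1024, 8

class Seq2SeqWithAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, WIDTH)
        self.tgt_emb = nn.Embedding(VOCAB, WIDTH)
        self.encoder = nn.LSTM(WIDTH, WIDTH, num_layers=LAYERS, batch_first=True)
        self.decoder = nn.LSTM(WIDTH, WIDTH, num_layers=LAYERS, batch_first=True)
        # 1-layer, 1024-wide feedforward attention scoring network
        self.attn_score = nn.Sequential(
            nn.Linear(2 * WIDTH, WIDTH), nn.Tanh(), nn.Linear(WIDTH, 1)
        )
        self.out = nn.Linear(2 * WIDTH, VOCAB)

    def forward(self, src_ids, tgt_ids):
        enc_out, state = self.encoder(self.src_emb(src_ids))     # (B, S, W)
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)  # (B, T, W)
        B, T, _ = dec_out.shape
        S = enc_out.size(1)
        # Score every (decoder step, encoder position) pair, then take a
        # softmax-weighted sum of encoder states as the attention context.
        q = dec_out.unsqueeze(2).expand(B, T, S, WIDTH)
        k = enc_out.unsqueeze(1).expand(B, T, S, WIDTH)
        scores = self.attn_score(torch.cat([q, k], dim=-1)).squeeze(-1)  # (B, T, S)
        context = torch.softmax(scores, dim=-1) @ enc_out                # (B, T, W)
        return self.out(torch.cat([dec_out, context], dim=-1))           # logits

model = Seq2SeqWithAttention()
print(sum(p.numel() for p in model.parameters()))  # rough parameter count
```

Summing the parameters of such a configuration lands in the low hundreds of millions, consistent with the range of figures cited above; the exact count depends mainly on the assumed vocabulary size.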
By 2020, the system had been replaced by another deep learning system based on a Transformer encoder and an RNN decoder.{{Cite web |title=Recent Advances in Google Translate |url=http://research.google/blog/recent-advances-in-google-translate/ |access-date=2024-05-08 |website=research.google |language=en}}
GNMT improved the quality of translation by applying an example-based machine translation (EBMT) method in which the system learns from millions of translation examples. The proposed architecture was first tested on over a hundred languages supported by Google Translate. With its large end-to-end framework, the system learned over time to produce better, more natural translations. GNMT translated whole sentences at a time, rather than piece by piece, and could undertake interlingual machine translation by encoding the semantics of the sentence rather than by memorizing phrase-to-phrase translations.{{cite web|first1=Christian|last1=Boitet|first2=Hervé|last2=Blanchon|first3=Mark|last3=Seligman|first4=Valérie|last4=Bellynck|title=MT on and for the Web|url=http://www-clips.imag.fr/geta/herve.blanchon/Pdfs/NLP-KE-10.pdf|date=2010|accessdate=December 1, 2016|archive-date=March 29, 2017|archive-url=https://web.archive.org/web/20170329125916/http://www-clips.imag.fr/geta/herve.blanchon/Pdfs/NLP-KE-10.pdf|url-status=dead}}
History
The Google Brain project was established in 2011 in the "secretive Google X research lab" by Google Fellow Jeff Dean, Google Researcher Greg Corrado, and Stanford University Computer Science professor Andrew Ng.{{cite web|author1=Jeff Dean and Andrew Ng|title=Using large-scale brain simulations for machine learning and A.I.|url=http://googleblog.blogspot.com/2012/06/using-large-scale-brain-simulations-for.html|website=Official Google Blog|accessdate=January 26, 2015|date=June 26, 2012}}{{cite web|title=Google's Large Scale Deep Neural Networks Project| website=YouTube |url=https://www.youtube.com/watch?v=KELYHjq9Gbs|accessdate=October 25, 2015}}{{cite news|url=https://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all|title=How Many Computers to Identify a Cat? 16,000|last=Markoff|first=John|authorlink=John Markoff|date=June 25, 2012|accessdate=February 11, 2014|newspaper=New York Times}} Ng's work has led to some of the biggest breakthroughs at Google and Stanford.{{cite web|url=https://www.technologyreview.com/s/530016/a-chinese-internet-giant-starts-to-dream/|title=A Chinese Internet Giant Starts to Dream: Baidu is a fixture of online life in China, but it wants to become a global power. Can one of the world's leading artificial intelligence researchers help it challenge Silicon Valley's biggest companies?|accessdate=January 11, 2017|author=Robert D. Hof|date=August 14, 2014|work=Technology Review}}
The Google Neural Machine Translation system (GNMT) was introduced in November 2016. Google Translate then began using neural machine translation (NMT) in preference to its previous statistical machine translation (SMT) methods,{{citation|url=https://www.theregister.co.uk/2016/11/17/googles_neural_net_translates_languages_not_trained_on |website=The Register |title=Google's neural network learns to translate languages it hasn't been trained on: First time machine translation has used true transfer learning|date=November 17, 2016|author=Katyanna Quach|accessdate=January 11, 2017}}{{cite news|newspaper=The New York Times|first=Gideon|last=Lewis-Kraus|title=The Great A.I. Awakening|url=https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html|date=December 14, 2016|accessdate=January 11, 2017}}{{cite web|first1=Quoc|last1=Le|first2=Mike|last2=Schuster|title=A Neural Network for Machine Translation, at Production Scale|url=https://research.googleblog.com/2016/09/a-neural-network-for-machine.html|work=Google Research Blog|date=September 27, 2016|accessdate=December 1, 2016}} which it had used since October 2007, when it switched to its own proprietary, in-house SMT technology.[http://googlesystem.blogspot.com/2007/10/google-translate-switches-to-googles.html Google Switches to its Own Translation System], October 22, 2007{{cite web|url=http://searchengineland.com/google-translate-drops-systran-for-home-brewed-translation-12502|title=Google Translate Drops SYSTRAN for Home-Brewed Translation|author=Barry Schwartz|work=Search Engine Land|date=October 23, 2007}}
Training GNMT was a significant computational effort at the time and took, by a 2018 OpenAI estimate, on the order of 79 petaFLOP/s-days (roughly 7e21 FLOPs) of compute, about 1.5 orders of magnitude more than the Seq2seq model of 2014{{cite web | url=https://openai.com/research/ai-and-compute | title=AI and compute | date=June 9, 2022 }} (but about half that of GPT-J-6B in 2021{{cite web | url=https://github.com/kingoflolz/mesh-transformer-jax | title=Table of contents | website=GitHub }}).
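As a quick sanity check of the figure above, one petaFLOP/s-day corresponds to 10^15 floating-point operations per second sustained for one day:

```python
# Back-of-the-envelope unit conversion for the compute estimate above.
petaflops_day = 1e15 * 86_400       # one petaFLOP/s sustained for 86,400 s ≈ 8.64e19 FLOPs
print(f"{79 * petaflops_day:.1e}")  # ≈ 6.8e21 FLOPs, i.e. roughly 7e21
```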
Google Translate's NMT system used a large artificial neural network capable of deep learning. By using millions of examples, GNMT improved the quality of translation, using broader context to deduce the most relevant translation; the result was then rearranged and adapted to approximate natural, grammatical human language. GNMT did not create its own universal interlingua but rather aimed at finding the commonality between many languages, using insights from psychology and linguistics.{{citation|url=https://medium.com/@chrismcdonald_94568/ok-slow-down-516f93f83ac8#.l0ti3ct0b|title=Commenting on Gil Fewster's January 5th article in the Atlantic|author=Chris McDonald|work=Medium |date=January 7, 2017|accessdate=January 11, 2017}} In November 2016, the new translation engine was first enabled for eight languages: to and from English and French, German, Spanish, Portuguese, Chinese, Japanese, Korean and Turkish.{{cite web|first=Barak|last=Turovsky|title=Found in translation: More accurate, fluent sentences in Google Translate|url=https://www.blog.google/products/translate/found-translation-more-accurate-fluent-sentences-google-translate/|work=The Keyword Google Blog|date=November 15, 2016|accessdate=December 1, 2016}} In March 2017, three additional languages were enabled: Russian, Hindi and Vietnamese, with support for Thai added later.{{Cite web|url=https://techcrunch.com/2017/03/06/googles-smarter-a-i-powered-translation-system-expands-to-more-languages/|title=Google's smarter, A.I.-powered translation system expands to more languages|last=Perez|first=Sarah|date=March 6, 2017|website=TechCrunch|publisher=Oath Inc.|access-date=}}{{cite web|last1=Turovsky|first1=Barak|title=Higher quality neural translations for a bunch more languages|url=https://blog.google/products/translate/higher-quality-neural-translations-bunch-more-languages/|website=The Keyword Google Blog|date=March 6, 2017|accessdate=March 6, 2017}} Support for Hebrew and Arabic was also added in the same month, with help from the Google Translate Community.{{Cite web|url=https://venturebeat.com/2017/03/30/google-now-provides-ai-powered-translations-for-arabic-and-hebrew/|title=Google now provides AI-powered translations for Arabic and Hebrew|last=Novet|first=Jordan|date=March 30, 2017|website=VentureBeat}} In mid-April 2017, Google Netherlands announced support for Dutch and other European languages related to English.{{Cite web|url=https://nederland.googleblog.com/2017/04/grote-verbetering-voor-het-nederlands.html|title=Grote verbetering voor het Nederlands in Google Translate|last=Finge|first=Rachid|date=April 19, 2017|website=Google Netherlands Blog|language=Dutch|trans-title=Big improvement for Dutch in Google Translate}} At the end of April 2017, further support was added for nine Indian languages: Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam and Kannada.{{Cite web|url=https://blog.google/products/translate/making-internet-more-inclusive-india/|title=Making the internet more inclusive in India|last=Turovsky|first=Barak|date=April 25, 2017|website=The Keyword}}
By 2020, Google had changed its methodology to use a different neural network system based on transformers, and had phased out GNMT.{{Cite web |title=Recent Advances in Google Translate |url=http://research.google/blog/recent-advances-in-google-translate/ |access-date=2024-05-08 |website=research.google |language=en}}
Evaluation
The GNMT system was said to represent an improvement over the former Google Translate in that it could handle "zero-shot translation", that is, it could translate directly between a pair of languages on which it had not been explicitly trained. For example, a system trained only on Japanese-English and Korean-English translation could nonetheless perform Japanese-Korean translation. The system appeared to have learned a language-independent intermediate representation of language (an "interlingua"), which allowed it to perform zero-shot translation by converting from and to the interlingua. Google Translate previously first translated the source language into English and then translated the English into the target language, rather than translating directly from one language to another.
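A minimal sketch of the data-side setup behind this multilingual, zero-shot capability, following the description in Google's multilingual NMT work: a single model is trained on several language pairs, with an artificial token prepended to each source sentence to indicate the desired target language. The function name, example sentences and exact token format below are illustrative assumptions.

```python
# Illustrative sketch (not Google's code): one model, many language pairs,
# with a prepended token telling the model which target language to produce.
def make_example(src_sentence: str, tgt_sentence: str, tgt_lang: str):
    # e.g. ("<2en> こんにちは", "Hello") for a Japanese->English pair
    return f"<2{tgt_lang}> {src_sentence}", tgt_sentence

# Training data covers only Japanese<->English and Korean<->English ...
train = [
    make_example("こんにちは", "Hello", "en"),
    make_example("Hello", "안녕하세요", "ko"),
]

# ... yet at inference time the model can be asked for Japanese->Korean,
# a direction it never saw during training (zero-shot):
zero_shot_request, _ = make_example("こんにちは", "", "ko")
print(zero_shot_request)  # "<2ko> こんにちは"
```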
A July 2019 study in Annals of Internal Medicine found that "Google Translate is a viable, accurate tool for translating non–English-language trials". Only one disagreement between reviewers reading machine-translated trials was due to a translation error. Since many medical studies are excluded from systematic reviews because the reviewers do not understand the language, GNMT has the potential to reduce bias and improve accuracy in such reviews.{{cite journal|last1=Jackson|first1=Jeffrey L|last2=Kuriyama|first2=Akira|last3=Anton|first3=Andreea|last4=Choi|first4=April|last5=Fournier|first5=Jean-Pascal|last6=Geier|first6=Anne-Kathrin|last7=Jacquerioz|first7=Frederique|last8=Kogan|first8=Dmitry|last9=Scholcoff|first9=Cecilia|last10=Sun|first10=Rao|title=The Accuracy of Google Translate for Abstracting Data From Non–English-Language Trials for Systematic Reviews|journal=Annals of Internal Medicine|date=July 30, 2019|volume=171|issue=9|page=678|doi=10.7326/M19-0891|pmid=31357212|s2cid=198980789|issn=0570-183X}}
Languages supported by GNMT
As of December 2021, all of the languages supported by Google Translate used GNMT, with Latin being the most recent addition.
{{Clear}}
{{div col|colwidth=16em}}
- Afrikaans
- Albanian
- Amharic
- Arabic
- Armenian
- Azerbaijani
- Basque
- Belarusian
- Bengali
- Bosnian
- Bulgarian
- Burmese
- Catalan
- Cebuano
- Chewa
- Chinese (Simplified)
- Chinese (Traditional)
- Corsican
- Croatian
- Czech
- Danish
- Dutch
- English
- Esperanto
- Estonian
- Filipino (Tagalog)
- Finnish
- French
- Galician
- Georgian
- German
- Greek
- Gujarati
- Haitian Creole
- Hausa
- Hawaiian
- Hebrew
- Hindi
- Hmong
- Hungarian
- Icelandic
- Igbo
- Indonesian
- Irish
- Italian
- Japanese
- Javanese
- Kannada
- Kazakh
- Khmer
- Kinyarwanda
- Korean
- Kurdish (Kurmanji)
- Kyrgyz
- Lao
- Latin
- Latvian
- Lithuanian
- Luxembourgish
- Macedonian
- Malagasy
- Malay
- Malayalam
- Maltese
- Maori
- Marathi
- Mongolian
- Nepali
- Norwegian (Bokmål)
- Odia
- Pashto
- Persian
- Polish
- Portuguese
- Punjabi (Gurmukhi)
- Romanian
- Russian
- Samoan
- Scottish Gaelic
- Serbian
- Shona
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Somali
- Sotho
- Spanish
- Sundanese
- Swahili
- Swedish
- Tajik
- Tamil
- Tatar
- Telugu
- Thai
- Turkish
- Turkmen
- Ukrainian
- Urdu
- Uyghur
- Uzbek
- Vietnamese
- Welsh
- West Frisian
- Xhosa
- Yiddish
- Yoruba
- Zulu
{{div col end}}
See also
{{div col|colwidth=30em}}
- Example-based machine translation
- Rule-based machine translation
- Comparison of machine translation applications
- Statistical machine translation
- Artificial intelligence
- Cache language model
- Computational linguistics
- Computer-assisted translation
- History of machine translation
- List of emerging technologies
- List of research laboratories for machine translation
- Neural machine translation
- Machine translation
- Universal translator
{{div col end}}
References
{{reflist}}
External links
{{Wikiversity|Topic:Computational linguistics}}
- [https://arxiv.org/abs/1609.08144 Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
- [http://www.omniglot.com/language/articles/machinetranslation.htm The Advantages and Disadvantages of Machine Translation]
- [http://www.statmt.org/ Statistical Machine Translation]
- [http://www.eamt.org/iamt.php International Association for Machine Translation (IAMT)] {{Webarchive|url=https://web.archive.org/web/20100624162302/http://www.eamt.org/iamt.php |date=June 24, 2010 }}
- [http://www.mt-archive.info Machine Translation Archive] {{Webarchive|url=https://web.archive.org/web/20190401232615/http://www.mt-archive.info/ |date=April 1, 2019 }} by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
- [http://www.hutchinsweb.me.uk/ Machine translation (computer-based translation)] – Publications by John Hutchins (includes PDFs of several books on machine translation)
{{Google LLC}}
{{Approaches to machine translation}}
{{emerging technologies|topics=yes|infocom=yes}}
{{Natural Language Processing}}
Category:Applications of artificial intelligence
Category:Computational linguistics
Category:Artificial neural networks