List of speech recognition software

{{Short description|none}}

Speech recognition software is available for many computing platforms, operating systems, usage models, and software licenses. This list groups notable products by type and platform.

Acoustic models and speech corpus (compilation)

The following list presents notable speech recognition software engines with a brief synopsis of characteristics.

{| class="wikitable sortable"
! Application name !! Description !! Open-source !! License !! Operating system !! Programming language !! Supported language, note !! Offline or online
|-
| CMU Sphinx || HMM || {{Yes}} || BSD style || Cross-platform || Java || English, German, French, Mandarin, Russian || Offline
|-
| HTK || HMM neural net || {{No}} || HTK specific || Cross-platform || C || English; version 3.5 released December 2015 ||
|-
| Julius || HMM trigrams || {{Yes}} || BSD style, non-commercial || Cross-platform || C || Japanese, English; [https://github.com/julius-speech/julius#english] || Offline
|-
| Kaldi || Neural net || {{Yes}} || Apache || Cross-platform || C++ || English ||
|-
| RWTH ASR || RWTH Aachen University || {{No}} || RWTH ASR, non-commercial use only || Linux, macOS || C++ || English ||
|-
| Whisper || Encoder/decoder transformer || {{Yes}} || MIT license || Cross-platform || Python || Multilingual || Online (through API) and Offline
|}

Macintosh

{| class="wikitable"
! Application name !! Description !! Open-source !! License !! Price !! Note
|-
| Dragon for Mac (discontinued 2018) || macOS; by Nuance || {{No}} || {{Proprietary}}
|-
| Dragon Dictate (discontinued) || macOS; by Nuance || {{No}} || {{Proprietary}}
|-
| MacSpeech Scribe (discontinued) || Transcription from recorded audio; acquired by Nuance
|-
| iListen (discontinued) || PowerPC Macintosh; discontinued by MacSpeech; acquired by Nuance
|-
| Speakable items || Included with macOS
|-
| ViaVoice (discontinued) || IBM product; acquired by Nuance
|-
| Voice Navigator || Original GUI voice control; 1989
|}

Cross-platform web apps based on Chrome

The following list presents notable speech recognition software that operates in a Chrome browser as a web app. These apps use the Web Speech API.{{cite web |url=https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html |title=Web Speech API Specification |website=dvcs.w3.org |url-status=live |archive-url=https://web.archive.org/web/20160621225102/https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html |archive-date=2016-06-21 }}

{| class="wikitable"
! Application name !! Description !! Open-source !! License !! Price !! Note
|-
| Speechmatics{{cite web |last=Orlowski |first=Andrew |title=Total recog: British AI makes universal speech breakthrough |url=https://www.theregister.co.uk/2017/12/01/british_ai_makes_universal_speech_breakthrough/ |website=The Register |publisher=Situation Publishing |access-date=17 May 2018}} || Cloud based and on-premise automatic speech recognition || {{No}} || {{Proprietary}} || From £0.06 per minute of audio
|}

Mobile devices and smartphones

Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including:

{| class="wikitable sortable"
! Application name !! Description !! Open-source !! License !! Price !! Note
|-
| Assistant.ai || Assistant for Android, iOS and Windows Phone || {{No}} || {{Proprietary}}, freeware || Free || Discontinued
|-
| Dragon Dictation || || {{No}} || {{Proprietary}}, freeware || Free
|-
| Google Now || Android voice search || {{No}} || {{Proprietary}}, freeware || Free
|-
| Google Voice Search || || {{No}} || {{Proprietary}}, freeware || Free
|-
| Microsoft Cortana || Microsoft voice search || {{No}} || {{Proprietary}}, freeware || Free
|-
| Siri Personal Assistant || Apple's virtual personal assistant || {{No}} || {{Proprietary}}, freeware || Free
|-
| Alexa – Amazon Echo || Amazon's personal assistant || {{No}} || {{Proprietary}}
|-
| SILVIA || Android and iOS || {{No}}
|-
| Vlingo
|}

Windows

=Windows built-in speech recognition=

Windows Speech Recognition version 8.0, by Microsoft, is built into Windows Vista, Windows 7, Windows 8 and Windows 10.

Speech Recognition is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese, and only in the corresponding language version of Windows: the speech recognition engine for one language cannot be used on a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow the system language to be changed, which in turn changes which speech engine is available. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.

=Windows 7, 8, 10, 11 third-party speech recognition=

  • Braina – Dictate into third party software and websites,{{cite web |url=http://www.brainasoft.com/braina/speech-to-text.html |title=Speech Recognition Software for Windows PC – Braina |website=www.brainasoft.com |url-status=live |archive-url=https://web.archive.org/web/20150407054442/http://www.brainasoft.com/braina/speech-to-text.html |archive-date=2015-04-07 }} fill web forms and execute vocal commands.{{cite web |url=https://www.capterra.com/speech-recognition-software/ |title=Dynamic Faceting-List of Most 57 Speech Recognition SWs and Web Services |archive-url=https://web.archive.org/web/20190213161952/https://www.capterra.com/speech-recognition-software/ |language=en |archive-date=February 13, 2019 |url-status=live |access-date=February 23, 2019 |df=mdy-all }}
  • Dragon NaturallySpeaking from Nuance Communications – Successor to the older DragonDictate product. Focus on dictation. 64-bit Windows support since version 10.1.
  • Tazti – Create speech command profiles to play PC games and control applications. Create speech commands to open files, folders, webpages, and applications. Versions are available for Windows 7, Windows 8 and Windows 8.1.{{cite web |last=O'Neill |first=Mark |title=Control your PC with these 5 speech recognition programs |website=PC World |url=http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html |date=2013-11-06 |access-date=2013-12-30 |url-status=live |archive-url=https://web.archive.org/web/20140101030044/http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html |archive-date=2014-01-01 }}
  • Voice Finger – Extends the Windows speech recognition system so that the mouse and keyboard can be controlled entirely by voice. It is aimed at users with disabilities and users recovering from computer-related injuries.

=Microsoft Speech API=

The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1995, and it was then part of Windows up to Windows Vista. This initial version already contained Direct Speech Recognition and Direct Text To Speech APIs, which applications could use to control engines directly, as well as simplified 'higher-level' Voice Command and Voice Talk APIs. Speech recognition functionality was also included in Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but because the SDK is aimed at developers building speech applications, it lacks any user interface of its own (numerous applications were available) and is therefore unsuitable for end users.

Built-in software

Interactive voice response

The following are interactive voice response (IVR) systems:

  • CSLU Toolkit
  • Genesys{{cite web |url=http://www.genesys.com/platform-services/intelligent-voice-response |title=Interactive Voice Response |website=Genesys |url-status=live |archive-url=https://web.archive.org/web/20161014010400/http://www.genesys.com/platform-services/intelligent-voice-response |archive-date=2016-10-14 }}
  • HTK – copyrighted by Microsoft, but allows altering software for licensee's internal use
  • LumenVox ASR
  • Tellme Networks; acquired by Microsoft

Unix-like x86 and x86-64 speech transcription software

  • Janus Recognition Toolkit (JRTk)[http://isl.ira.uka.de/downloads/asru_hagen.ps]{{dead link|date=February 2018}}{{cite book |chapter=Janus-III: speech-to-speech translation in multiple languages |last1=Lavie |first1=A. |last2=Waibel |first2=A. |last3=Levin |first3=L. |last4=Finke |first4=M. |last5=Gates |first5=D. |last6=Gavalda |first6=M. |last7=Zeppenfeld |first7=T. |last8=Zhan |first8=Puming |date=1 April 1997 |publisher=IEEE Xplore |volume=1 |pages=99–102 |doi=10.1109/ICASSP.1997.599557 |title=1997 IEEE International Conference on Acoustics, Speech, and Signal Processing |isbn=978-0-8186-7919-3 |citeseerx=10.1.1.36.6967 |s2cid=1514209 }}
  • Mozilla DeepSpeech – An open-source speech-to-text engine based on Baidu's Deep Speech research paper.{{cite web |title=A TensorFlow implementation of Baidu's DeepSpeech architecture |date=2017-12-05 |url=https://github.com/mozilla/DeepSpeech |publisher=Mozilla |access-date=2017-12-05}}

Discontinued software

  • IBM VoiceType (formerly IBM Personal Dictation System)
  • IBM ViaVoice – Embedded version still maintained by IBM.{{cite web |url=http://www-01.ibm.com/software/pervasive/embedded_viavoice/ |title=IBM - Embedded ViaVoice - Embedded ViaVoice - Software |access-date=2010-06-29 |url-status=live |archive-url=https://web.archive.org/web/20100808052606/http://www-01.ibm.com/software/pervasive/embedded_viavoice/ |archive-date=2010-08-08 }} No longer supported for versions above Windows Vista.{{cite web |url=http://nuance.custhelp.com/app/answers/detail/a_id/5775/p/31/c/980/r_id/100023 |title=Nuance product support for Microsoft Windows 7 |website=Nuance Communications, Customer Help |access-date=2019-03-16}} Untested above macOS 10.4 or on Macintoshes with an Intel chipset.{{cite web |url=http://nuance.custhelp.com/app/answers/detail/a_id/4987/related/1/p/31/c/980/r_id/100023 |title=ViaVoice for Mac OS X on Intel Chipset |website=Nuance Communications, Customer Help |access-date=2019-03-16}}
  • Quack.com – Acquired by AOL; the name has since been reused for an iPad search app.
  • SpeechWorks from Nuance Communications.
  • Yap Speech Cloud – Speech-to-text platform acquired by Amazon.com.

See also

  • {{annotated link|Speech recognition software for Linux}}
  • {{annotated link|Transcription software}}

References

{{Reflist}}

{{DEFAULTSORT:List Of Speech Recognition Software}}
