Motor theory of speech perception
{{phonetics}}
The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates.{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Cooper | first2 = F. S.
| last3 = Shankweiler | first3 = D. P.
| last4 = Studdert-Kennedy | first4 = M.
| title = Perception of the speech code
| journal = Psychological Review
| volume = 74
| issue = 6
| pages = 431–461
| year = 1967
| pmid = 4170865 | doi=10.1037/h0020279
}}{{Cite journal
| doi = 10.1016/0010-0277(85)90021-6
| last1 = Liberman | first1 = A. M.
| last2 = Mattingly | first2 = I. G.
| title = The motor theory of speech perception revised
| journal = Cognition
| volume = 21
| issue = 1
| pages = 1–36
| year = 1985
| pmid = 4075760
| citeseerx = 10.1.1.330.220 | s2cid = 112932 }}{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Mattingly | first2 = I. G.
| title = A specialization for speech perception
| journal = Science
| volume = 243
| issue = 4890
| pages = 489–494
| year = 1989
| pmid = 2643163
| doi=10.1126/science.2643163
| bibcode = 1989Sci...243..489L
}}{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Whalen | first2 = D. H.
| title = On the relation of speech to language
| journal = Trends in Cognitive Sciences
| volume = 4
| issue = 5
| pages = 187–196
| year = 2000
| pmid = 10782105
| doi=10.1016/S1364-6613(00)01471-6
| s2cid = 12252728
}}{{Cite journal
| last1 = Galantucci | first1 = B.
| last2 = Fowler | first2 = C. A.
| last3 = Turvey | first3 = M. T.
| title = The motor theory of speech perception reviewed
| journal = Psychonomic Bulletin & Review
| volume = 13
| issue = 3
| pages = 361–377
| year = 2006
| pmid = 17048719
| pmc = 2746041
| doi=10.3758/bf03193857
}} It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them.
The hypothesis has attracted more interest outside the field of speech perception than within it. That interest has grown particularly since the discovery of mirror neurons, which link the production and perception of motor movements, including those made by the vocal tract. The theory was initially proposed at Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen.
Origins and development
[[File:Illu01 head neck.jpg|thumb|When we hear spoken words, we sense that they are made of auditory sounds. The motor theory of speech perception argues that behind the sounds we hear are the intended movements of the vocal tract that pronounces them.]]
The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. Liberman, A. M. (1996). Speech: A special code. Cambridge, MA: MIT Press. {{ISBN|978-0-262-12192-7}} This led to a close examination of how speech sounds correspond to their acoustic spectrograms. The examination found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation).{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Delattre | first2 = P.
| last3 = Cooper | first3 = F. S.
| title = The role of selected stimulus-variables in the perception of the unvoiced stop consonants
| journal = The American Journal of Psychology
| volume = 65
| issue = 4
| pages = 497–516
| year = 1952
| pmid = 12996688
| doi=10.2307/1418032
| jstor = 1418032
}}{{Cite journal | last1 = Liberman | first1 = A. M. | last2 = Delattre | first2 = P. C. | last3 = Cooper | first3 = F. S. | last4 = Gerstman | first4 = L. J. | title = The role of consonant-vowel transitions in the perception of the stop and nasal consonants | doi = 10.1037/h0093673 | journal = Psychological Monographs: General and Applied | volume = 68 | issue = 8 | pages = 1–13 | year = 1954 }} [http://www.haskins.yale.edu/Reprints/HL0011.pdf PDF] {{Webarchive|url=https://web.archive.org/web/20160303215651/http://www.haskins.yale.edu/Reprints/HL0011.pdf |date=2016-03-03 }}{{Cite journal
| last1 = Fowler | first1 = C. A.
| last2 = Saltzman | first2 = E.
| title = Coordination and coarticulation in speech production
| journal = Language and Speech
| volume = 36
| issue = 2–3
| pages = 171–195
| year = 1993
| pmid = 8277807
| doi = 10.1177/002383099303600304
| s2cid = 7199908
}} [http://www.linguistics.berkeley.edu/~kjohnson/LSA317/lsa317_fowler_saltzman1993.pdf PDF] This suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures.
=Associationist approach=
Initially, the theory was associationist: infants mimic the speech they hear, and this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds.
=Cognitivist approach=
The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was research suggesting that speech processing is special, such as findings on duplex perception.{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Isenberg | first2 = D.
| last3 = Rakerd | first3 = B.
| title = Duplex perception of cues for stop consonants: Evidence for a phonetic mode
| journal = Perception & Psychophysics
| volume = 30
| issue = 2
| pages = 133–143
| year = 1981
| pmid = 7301513
| doi=10.3758/bf03204471
| doi-access = free
}}
=Changing distal objects=
Initially, speech perception was assumed to link to speech objects that were both
- the invariant movements of speech articulators
- the invariant motor commands sent to muscles to move the vocal tract articulators{{cite journal |last1=Liberman |first1=A. M. |title=The grammars of speech and language |doi=10.1016/0010-0285(70)90018-6 |journal=Cognitive Psychology |volume=1 |issue=4 |pages=301–323 |year=1970 |url=http://www.haskins.yale.edu/Reprints/HL0099.pdf |access-date=2009-06-02 |archive-date=2015-12-31 |archive-url=https://web.archive.org/web/20151231094548/http://www.haskins.yale.edu/Reprints/HL0099.pdf |url-status=dead }}
This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements.{{cite journal |pmid=4075760 |year=1985 |last1=Liberman |first1=A. M. |title=The motor theory of speech perception revised |journal=Cognition |volume=21 |issue=1 |pages=1–36 |last2=Mattingly |first2=I. G. |url=http://www.haskins.yale.edu/Reprints/HL0519.pdf |doi=10.1016/0010-0277(85)90021-6 |citeseerx=10.1.1.330.220 |s2cid=112932 |access-date=2009-06-02 |archive-date=2021-04-15 |archive-url=https://web.archive.org/web/20210415025120/http://www.haskins.yale.edu/Reprints/HL0519.pdf |url-status=dead }}
=Modern revision=
The "speech is special" claim has been dropped, as duplex perception was found to occur for nonspeech sounds as well (for example, slamming doors).{{Cite journal
| last1 = Fowler | first1 = C. A.
| last2 = Rosenblum | first2 = L. D.
| title = Duplex perception: A comparison of monosyllables and slamming doors
| journal = Journal of Experimental Psychology. Human Perception and Performance
| volume = 16
| issue = 4
| pages = 742–754
| year = 1990
| pmid = 2148589
| doi=10.1037/0096-1523.16.4.742
}}
=Mirror neurons=
The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics.{{Cite journal
| last1 = Massaro | first1 = D. W.
| last2 = Chen | first2 = T. H.
| title = The motor theory of speech perception revisited
| journal = Psychonomic Bulletin & Review
| volume = 15
| issue = 2
| pages = 453–457; discussion 457–62
| year = 2008
| pmid = 18488668
| doi=10.3758/pbr.15.2.453
| s2cid = 9266946
}}
Support
=Nonauditory gesture information=
If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case.
- The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory cue synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different; some people report hearing "da".
- People find it easier to hear speech in noise if they can see the speaker.{{Cite journal
| last1 = MacLeod | first1 = A.
| last2 = Summerfield | first2 = Q.
| title = Quantifying the contribution of vision to speech perception in noise
| journal = British Journal of Audiology
| volume = 21
| issue = 2
| pages = 131–141
| year = 1987
| pmid = 3594015
| doi=10.3109/03005368709077786
}}
- People can hear syllables better when their production can be felt haptically.{{Cite journal
| last1 = Fowler | first1 = C. A.
| last2 = Dekle | first2 = D. J.
| title = Listening with eye and hand: Cross-modal contributions to speech perception
| journal = Journal of Experimental Psychology. Human Perception and Performance
| volume = 17
| issue = 3
| pages = 816–828
| year = 1991
| pmid = 1834793
| doi=10.1037/0096-1523.17.3.816
}}
=Categorical perception=
Using a speech synthesizer, speech sounds can be varied, for example, in place of articulation along a continuum from {{IPA|/bɑ/}} to {{IPA|/dɑ/}} to {{IPA|/ɡɑ/}}, or in voice onset time on a continuum from {{IPA|/dɑ/}} to {{IPA|/tɑ/}}. When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being {{IPA|/dɑ/}} and the sound on the other extreme being {{IPA|/tɑ/}}, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either {{IPA|/dɑ/}} or {{IPA|/tɑ/}}. Likewise, the English consonant {{IPA|/d/}} may vary in its acoustic details across different phonetic contexts (the {{IPA|/d/}} in {{IPA|/du/}} does not technically sound the same as the one in {{IPA|/di/}}, for example), but all instances of {{IPA|/d/}}, as perceived by a listener, fall within one category (voiced alveolar plosive). This is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments."{{cite encyclopedia |vauthors = Nygaard LC, Pisoni DB |year=1995 |title=Speech Perception: New Directions in Research and Theory |editor1=J.L. Miller |editor2=P.D. Eimas |encyclopedia=Handbook of Perception and Cognition: Speech, Language, and Communication |location=San Diego |publisher=Academic Press| isbn= 978-0-12-497770-9 }} This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track.{{Cite journal
| last1 = Liberman | first1 = A. M.
| last2 = Harris | first2 = K. S.
| last3 = Hoffman | first3 = H. S.
| last4 = Griffith | first4 = B. C.
| title = The discrimination of speech sounds within and across phoneme boundaries
| journal = Journal of Experimental Psychology
| volume = 54
| issue = 5
| pages = 358–368
| year = 1957
| pmid = 13481283 | doi=10.1037/h0044417
| s2cid = 10117886
}}
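The categorical pattern described above can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the logistic identification curve, the 30 ms category boundary, the slope, and the ten voice onset time values are hypothetical choices, not figures taken from the cited studies. The discrimination score is likewise a simple labelling-based prediction assumed for this sketch, in which two sounds can be told apart only to the extent that they receive different labels.

```python
import math

# Hypothetical parameters chosen for illustration only.
BOUNDARY_MS = 30.0   # assumed voice onset time (VOT) at the /d/-/t/ boundary
SLOPE = 0.5          # assumed steepness of the identification curve


def prob_ta(vot_ms):
    """Probability of labelling a stimulus as 'ta' (logistic curve, assumed)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def label(vot_ms):
    """Most likely label for a stimulus with the given VOT."""
    return "ta" if prob_ta(vot_ms) >= 0.5 else "da"


def predicted_discrimination(vot_a, vot_b):
    """Labelling-based prediction: 0.5 is chance, 1.0 is perfect."""
    return 0.5 + 0.5 * (prob_ta(vot_a) - prob_ta(vot_b)) ** 2


# Ten-step continuum with equal acoustic spacing (0, 7, ..., 63 ms).
continuum = [7.0 * step for step in range(10)]
labels = [label(vot) for vot in continuum]

# Predicted discrimination for each pair of neighbouring steps.
pair_scores = [
    predicted_discrimination(a, b)
    for a, b in zip(continuum, continuum[1:])
]
peak_index = max(range(len(pair_scores)), key=pair_scores.__getitem__)

print(labels)
print([round(score, 3) for score in pair_scores])
print("best-discriminated pair:", continuum[peak_index], continuum[peak_index + 1])
```

Under these assumed parameters, the first five steps are labelled "da" and the last five "ta", although every step differs acoustically from its neighbours by the same amount. Predicted discrimination stays close to chance within each category and peaks for the one pair that straddles the boundary.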
=Speech imitation=
If people can hear the gestures in speech, then the imitation of speech should be very fast, as in speech shadowing, in which people repeat words as they hear them over headphones.{{Cite journal
| doi = 10.1038/244522a0
| last1 = Marslen-Wilson | first1 = W.
| title = Linguistic structure and speech shadowing at very short latencies
| journal = Nature
| volume = 244
| issue = 5417
| pages = 522–523
| year = 1973
| pmid = 4621131
| bibcode = 1973Natur.244..522M | s2cid = 4220775 }} People can repeat heard syllables more quickly than they would be able to produce them normally.{{Cite journal
| last1 = Porter Jr | first1 = R. J.
| last2 = Lubker | first2 = J. F.
| title = Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage in speech
| journal = Journal of Speech and Hearing Research
| volume = 23
| issue = 3
| pages = 593–602
| year = 1980
| pmid = 7421161
| doi=10.1044/jshr.2303.593
}}
=Speech production=
- Hearing speech activates vocal tract muscles,{{Cite journal
| last1 = Fadiga | first1 = L.
| last2 = Craighero | first2 = L.
| last3 = Buccino | first3 = G.
| last4 = Rizzolatti | first4 = G.
| title = Speech listening specifically modulates the excitability of tongue muscles: A TMS study
| journal = The European Journal of Neuroscience
| volume = 15
| issue = 2
| pages = 399–402
| year = 2002
| pmid = 11849307 | doi=10.1046/j.0953-816x.2001.01874.x
| citeseerx = 10.1.1.169.4261
| s2cid = 16504172
}} and the motor cortex{{Cite journal
| last1 = Watkins | first1 = K. E.
| last2 = Strafella | first2 = A. P.
| last3 = Paus | first3 = T.
| title = Seeing and hearing speech excites the motor system involved in speech production
| journal = Neuropsychologia
| volume = 41
| issue = 8
| pages = 989–994
| year = 2003
| pmid = 12667534
| doi=10.1016/s0028-3932(02)00316-0
| s2cid = 518384
}} and premotor cortex.{{Cite journal
| last1 = Wilson | first1 = S. M.
| last2 = Saygin | first2 = A. E. P.
| last3 = Sereno | first3 = M. I.
| last4 = Iacoboni | first4 = M.
| title = Listening to speech activates motor areas involved in speech production
| doi = 10.1038/nn1263
| journal = Nature Neuroscience
| volume = 7
| issue = 7
| pages = 701–702
| year = 2004
| pmid = 15184903
| s2cid = 8080063
}} The integration of auditory and visual input in speech perception also involves such areas.{{Cite journal
| last1 = Skipper | first1 = J. I.
| last2 = Van Wassenhove | first2 = V.
| last3 = Nusbaum | first3 = H. C.
| last4 = Small | first4 = S. L.
| title = Hearing Lips and Seeing Voices: How Cortical Areas Supporting Speech Production Mediate Audiovisual Speech Perception
| doi = 10.1093/cercor/bhl147
| journal = Cerebral Cortex
| volume = 17
| issue = 10
| pages = 2387–2399
| year = 2007
| pmid = 17218482
| pmc =2896890
}}
- Disrupting the premotor cortex disrupts the perception of speech units such as plosives.{{Cite journal
| last1 = Meister | first1 = I. G.
| last2 = Wilson | first2 = S. M.
| last3 = Deblieck | first3 = C.
| last4 = Wu | first4 = A. D.
| last5 = Iacoboni | first5 = M.
| title = The Essential Role of Premotor Cortex in Speech Perception
| doi = 10.1016/j.cub.2007.08.064
| journal = Current Biology
| volume = 17
| issue = 19
| pages = 1692–1696
| year = 2007
| pmid = 17900904
| pmc = 5536895
| bibcode = 2007CBio...17.1692M
}}
- The activation of the motor areas is organized by phonemic features, which correspond to the vocal tract articulators that create speech gestures.{{Cite journal
| last1 = Pulvermüller | first1 = F.
| last2 = Huss | first2 = M.
| last3 = Kherif | first3 = F.
| author4 = Moscoso del Prado Martin F
| last5 = Hauk | first5 = O.
| last6 = Shtyrov | first6 = Y.
| title = Motor cortex maps articulatory features of speech sounds
| doi = 10.1073/pnas.0509989103
| journal = Proceedings of the National Academy of Sciences
| volume = 103
| issue = 20
| pages = 7865–7870
| year = 2006
| pmid = 16682637
| pmc =1472536
| bibcode = 2006PNAS..103.7865P
| doi-access = free
}}
- The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation.{{Cite journal
| last1 = d'Ausilio | first1 = A.
| last2 = Pulvermüller | first2 = F.
| last3 = Salmas | first3 = P.
| last4 = Bufalari | first4 = I.
| last5 = Begliomini | first5 = C.
| last6 = Fadiga | first6 = L.
| doi = 10.1016/j.cub.2009.01.017
| title = The Motor Somatotopy of Speech Perception
| journal = Current Biology
| volume = 19
| issue = 5
| pages = 381–385
| year = 2009
| pmid = 19217297
| doi-access = free
| bibcode = 2009CBio...19..381D
| hdl = 11392/534437
| hdl-access = free
}}
- Auditory and motor cortical coupling is restricted to a specific range of neuronal firing frequency.{{cite journal |last1=Assaneo |first1=M. Florencia |last2=Poeppel |first2=David |title=The coupling between auditory and motor cortices is rate-restricted: Evidence for an intrinsic speech-motor rhythm |journal=Science Advances |date=2018 |volume=4 |issue=2 |pages=eaao3842 |doi= 10.1126/sciadv.aao3842 |pmid=29441362 |pmc=5810610 |bibcode=2018SciA....4.3842A }}
=Perception-action meshing=
Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out.{{Cite journal
| last1 = Rizzolatti
| first1 = G.
| last2 = Craighero
| first2 = L.
| doi = 10.1146/annurev.neuro.27.070203.144230
| title = The Mirror-Neuron System
| journal = Annual Review of Neuroscience
| volume = 27
| pages = 169–192
| year = 2004
| pmid = 15217330
| s2cid = 1729870
}} [http://web.mit.edu/hst.722/www/Topics/Language/RizzolattiReview2004.pdf PDF] {{Webarchive|url=https://web.archive.org/web/20070630021006/http://web.mit.edu/hst.722/www/Topics/Language/RizzolattiReview2004.pdf |date=2007-06-30 }} Further evidence comes from common coding theory, which holds that perception and action share representations.{{Cite journal
| last1 = Hommel | first1 = B.
| last2 = Müsseler | first2 = J.
| last3 = Aschersleben | first3 = G.
| last4 = Prinz | first4 = W.
| title = The Theory of Event Coding (TEC): A framework for perception and action planning
| journal = The Behavioral and Brain Sciences
| volume = 24
| issue = 5
| pages = 849–878; discussion 878–937
| year = 2001
| pmid = 12239891 | doi=10.1017/s0140525x01000103
}}
Criticisms
The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary" (p. 361). Several critiques of it exist.{{cite book
|last1=Massaro
|first1=D. W.
|year=1997
|title= Perceiving talking faces: From speech perception to a behavioral principle |location=Cambridge, MA
|publisher=MIT Press
|isbn = 978-0-262-13337-1}}
=Multiple sources=
=Production=
The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around.{{cite journal|pmid=15260865|year = 2004|last1 = Tsao|first1 = F. M.|title = Speech perception in infancy predicts language development in the second year of life: A longitudinal study|journal = Child Development|volume = 75|issue = 4|pages = 1067–84|last2 = Liu|first2 = H. M.|last3 = Kuhl|first3 = P. K.|doi = 10.1111/j.1467-8624.2004.00726.x| s2cid=10954073 | url=http://ntur.lib.ntu.edu.tw//handle/246246/173252 |url-access = subscription}} It would also predict that defects in speech production would impair speech perception, but they do not.{{cite journal|pmid=6081929|year = 1967|last1 = MacNeilage|first1 = P. F.|title = Speech production and perception in a patient with severe impairment of somesthetic perception and motor control|journal = Journal of Speech and Hearing Research|volume = 10|issue = 3|pages = 449–67|last2 = Rootes|first2 = T. P.|last3 = Chase|first3 = R. A.|doi = 10.1044/jshr.1003.449}} However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists.
=Speech module=
Several lines of evidence once cited for a specialized speech module have not held up.
- Duplex perception can be observed with door slams.
- The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing.{{Citation needed|date=June 2009}}
- As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories.
As a result, this part of the theory has been dropped by some researchers.
=Sublexical tasks=
The evidence provided for the motor theory of speech perception is limited to tasks, such as syllable discrimination, that use sublexical speech units rather than full spoken words or sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process speech sounds under ecologically valid conditions, that is, situations in which successful speech sound processing ultimately leads to contact with the mental lexicon and auditory comprehension."{{Cite journal
| last1 = Hickok | first1 = G.
| last2 = Poeppel | first2 = D.
| doi = 10.1038/nrn2113
| title = The cortical organization of speech processing
| journal = Nature Reviews Neuroscience
| volume = 8
| issue = 5
| pages = 393–402
| year = 2007
| pmid = 17431404
| s2cid = 6199399
}} (see p. 394). This, however, creates the problem of "a tenuous connection to their implicit target of investigation, speech recognition".
Birds
It has been suggested that birds also hear each other's bird song in terms of vocal gestures.{{Cite journal
| doi = 10.1126/science.4012321
| last1 = Williams | first1 = H.
| last2 = Nottebohm | first2 = F.
| title = Auditory responses in avian vocal motor neurons: A motor theory for song perception in birds
| journal = Science
| volume = 229
| issue = 4710
| pages = 279–282
| year = 1985
| pmid = 4012321
| bibcode = 1985Sci...229..279W }}
References
{{reflist|2}}
External links
- [http://www.haskins.yale.edu/ Haskins Laboratories] {{Webarchive|url=https://web.archive.org/web/20190509003749/http://www.haskins.yale.edu/ |date=2019-05-09 }}
- [http://www.haskins.yale.edu/pubs.html Source of PDFs on the motor theory of speech perception] {{Webarchive|url=https://web.archive.org/web/20090504100143/http://www.haskins.yale.edu/pubs.html |date=2009-05-04 }}
{{DEFAULTSORT:Motor Theory Of Speech Perception}}