lexical density
{{Short description|Complexity of communication}}
Lexical density is a concept in computational linguistics that measures the structure and complexity of human communication in a language.{{cite book|author=Michael Halliday|title=Spoken and Written Language|url=https://books.google.com/books?id=T9RpAAAACAAJ|year=1985|publisher=Deakin University|isbn=978-0-7300-0309-0|pages=61–64}} Lexical density estimates the linguistic complexity in a written or spoken composition from the functional words (grammatical units) and content words (lexical units, lexemes). One method to calculate the lexical density is to compute the ratio of lexical items to the total number of words. Another method is to compute the ratio of lexical items to the number of higher structural items in a composition, such as the total number of clauses in the sentences.
The lexical density for an individual evolves with age, education, communication style, circumstances, unusual injuries or medical conditions,{{cite journal|title=Predicting Lexical Density Growth Rate in Young Children With Autism Spectrum Disorders|author=Paul Yoder|journal=American Journal of Speech-Language Pathology|year=2006|volume=15|issue=4|pages=362–373}} and his or her creativity. The inherent structure of a human language and the individual's first language may affect the lexical density of his or her writing and speaking style. Further, after early childhood, human communication in written form is generally more lexically dense than in spoken form.{{cite book|author=Michael Halliday|title=Spoken and Written Language|url=https://books.google.com/books?id=T9RpAAAACAAJ|year=1985|publisher=Deakin University|isbn=978-0-7300-0309-0|pages=61–75 (Chapter 5), 76–91 (Chapter 6)}}{{cite book|author=Victoria Johansson|title=Developmental aspects of text production in writing and speech|url=https://books.google.com/books?id=j9FGAQAAIAAJ|year=2009|publisher=Department of Linguistics and Phonetics, Centre for Languages and Literature, Lund University|isbn=978-91-974116-7-7|pages=1–16}} Lexical density affects the readability of a composition and the ease with which the listener or reader can comprehend a communication.{{cite journal|author1=V To|author2=S Fan|author3=DP Thomas|year=2013| title= Lexical density and Readability: A case study of English Textbooks|journal= The International Journal of Language, Society and Culture|volume=37|issue=7|pages= 61–71}}{{cite journal | last=O'Loughlin | first=Kieran | title=Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test | journal=Language Testing | publisher=SAGE Publications | volume=12 | issue=2 | year=1995 | doi=10.1177/026553229501200205 | pages=217–237| s2cid=145638000 }} Lexical density may also affect the memorability and retention of a sentence and its message.{{cite journal | last=Perfetti | first=Charles A. | title=Lexical density and phrase structure depth as variables in sentence retention | journal=Journal of Verbal Learning and Verbal Behavior | publisher=Elsevier BV | volume=8 | issue=6 | year=1969 | issn=0022-5371 | doi=10.1016/s0022-5371(69)80035-6 | pages=719–724}}
Discussion
Lexical density is the proportion of content words (lexical items) in a given discourse. It can be measured either as the ratio of lexical items to the total number of words, or as the ratio of lexical items to the number of higher structural items in the sentences (for example, clauses).{{cite book|author=Erik Castello|title=Text Complexity and Reading Comprehension Tests|url=https://books.google.com/books?id=rYzvuQ5mHUcC&pg=PA48|year=2008|publisher=Peter Lang|isbn=978-3-03911-717-8|pages=49–51}}{{cite book|author=Belinda Crawford Camiciottoli|title=The Language of Business Studies Lectures: A Corpus-assisted Analysis|url=https://books.google.com/books?id=4S5ejvLhUtcC&pg=PA73|year=2007|publisher=John Benjamins Publishing|isbn=978-90-272-5400-9|page=73}} Lexical items carry the actual content and typically include nouns, verbs, adjectives and adverbs. Grammatical items are the functional glue that binds the content together; they typically include pronouns, conjunctions, prepositions, determiners, and certain classes of finite verbs and adverbs.
In discourse analysis, lexical density is used as a descriptive parameter that varies with register and genre. Many methods have been proposed for computing the lexical density of a composition or corpus. Lexical density may be determined as:
::<math>L_d = \frac{N_\mathrm{lex}}{N} \times 100</math>
:Where:
::<math>L_d</math> = the analysed text's lexical density
::<math>N_\mathrm{lex}</math> = the number of lexical tokens (nouns, adjectives, verbs, adverbs) in the analysed text
::<math>N</math> = the number of all tokens (total number of words) in the analysed text
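The word-based ratio can be illustrated with a short Python sketch. It assumes the text has already been tokenized and part-of-speech tagged; the Universal POS-style tag names (NOUN, VERB, ADJ, ADV and so on), the helper name <code>lexical_density</code> and the hand-tagged sample sentence are illustrative assumptions, not part of any cited method.

```python
# Minimal sketch of L_d = N_lex / N * 100 for one pre-tagged text.
# Assumption: tokens are (word, tag) pairs using Universal POS-style tags,
# and the set of "lexical" tags below is an illustrative choice.

LEXICAL_TAGS = frozenset({"NOUN", "VERB", "ADJ", "ADV"})


def lexical_density(tagged_tokens, lexical_tags=LEXICAL_TAGS):
    """Return the percentage of tokens whose tag is counted as lexical."""
    total = len(tagged_tokens)  # N: all tokens in the text
    if total == 0:
        raise ValueError("cannot compute lexical density of an empty text")
    # N_lex: tokens whose tag falls in the lexical set
    n_lex = sum(1 for _word, tag in tagged_tokens if tag in lexical_tags)
    return n_lex / total * 100


# Hypothetical hand-tagged sentence (punctuation omitted).
sentence = [
    ("The", "DET"), ("committee", "NOUN"), ("quickly", "ADV"),
    ("approved", "VERB"), ("the", "DET"), ("new", "ADJ"),
    ("budget", "NOUN"), ("after", "ADP"), ("a", "DET"),
    ("long", "ADJ"), ("debate", "NOUN"),
]

# 7 lexical tokens out of 11
print(round(lexical_density(sentence), 1))  # 63.6
```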
=Ure lexical density=
{{Further |type-token distinction}}
Ure proposed the following formula in 1971 to compute the lexical density of a sentence:
::{{math|big=1| Ld {{=}} {{sfrac|The number of lexical items|The total number of words}} * 100 }}
Biber terms this ratio the "type-token ratio".{{cite book|author=Douglas Biber|title=Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure|url=https://books.google.com/books?id=t4CcpizGwgQC |year=2007|publisher=John Benjamins Publishing|isbn=978-90-272-2302-9|pages=97–98 with footnote 7}}
=Halliday lexical density=
In 1985, Halliday revised the denominator of the Ure formula and proposed the following to compute the lexical density of a sentence:
::{{math|big=1| Ld {{=}} {{sfrac|The number of lexical items|The total number of clauses}} * 100 }}
In some formulations, Halliday's lexical density is computed as a simple ratio, without the multiplier of 100.
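Halliday's clause-based variant can be sketched in Python as follows. This hypothetical example assumes the text has already been segmented into clauses (clause segmentation is the hard part in practice and is not shown); the tag names, the helper name <code>halliday_density</code> and its <code>percent</code> flag are illustrative choices, the flag reflecting the formulations that drop the multiplier of 100.

```python
# Sketch of a Halliday-style lexical density: lexical items per clause.
# Assumptions: the text is already segmented into clauses, each clause is a
# list of (word, tag) pairs with Universal POS-style tags, and punctuation
# has been dropped.

LEXICAL_TAGS = frozenset({"NOUN", "VERB", "ADJ", "ADV"})  # illustrative lexical classes


def halliday_density(clauses, lexical_tags=LEXICAL_TAGS, percent=True):
    """Return lexical items per clause, multiplied by 100 when percent is True."""
    if not clauses:
        raise ValueError("need at least one clause")
    n_lex = sum(
        1 for clause in clauses for _word, tag in clause if tag in lexical_tags
    )
    ratio = n_lex / len(clauses)  # denominator: number of clauses, not words
    return ratio * 100 if percent else ratio


# Hypothetical two-clause sentence: "The rain stopped and we walked home."
clauses = [
    [("The", "DET"), ("rain", "NOUN"), ("stopped", "VERB")],
    [("and", "CCONJ"), ("we", "PRON"), ("walked", "VERB"), ("home", "ADV")],
]

print(halliday_density(clauses, percent=False))  # 2.0
print(halliday_density(clauses))                 # 200.0
```

Because the denominator counts clauses rather than words, the result is not bounded by 100 and is not directly comparable with a word-based percentage.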
=Characteristics=
Lexical density measurements may vary for the same composition depending on how a "lexical item" is defined and which items are classified as lexical or grammatical. Applied consistently, any adopted methodology yields lexical densities that are comparable across the compositions measured. Typically, the lexical density of a written composition is higher than that of a spoken composition. According to Ure, written forms of human communication in the English language typically have lexical densities above 40%, while spoken forms tend to have lexical densities below 40%. In a survey of historical texts by Michael Stubbs, the typical lexical density of fictional literature ranged between 40% and 54%, while that of non-fiction ranged between 40% and 65%.{{cite book|author1=Mark Warschauer|author2=Richard Kern|title=Network-Based Language Teaching: Concepts and Practice|url=https://books.google.com/books?id=wFH56QxG2uwC |year=2000|publisher=Cambridge University Press|isbn=978-0-521-66742-5|pages=107–108}}{{cite book|author=Michael Stubbs|title=Text and Corpus Analysis: Computer Assisted Studies of Language and Culture|url=https://books.google.com/books?id=2iAFIgAACAAJ|year=1996|publisher=Wiley|isbn=978-0-631-19512-2|pages=71–73}}
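The dependence on classification can be shown with a small hypothetical Python example: the same hand-tagged utterance is scored under two illustrative tag sets that differ only in whether auxiliary verbs (tag AUX) count as lexical. The tag sets, the sample utterance and the comparison against Ure's 40% figure are assumptions chosen for illustration only.

```python
# Sketch: one tagged utterance scored under two classification schemes.
# Assumptions: Universal POS-style tags; both tag sets are illustrative only.

SCHEME_A = frozenset({"NOUN", "VERB", "ADJ", "ADV"})         # auxiliaries grammatical
SCHEME_B = frozenset({"NOUN", "VERB", "ADJ", "ADV", "AUX"})  # auxiliaries lexical


def lexical_density(tagged_tokens, lexical_tags):
    """Percentage of tokens whose tag is counted as lexical."""
    if not tagged_tokens:
        raise ValueError("empty text")
    n_lex = sum(1 for _word, tag in tagged_tokens if tag in lexical_tags)
    return n_lex / len(tagged_tokens) * 100


# Hypothetical hand-tagged, spoken-style utterance (8 tokens).
utterance = [
    ("I", "PRON"), ("think", "VERB"), ("we", "PRON"), ("should", "AUX"),
    ("have", "AUX"), ("left", "VERB"), ("it", "PRON"), ("there", "ADV"),
]

for name, scheme in [("A", SCHEME_A), ("B", SCHEME_B)]:
    ld = lexical_density(utterance, scheme)
    band = "above" if ld > 40 else "at or below"
    print(f"scheme {name}: {ld:.1f}% ({band} Ure's 40% figure)")
```

Under these assumptions the utterance scores 37.5% (3 of 8 tokens) when auxiliaries are treated as grammatical and 62.5% (5 of 8) when they are treated as lexical, landing on opposite sides of the 40% figure; the classification scheme therefore has to be fixed before densities are compared.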
The relationship and intimacy between the participants in a particular communication affect the lexical density, states Ure, as do the circumstances preceding the communication for the same speaker or writer. The higher lexical density of written communication, she proposed, is primarily due to the greater preparation, reflection and revision that writing involves. Discussions and conversations that involve or anticipate feedback tend to be sparser and have lower lexical density. In contrast, state Stubbs and Biber, instructions, law enforcement orders, news read from screen prompts within an allotted time, and literature that authors expect the reader to have available for re-reading tend to maximize lexical density.{{cite book|author=Michael Stubbs|year= 1986|title= Talking about Text|chapter=Lexical density: A technique and some findings|editor=Malcolm Coulthard| publisher= University of Birmingham: English Language Research|pages= 27–42}} In surveys of the lexical density of spoken and written materials across different European countries and age groups, Johansson and Strömqvist report that the lexical density of population groups was similar and depended on the morphological structure of the native language and, within a country, on the age groups sampled.
Lexical density was highest for adults, while the variation measured as lexical diversity, states Johansson, was greater among teenagers within the same age group (13-year-olds and 17-year-olds).{{cite journal|title=Lexical diversity and lexical density in speech and writing: a developmental perspective|author= Victoria Johansson|volume=53|year=2008|journal=Linguistics and Phonetics Working Papers|publisher=Lund University|pages=61–79}}{{cite journal|author1= Sven Strömqvist|author2= Victoria Johansson|author3=Sarah Kriz|author4=H Ragnarsdottir|author5=Ravid Aisenmann|author6=Dorit Ravid|author6-link=Dorit Ravid| year= 2002| title= Toward a crosslinguistic comparison of lexical quanta in speech and writing|journal = Written Language and Literacy|volume= 5|pages= 45–67|doi= 10.1075/wll.5.1.03str}}
See also
References
{{reflist}}
Further reading
- Ure, J (1971). Lexical density and register differentiation. In G. Perren and J.L.M. Trim (eds), Applications of Linguistics, London: Cambridge University Press. 443–452.
External links
- [http://textalyser.net Lexical density 'Textalyser']