trigram

{{short description|Special case of the n-gram, where n is 3}}


Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for statistical analysis of texts, and in cryptography for the control and use of ciphers and codes. For sample results of such analysis, see "[https://www3.nd.edu/~busiforc/handouts/cryptography/Letter%20Frequencies.html Letter Frequencies in the English Language]".

Frequency

Context is very important: rankings and percentages vary readily with the sample size, the author, the document type (poetry, science fiction, technical documentation), and the writing level (stories for children versus adults, military orders, recipes).

Typical cryptanalytic frequency analysis finds that the 16 most common character-level trigrams in English are:{{cite book |last= Lewand |first= Robert |title= Cryptological Mathematics |publisher= The Mathematical Association of America |year= 2000 |page= 37 |url= {{google books |id= dx8zM-VeKI8C |page= 37 |text= Most Common Trigraphs in the English Language |plainurl= yes}} |isbn= 978-0-88385-719-9}}{{cite web |url= http://pages.central.edu/emp/LintonT/classes/spring01/cryptography/letterfreq.html |title= Relative Frequencies of Letters in General English Plain text |website= Central College |first= Tom |last= Linton |url-status= dead |archive-date= January 22, 2007 |archive-url= https://web.archive.org/web/20070122235914/http://pages.central.edu/emp/LintonT/classes/spring01/cryptography/letterfreq.html |date= 2001 |series= Cryptography |edition= Spring }}

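{| class="wikitable sortable"
! Rank !! Trigram !! Frequency (different source){{cite web |url= http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/ |title= English Letter Frequencies |website= Practical Cryptography }}
|-
| 1 || the || 1.81%
|-
| 2 || and || 0.73%
|-
| 3 || tha || 0.33%
|-
| 4 || ent || 0.42%
|-
| 5 || ing || 0.72%
|-
| 6 || ion || 0.42%
|-
| 7 || tio || 0.31%
|-
| 8 || for || 0.34%
|-
| 9 || nde ||
|-
| 10 || has ||
|-
| 11 || nce ||
|-
| 12 || edt ||
|-
| 13 || tis ||
|-
| 14 || oft || 0.22%
|-
| 15 || sth || 0.21%
|-
| 16 || men ||
|}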

Because encrypted messages sent by telegraph often omit punctuation and spaces, cryptographic frequency analysis of such messages includes trigrams that straddle word boundaries. As a result, trigrams such as "edt" occur frequently, even though they may never occur within any single word of those messages.{{cite web |url= https://fuelonline.com/voice-search-seo-voice-seo/ |title= Voice Search SEO |website= Fuelonline }}
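
The effect of dropping spaces and punctuation can be illustrated with a minimal Python sketch. The function name <code>char_trigram_counts</code> and the sample message are hypothetical, chosen only for illustration. The sketch keeps letters only and counts every overlapping three-letter window, so trigrams that span two words are counted as well.

```python
from collections import Counter


def char_trigram_counts(text: str) -> Counter:
    """Count overlapping letter trigrams after dropping spaces and punctuation."""
    # Keep letters only, lowercased, to mimic a message sent without
    # spaces or punctuation (an assumption made for this illustration).
    letters = "".join(ch.lower() for ch in text if ch.isalpha())
    # Slide a window of width 3 across the resulting letter stream.
    return Counter(letters[i:i + 3] for i in range(len(letters) - 2))


# Hypothetical sample message: "marched toward" becomes "marchedtoward",
# which contains the cross-word trigram "edt".
counts = char_trigram_counts("We marched toward the ridge.")
print(counts["edt"])
```

Dividing each count by the total number of trigrams in the sample would give relative frequencies comparable in form to the percentages in the table, although the actual values depend heavily on the corpus used.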

Examples

The sentence "the quick red fox jumps over the lazy brown dog" has the following word-level trigrams:

the quick red

quick red fox

red fox jumps

fox jumps over

jumps over the

over the lazy

the lazy brown

lazy brown dog
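
The word-level trigrams listed above can be generated with a short Python sketch. The function name <code>word_trigrams</code> is hypothetical, and splitting on whitespace is a simplifying assumption; real tokenizers usually also handle punctuation and case.

```python
def word_trigrams(sentence: str) -> list[tuple[str, str, str]]:
    """Return every run of three consecutive words in the sentence."""
    # Naive whitespace tokenization (an assumption for this sketch).
    words = sentence.split()
    # Pair each word with its two successors; zip stops at the shortest
    # input, so a sentence of n words yields n - 2 trigrams.
    return list(zip(words, words[1:], words[2:]))


for trigram in word_trigrams("the quick red fox jumps over the lazy brown dog"):
    print(" ".join(trigram))
```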

The word-level trigram "the quick red", in turn, has the following character-level trigrams (where an underscore "_" marks a space):

the

he_

e_q

_qu

qui

uic

ick

ck_

k_r

_re

red
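
The character-level trigrams above can be reproduced with a similar sketch. The function name <code>char_trigrams</code> and its <code>space_marker</code> parameter are hypothetical. Spaces are replaced with an underscore before a window of three characters is moved across the text.

```python
def char_trigrams(text: str, space_marker: str = "_") -> list[str]:
    """Return all overlapping character trigrams, with spaces made visible."""
    # Replace each space with the marker so word boundaries stay visible.
    marked = text.replace(" ", space_marker)
    # A string of n characters yields n - 2 overlapping trigrams.
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


print(char_trigrams("the quick red"))
```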

References