Basic Latin (Unicode block)
{{Infobox Unicode block
|blockname = Basic Latin
{{nobold|1=or}}
C0 Controls and Basic Latin
|rangestart = 0000
|rangeend = 007F
|script1 = {{nowrap|Latin (52 characters)}}
|script2 = {{nowrap|Common (76 characters)}}
|symbols = Arabic numerals
Punctuation
|alphabets = English
French
German
Spanish
Vietnamese
|1_0_0 = 128
|controls = 33
|sources = ISO/IEC 8859, ISO 646
|note = {{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}
}}
The Basic Latin Unicode block,{{cite web|url=https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt|title=block.txt|accessdate=2023-03-23|publisher=The Unicode Consortium}} sometimes informally called C0 Controls and Basic Latin,{{cite web|url=https://www.unicode.org/charts/PDF/U0000.pdf|title=C0 Controls and Basic Latin|work=The Unicode Standard, Version 15.0|publisher=Unicode, Inc.|year=2022|access-date=March 22, 2023}} is the first block of the Unicode standard, and the only block whose characters are each encoded in a single byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F and contains 128 characters: the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and the delete control character (U+007F).
The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.{{cite book|title=The Unicode Standard Version 1.0, Volume 1|year=1990|publisher=Addison-Wesley Publishing Company, Inc.|isbn=0-201-56788-1}} Its block name in Unicode 1.0 was ASCII.{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=Unicode Consortium}}
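The single-byte UTF-8 property described above can be checked with a minimal Python sketch. The names `BASIC_LATIN` and `utf8_length` are illustrative and do not come from the Unicode Standard.

```python
# Basic Latin spans U+0000..U+007F (128 code points).
BASIC_LATIN = range(0x0000, 0x0080)


def utf8_length(code_point: int) -> int:
    """Return the number of bytes used to encode one code point in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


# Every Basic Latin code point encodes to exactly one byte, and that byte
# has the same numeric value as the code point, as in ASCII.
assert all(utf8_length(cp) == 1 for cp in BASIC_LATIN)
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in BASIC_LATIN)

# The first code point after the block (U+0080) already needs two bytes.
print(utf8_length(0x007F))  # 1
print(utf8_length(0x0080))  # 2
```

Because the encoded byte equals the code point, pure ASCII data is already valid UTF-8 without conversion.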
Table of characters
class="wikitable collapsible"
!Code !Result !Description !Acronym |
colspan=4 | C0 controls |
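U+0000
| | Null character | NUL |
U+0001
| | Start of Heading | SOH |
U+0002
| | Start of Text | STX |
U+0003
| | End-of-text character | ETX |
U+0004
| | End-of-transmission character | EOT |
U+0005
| | Enquiry character | ENQ |
U+0006
| | Acknowledge character | ACK |
U+0007
| | Bell character | BEL |
U+0008
| | Backspace | BS |
U+0009
| | Horizontal tab | HT |
U+000A
| | Line feed | LF |
U+000B
| | Vertical tab | VT |
U+000C
| | Form feed | FF |
U+000D
| | Carriage return | CR |
U+000E
| | Shift Out | SO |
U+000F
| | Shift In | SI |
U+0010
| | Data Link Escape | DLE |
U+0011
| | Device Control 1 | DC1 |
U+0012
| | Device Control 2 | DC2 |
U+0013
| | Device Control 3 | DC3 |
U+0014
| | Device Control 4 | DC4 |
U+0015
| | Negative-acknowledge character | NAK |
U+0016
| | Synchronous Idle | SYN |
U+0017
| | End of Transmission Block | ETB |
U+0018
| | Cancel character | CAN |
U+0019
| | End of Medium | EM |
U+001A
| | Substitute character | SUB |
U+001B
| | Escape character | ESC |
U+001C
| | File Separator | FS |
U+001D
| | Group Separator | GS |
U+001E
| | Record Separator | RS |
U+001F
| | Unit Separator | US |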
colspan=4 | ASCII punctuation and symbols |
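U+0020
| | Space | SP |
U+0021
|! | Exclamation mark | EXC |
U+0022
|" | Quotation mark | QUO |
U+0023
|# | Number sign | |
U+0024
|$ | Dollar sign | |
U+0025
|% | Percent sign | |
U+0026
|& | Ampersand | |
U+0027
|' | Apostrophe | |
U+0028
|( | Left parenthesis | |
U+0029
|) | Right parenthesis | |
U+002A
|* | Asterisk | |
U+002B
|{{+}} | Plus sign | |
U+002C
|, | Comma | |
U+002D
| - | Hyphen-minus | |
U+002E
|. | Full stop | |
U+002F
|/ | Slash (Solidus) | |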
colspan=4 | ASCII digits |
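U+0030
|0 | Digit Zero | |
U+0031
|1 | Digit One | |
U+0032
|2 | Digit Two | |
U+0033
|3 | Digit Three | |
U+0034
|4 | Digit Four | |
U+0035
|5 | Digit Five | |
U+0036
|6 | Digit Six | |
U+0037
|7 | Digit Seven | |
U+0038
|8 | Digit Eight | |
U+0039
|9 | Digit Nine | |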
colspan=4 | ASCII punctuation and symbols |
U+003A
|: | Colon | |
U+003B
|; | Semicolon | |
U+003C
|< | Less-than sign | |
U+003D
|= | Equals sign | |
U+003E
|> | Greater-than sign | |
U+003F
|? | Question mark | |
U+0040
|@ | At sign | |
colspan=4 | Uppercase Latin alphabet |
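U+0041
|A | Latin Capital letter A | |
U+0042
|B | Latin Capital letter B | |
U+0043
|C | Latin Capital letter C | |
U+0044
|D | Latin Capital letter D | |
U+0045
|E | Latin Capital letter E | |
U+0046
|F | Latin Capital letter F | |
U+0047
|G | Latin Capital letter G | |
U+0048
|H | Latin Capital letter H | |
U+0049
|I | Latin Capital letter I | |
U+004A
|J | Latin Capital letter J | |
U+004B
|K | Latin Capital letter K | |
U+004C
|L | Latin Capital letter L | |
U+004D
|M | Latin Capital letter M | |
U+004E
|N | Latin Capital letter N | |
U+004F
|O | Latin Capital letter O | |
U+0050
|P | Latin Capital letter P | |
U+0051
|Q | Latin Capital letter Q | |
U+0052
|R | Latin Capital letter R | |
U+0053
|S | Latin Capital letter S | |
U+0054
|T | Latin Capital letter T | |
U+0055
|U | Latin Capital letter U | |
U+0056
|V | Latin Capital letter V | |
U+0057
|W | Latin Capital letter W | |
U+0058
|X | Latin Capital letter X | |
U+0059
|Y | Latin Capital letter Y | |
U+005A
|Z | Latin Capital letter Z | |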
colspan=4 | ASCII punctuation and symbols |
U+005B
|[ | Left square bracket | |
U+005C
|\ |Backslash {{ref label|backslash|A|A}} | |
U+005D
|] | Right square bracket | |
U+005E
|^ | Circumflex accent | |
U+005F
|_ | Low line | |
U+0060
|` | Grave accent | |
colspan=4 | Lowercase Latin alphabet |
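U+0061
|a | Latin Small letter A | |
U+0062
|b | Latin Small letter B | |
U+0063
|c | Latin Small letter C | |
U+0064
|d | Latin Small letter D | |
U+0065
|e | Latin Small letter E | |
U+0066
|f | Latin Small letter F | |
U+0067
|g | Latin Small letter G | |
U+0068
|h | Latin Small letter H | |
U+0069
|i | Latin Small letter I | |
U+006A
|j | Latin Small letter J | |
U+006B
|k | Latin Small letter K | |
U+006C
|l | Latin Small letter L | |
U+006D
|m | Latin Small letter M | |
U+006E
|n | Latin Small letter N | |
U+006F
|o | Latin Small letter O | |
U+0070
|p | Latin Small letter P | |
U+0071
|q | Latin Small letter Q | |
U+0072
|r | Latin Small letter R | |
U+0073
|s | Latin Small letter S | |
U+0074
|t | Latin Small letter T | |
U+0075
|u | Latin Small letter U | |
U+0076
|v | Latin Small letter V | |
U+0077
|w | Latin Small letter W | |
U+0078
|x | Latin Small letter X | |
U+0079
|y | Latin Small letter Y | |
U+007A
|z | Latin Small letter Z | |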
colspan=4 | ASCII punctuation and symbols |
U+007B
|{ | Left curly bracket | |
U+007C
|{{vertical bar}} | Vertical bar | |
U+007D
| } | Right curly bracket | |
U+007E
|~ | Tilde | |
colspan=4 | Control character |
U+007F
| ␡ | Delete | DEL |
:{{note label|backslash|A|A}} The character U+005C (\) may show up as a yen sign (¥) or won sign (₩) in Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.{{Cite web
|title = When is a backslash not a backslash?
|work = Sorting it all Out
|author = Michael S. Kaplan
|publisher = Microsoft
|date = 2005-09-17
|url = http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
|url-status = dead
|archive-url = https://web.archive.org/web/20100612050134/http://blogs.msdn.com/b/michkap/archive/2005/09/17/469941.aspx
|archive-date = 2010-06-12
}} Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.{{cite web|title=Unicode 6.2 code charts|url=https://www.unicode.org/Public/6.2.0/charts/CodeCharts.pdf|work=The Unicode Standard|accessdate=1 April 2013}}
=C0 controls=
The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.
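Alias names matter for these code points because control codes carry no formal character name in the Unicode Character Database. The following is a minimal Python sketch of that behaviour using the standard `unicodedata` module. The `ABBREVIATIONS` table and the `describe` helper are hypothetical, and the table covers only a handful of the abbreviations listed in the table of characters.

```python
import unicodedata

# The 32 C0 controls occupy U+0000..U+001F.
C0_CONTROLS = [chr(cp) for cp in range(0x00, 0x20)]

# All of them have the General_Category "Cc" (control).
assert all(unicodedata.category(ch) == "Cc" for ch in C0_CONTROLS)

# unicodedata.name() finds no formal name for a control code, so the
# supplied default (None here) is returned instead.
assert all(unicodedata.name(ch, None) is None for ch in C0_CONTROLS)

# Hypothetical lookup table with a few abbreviations (illustration only).
ABBREVIATIONS = {0x00: "NUL", 0x09: "HT", 0x0A: "LF", 0x0D: "CR", 0x1B: "ESC"}


def describe(ch: str) -> str:
    """Label a control code with its abbreviation, or '<control>' if unlisted."""
    cp = ord(ch)
    return f"U+{cp:04X} {ABBREVIATIONS.get(cp, '<control>')}"


print(describe("\n"))    # U+000A LF
print(describe("\x01"))  # U+0001 <control>
```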
=ASCII punctuation and symbols=
This subheading covers the space, standard punctuation characters, simple mathematical operators, and symbols such as the dollar sign, percent sign, ampersand, underscore, and vertical bar.
=ASCII digits=
The ASCII digits subheading contains the ten decimal digits 0 to 9, at U+0030 to U+0039.
=Uppercase Latin alphabet=
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in majuscule (capital) form.
=Lowercase Latin alphabet=
The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in minuscule (small) form.
=Control character=
The Control character subheading contains only the "Delete" character (U+007F).
Number of symbols, letters and control codes
The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
class="wikitable"
!Subheading!!Number of characters!!Range of characters |
C0 controls | 32 control codes | U+0000 to U+001F |
ASCII punctuation and symbols | 33 punctuation marks and symbols | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E |
ASCII digits | 10 digits | U+0030 to U+0039 |
Uppercase Latin alphabet | 26 unaccented Latin letters in majuscule | U+0041 to U+005A |
Lowercase Latin alphabet | 26 unaccented Latin letters in minuscule | U+0061 to U+007A |
Control character | 1 control code, the "Delete" character | U+007F |
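The counts in the table can be recomputed from the listed ranges. The sketch below is one way to do so in Python. The `SUBHEADINGS` mapping and the `count` helper are illustrative names, and the ranges are copied from the table above.

```python
import unicodedata

# Inclusive (first, last) code point ranges per subheading.
SUBHEADINGS = {
    "C0 controls": [(0x0000, 0x001F)],
    "ASCII punctuation and symbols": [
        (0x0020, 0x002F), (0x003A, 0x0040), (0x005B, 0x0060), (0x007B, 0x007E),
    ],
    "ASCII digits": [(0x0030, 0x0039)],
    "Uppercase Latin alphabet": [(0x0041, 0x005A)],
    "Lowercase Latin alphabet": [(0x0061, 0x007A)],
    "Control character": [(0x007F, 0x007F)],
}


def count(ranges):
    """Number of code points covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


counts = {name: count(ranges) for name, ranges in SUBHEADINGS.items()}
for name, n in counts.items():
    print(f"{name}: {n}")

# The six subheadings together cover the whole 128-character block.
assert sum(counts.values()) == 128

# Cross-check: code points with General_Category "Cc" in the block are the
# 32 C0 controls plus Delete.
control_codes = sum(
    1 for cp in range(0x80) if unicodedata.category(chr(cp)) == "Cc"
)
assert control_codes == counts["C0 controls"] + counts["Control character"]
```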
Chart
{{Unicode chart C0 Controls and Basic Latin}}
Variants
Several of the characters are defined to render as a standardized variant if followed by a variation selector.
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).{{cite web|url=https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf|title=L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set|date=2015-10-30|first1=Barbara|last1=Beeton|first2=Asmus|last2=Freytag|first3=Laurențiu|last3=Iancu|first4=Murray|last4=Sargent}}
Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.{{cite web|url=https://www.unicode.org/L2/L2011/11438-emoji-var.pdf|title=L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)|date=2011-12-22|first=Peter|last=Edberg}}{{cite web|url=https://www.unicode.org/L2/L2015/15301-emoji-sequences.pdf|title=L2/15-301: A proposal for 278 standardized variation sequences for emoji|date=2015-11-01|first=Roozbeh|last=Pournader}}{{Cite web|url=http://unicode.org/reports/tr51/|title=UTR #51: Unicode Emoji|publisher=Unicode Consortium|date=2023-09-05}}{{Cite web|url=https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt|title=UCD: Emoji Data for UTR #51|publisher=Unicode Consortium|date=2023-02-01}}
They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".{{Cite web|url=https://unicode.org/Public/UNIDATA/emoji/emoji-variation-sequences.txt|title=UTS #51 Emoji Variation Sequences | publisher=The Unicode Consortium}}
class="wikitable nounderlines" style="border-collapse:collapse;background:#FFFFFF;font-size:large;text-align:center"
|+style="font-size:small" | Emoji variation sequences | ||||||||||||
style="background:#F8F8F8;font-size:small"
| style="text-align:right" | U+ | 0023 | 002A | 0030 | 0031 | 0032 | 0033 | 0034 | 0035 | 0036 | 0037 | 0038 | 0039 |
style="background:#F8F8F8;font-size:small;text-align:left" | base | # | * | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
style="background:#F8F8F8;font-size:small;text-align:left" | base+VS15+keycap | #︎⃣ | *︎⃣ | 0︎⃣ | 1︎⃣ | 2︎⃣ | 3︎⃣ | 4︎⃣ | 5︎⃣ | 6︎⃣ | 7︎⃣ | 8︎⃣ | 9︎⃣ |
style="background:#F8F8F8;font-size:small;text-align:left" | base+VS16+keycap | #️⃣ | *️⃣ | 0️⃣ | 1️⃣ | 2️⃣ | 3️⃣ | 4️⃣ | 5️⃣ | 6️⃣ | 7️⃣ | 8️⃣ | 9️⃣ |
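The sequences described in this section are plain concatenations of code points, which a short Python sketch can make concrete. The helper names `keycap`, `slashed_zero` and `code_points` are hypothetical. Whether a sequence actually renders as a keycap or a slashed zero depends on font and platform support.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15, text presentation
VS16 = "\uFE0F"    # VARIATION SELECTOR-16, emoji presentation
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve Basic Latin characters that have emoji variation sequences.
KEYCAP_BASES = set("#*0123456789")


def keycap(base: str, emoji: bool = True) -> str:
    """Build base + VS16 (or VS15) + keycap for one of the twelve bases."""
    if base not in KEYCAP_BASES:
        raise ValueError(f"{base!r} is not a keycap base character")
    return base + (VS16 if emoji else VS15) + KEYCAP


def slashed_zero() -> str:
    """Build the standardized variant U+0030 followed by U+FE00."""
    return "0" + VS1


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(keycap("#")))               # U+0023 U+FE0F U+20E3
print(code_points(keycap("7", emoji=False)))  # U+0037 U+FE0E U+20E3
print(code_points(slashed_zero()))            # U+0030 U+FE00
```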
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
{{sticky header}}
class="wikitable sticky-header" | ||||||
Version | {{nobr|Final code points}} | Count | UTC ID | L2 ID | WG2 ID | Document |
---|---|---|---|---|---|---|
rowspan="18" | 1.0.0 | rowspan="18" | U+0000..007F | rowspan="18" | 128 | (to be determined) | |||
{{nobr|[https://www.unicode.org/L2/L1999-UTC/u1999-013.htm UTC/1999-013]}} | {{Citation|title=Tildes and micro sign decompositions|date=1999-05-27|first=Kent|last=Karlsson}} | |||||
{{nobr|[https://www.unicode.org/L2/L1999/99176.htm L2/99-176R]}} | {{Citation|title=Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999|date=1999-11-04|first=Lisa|last=Moore|section=Micro Sign Case Mappings}} | |||||
{{nobr|[https://www.unicode.org/L2/L2004/04145-cstroke-note.pdf L2/04-145]}} | {{Citation|title=C with stroke character examples from BAE report 1884 (Dorsey)|date=2004-04-30|first=David|last=Starner}} | |||||
{{nobr|[https://www.unicode.org/L2/L2004/04202-slash-c-feedback.txt L2/04-202]}} | {{Citation|title=Slashed C Feedback|date=2004-06-07|first=Deborah|last=Anderson}} | |||||
[https://www.unicode.org/wg2/docs/n3046.pdf N3046] | {{Citation|title=Improving formal definition for control characters|date=2006-02-22|first=Michel|last=Suignard}} | |||||
{{nobr|[https://www.unicode.org/wg2/docs/n3103.pdf N3103 (pdf],}} [https://www.unicode.org/wg2/docs/n3103.doc doc]) | {{Citation|title=Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27|date=2006-08-25|first=V. S.|last=Umamaheswaran|section=M48.33}} | |||||
{{nobr|[https://www.unicode.org/L2/L2011/11043-modletcase.pdf L2/11-043]}} | {{Citation|title=Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters|date=2011-02-02|first1=Asmus|last1=Freytag|first2=Kent|last2=Karlsson}} | |||||
{{nobr|[https://www.unicode.org/L2/L2011/11160-pri181.pdf L2/11-160]}} | {{Citation|title=PRI #181 Changing General Category of Twelve Characters|date=2011-05-02}} | |||||
{{nobr|[https://www.unicode.org/L2/L2011/11261.htm L2/11-261R2]}} | {{Citation|title=UTC #128 / L2 #225 Minutes|date=2011-08-16|first=Lisa|last=Moore|section=Consensus 128-C3|quote=Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL.}} | |||||
{{nobr|[https://www.unicode.org/L2/L2011/11438-emoji-var.pdf L2/11-438]}} | [https://www.unicode.org/wg2/docs/n4182.pdf N4182] | {{Citation|title=Emoji Variation Sequences (Revision of L2/11-429)|date=2011-12-22|first=Peter|last=Edberg}} | ||||
{{nobr|[https://www.unicode.org/L2/L2015/15107.htm L2/15-107]}} | {{Citation|title=UTC #143 Minutes|date=2015-05-12|first=Lisa|last=Moore|section=Consensus 143-C5|quote=Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0.}} | |||||
{{nobr|[https://www.unicode.org/L2/L2015/15268-slashed-zero.pdf L2/15-268]}} | {{Citation|title=Proposal to Represent the Slashed Zero Variant of Empty Set|date=2015-10-30|first1=Barbara|last1=Beeton|first2=Asmus|last2=Freytag|first3=Laurențiu|last3=Iancu|first4=Murray|last4=Sargent}} | |||||
{{nobr|[https://www.unicode.org/L2/L2015/15301-emoji-sequences.pdf L2/15-301]}} | {{Citation|title=A proposal for 278 standardized variation sequences for emoji|date=2015-11-01|first=Roozbeh|last=Pournader}} | |||||
{{nobr|[https://www.unicode.org/L2/L2015/15254.htm L2/15-254]}} | {{Citation|title=UTC #145 Minutes|date=2015-11-16|first=Lisa|last=Moore|section=B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set}} | |||||
{{nobr|[https://www.unicode.org/L2/L2017/17294-fullwidth-slashed-zero.pdf L2/17-294]}} | [https://www.unicode.org/wg2/docs/n4914-17294-fullwidth-slashed-zero.pdf N4914] | {{Citation|title=Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO|date=2017-08-14|first=Ken|last=Lunde|author-link=Ken Lunde}} | ||||
{{nobr|[https://www.unicode.org/L2/L2022/22019-utc170-properties-recs.pdf L2/22-019]}} | {{Citation|title=UTC #170 properties feedback & recommendations|date=2022-01-19|first1=Markus|last1=Scherer|display-authors=etal|section=F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt}} | |||||
{{nobr|[https://www.unicode.org/L2/L2022/22016.htm L2/22-016]}} | {{Citation|title=UTC #170 Minutes|date=2022-04-21|first=Peter|last=Constable|section=Consensus 170-C24|quote=For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0.}} | |||||
class="sortbottom"
| colspan="7" | {{reflist|group=lower-alpha|refs=Proposed code points and character names may differ from final code points and names. Refer to the history section of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents.}} |
See also
{{portal|Internet|Language}}
References
External links
{{Spoken Wikipedia|date=2023-11-08|En-Basic Latin (Unicode block)-article.ogg}}
{{sister project links|Unicode}}
{{Unicode navigation}}
{{authority control}}