Optical Character Recognition (Unicode block)
{{Infobox Unicode block
|blockname = Optical Character Recognition
|rangestart = 2440
|rangeend = 245F
|script1 = Common
|symbols = OCR controls
|sources = ISO 2033
|1_0_0 = 11
|note = {{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}
}}
Optical Character Recognition is a Unicode block containing signal characters for optical character recognition (OCR) and magnetic ink character recognition (MICR) standards.
Block
{{Unicode chart Optical Character Recognition}}
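As a rough illustration of the block's extent, the following Python sketch walks the range U+2440..U+245F and collects the code points that have a formal name in the interpreter's bundled Unicode Character Database. The constant and function names are hypothetical, chosen for this example, and the result depends on the Unicode version shipped with the Python build in use.

```python
import unicodedata

# Block boundaries as given in the infobox (hypothetical constant names).
BLOCK_START = 0x2440
BLOCK_END = 0x245F


def assigned_characters(start=BLOCK_START, end=BLOCK_END):
    """Return (code point, formal name) pairs for named characters in the range."""
    result = []
    for cp in range(start, end + 1):
        # unicodedata.name returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            result.append((cp, name))
    return result


if __name__ == "__main__":
    for cp, name in assigned_characters():
        print(f"U+{cp:04X} {name}")
```

With a database in which the block holds the 11 characters counted in the infobox, the function returns 11 pairs, running from OCR HOOK at U+2440 to OCR DOUBLE BACKSLASH at U+244A. The remaining code points up to U+245F are skipped because they have no name.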
Subheadings
The Optical Character Recognition block has three informal subheadings (groupings) within its character collection: OCR-A, MICR, and OCR.{{cite web|url=https://www.unicode.org/charts/PDF/U2440.pdf|title=Unicode Code Charts: Optical Character Recognition|work=The Unicode Standard, Version 6.3|accessdate=27 February 2014}}
=OCR-A=
{{further|OCR-A}}
[[File:Verrechnungsscheck, WestLB, Landeshauptkasse Düsseldorf, 2004.jpg|thumb|Cheque showing use of ⑂, ⑀ and ⑁ in the machine-readable line]]
The OCR-A subheading contains six characters taken from the OCR-A font described in the ISO 1073-1:1976 standard: {{unichar|2440|OCR HOOK}}, {{unichar|2441|OCR CHAIR}}, {{unichar|2442|OCR FORK}}, {{unichar|2443|OCR INVERTED FORK}}, {{unichar|2444|OCR BELT BUCKLE}}, and {{unichar|2445|OCR BOW TIE}}. The OCR bow tie is given the informative alias "unique asterisk".
The hook, chair and fork, in addition to a long vertical bar, are included in the most basic "numeric" implementation level of OCR-A, which includes digits but excludes letters and conventional punctuation.{{cite web |url=https://ecma-international.org/wp-content/uploads/ECMA-8_2nd_edition_january_1977.pdf |title=Nominal Character Dimensions of the Numeric OCR-A Font |edition=2nd |id=ECMA-8 |year=1977 |author=European Computer Manufacturers Association |author-link=Ecma International}} By contrast, the most basic implementation level of OCR-B includes the digits, plus sign, less-than sign, greater-than sign, long vertical bar and seven of the capital letters.{{cite web |url=https://www.open-std.org/JTC1/SC2/WG3/docs/n470.pdf#page=12 |page=8 |title=9.1: Subset 1: Minimal alphanumeric subset |work=Proposal for Type 3 Technical Report, TR 15907, Information technology—Revision of OCR-B standard (ISO 1073-2:1976) |id=ISO/IEC JTC1/SC2/WG3 N470 |date=1998-09-28 |author=ISO/IEC JTC1/SC2/WG3 |author-link=ISO/IEC JTC 1/SC 2}} As such, the Optical Character Recognition block contains no characters specific to OCR-B.
=MICR=
{{further|Magnetic ink character recognition}}
[[File:NIXON, Richard M (signed check).jpg|thumb|Signed check showing use of ⑆, ⑇, ⑈ and ⑉ in the machine-readable line]]
The MICR subheading contains four punctuation characters for bank cheque identifiers, taken from the magnetic ink character recognition E-13B font (codified in the ISO 1004:1995 standard): {{unichar|2446|OCR BRANCH BANK IDENTIFICATION}}, {{unichar|2447|OCR AMOUNT OF CHECK}}, {{unichar|2448|OCR DASH}}, and {{unichar|2449|OCR CUSTOMER ACCOUNT NUMBER}}.
The latter two characters are misnamed: their names were inadvertently switched when they were named in the 1993 (first) edition of ISO/IEC 10646,{{citation|mode=cs1 |url=https://www.unicode.org/wg2/docs/n4103.pdf |page=29 |section=T.3. Optical Character Recognition |title=Unconfirmed minutes of WG 2 meeting 58 |author=ISO/IEC JTC 1/SC 2/WG 2 |author-link=ISO/IEC JTC 1/SC 2 |date=2012-01-03 |id=SC2 N4188 / WG2 N4103 |quotation=These Magnetic Ink Character Recognition (MICR) symbols are used by banks on checks. The names of these characters were inadvertently mixed up in the 1993 edition of ISO/IEC 10646.}} a mistake which had been present since Unicode 1.0.0.{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=Unicode Consortium}} Although their formal names remain unchanged due to the Unicode stability policy, they both have corrected normative aliases: U+2448 ⑈ is {{sc|MICR ON US SYMBOL}}, and U+2449 ⑉ is {{sc|MICR DASH SYMBOL}}{{citation|mode=cs1 |url=https://www.unicode.org/notes/tn27/tn27-4.html |title=Known Anomalies in Unicode Character Names |publisher=Unicode Consortium |id=Unicode Technical Note #27 |first1=Asmus |last1=Freytag |first2=Rick |last2=McGowan |first3=Ken |last3=Whistler |date=2017-04-10 |edition=4}} (the standard notes that "the Unicode character names include several misnomers").
These symbols had previously been encoded by the ISO-IR-98 encoding defined by ISO 2033:1983, in which they were simply named {{sc|SYMBOL ONE}} through {{sc|SYMBOL FOUR}}.{{cite iso-ir |number=98 |title=E13B Graphic Character Set |id-in-title=yes |sponsor=ISO/TC97/SC2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |date=1985-08-01}} All four characters have informative aliases in the Unicode charts: "transit", "amount", "on us", and "dash" respectively.
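The difference between the formal names and the corrected aliases can be observed with Python's standard <code>unicodedata</code> module, whose <code>lookup</code> function has accepted name aliases since Python 3.3. The helper name below is hypothetical; this is a minimal sketch for illustration, not part of any standard.

```python
import unicodedata


def resolve_alias(alias):
    """Resolve a character name or alias to (code point, formal name).

    unicodedata.lookup accepts formal names and name aliases, while
    unicodedata.name always reports the immutable formal name.
    """
    ch = unicodedata.lookup(alias)
    return ord(ch), unicodedata.name(ch)


if __name__ == "__main__":
    for alias in ("MICR ON US SYMBOL", "MICR DASH SYMBOL"):
        cp, formal = resolve_alias(alias)
        print(f"{alias} -> U+{cp:04X} (formal name: {formal})")
```

Looking up <code>MICR ON US SYMBOL</code> resolves to U+2448, whose formal name is still <code>OCR DASH</code>, and <code>MICR DASH SYMBOL</code> resolves to U+2449, whose formal name is still <code>OCR CUSTOMER ACCOUNT NUMBER</code>. This reflects the stability policy described above: the aliases are added alongside the formal names rather than replacing them.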
=OCR=
{{further|JIS X 9008}}
The OCR subheading consists of a single character: {{unichar|244A|OCR DOUBLE BACKSLASH}}.
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Optical Character Recognition block:
{{sticky header}}
{| class="wikitable sticky-header"
|-
! Version !! {{nobr|Final code points}} !! Count !! L2 ID !! WG2 ID !! Document
|-
| rowspan="4" | 1.0.0
| rowspan="4" | U+2440..244A
| rowspan="4" | 11
| colspan="3" | (to be determined)
|-
| {{nobr|[https://www.unicode.org/L2/L2010/10416.htm L2/10-416R]}}
|
| {{Citation|title=UTC #125 / L2 #222 Minutes|date=2010-11-09|first=Lisa|last=Moore|ref=none|section=Consensus 125-C39|quote=Create two formal aliases, U+2448 MICR ON US SYMBOL and U+2449 MICR DASH SYMBOL for Unicode 6.1.}}
|-
|
| [https://www.unicode.org/wg2/docs/n4103.pdf N4103]
| {{Citation|title=Unconfirmed minutes of WG 2 meeting 58|date=2012-01-03|ref=none|section=T.3. Optical Character Recognition}}
|-
| {{nobr|[https://www.unicode.org/L2/L2022/22065-edcom-rept-utc171.html L2/22-065]}}
|
| {{Citation|title=Editorial Committee Report and Recommendations for UTC #171 Meeting|date=2022-04-13|first=Ken|last=Whistler|ref=none|section=Opt Subject: Unicode 14.0 "Optical Character Recognition" code chart [Affects U+2447]}}
|- class="sortbottom"
| colspan="6" | {{reflist|group=lower-alpha|refs=Proposed code points and characters names may differ from final code points and names}}
|}
References
{{reflist}}