Optical Character Recognition (Unicode block)

{{Infobox Unicode block

|blockname = Optical Character Recognition

|rangestart = 2440

|rangeend = 245F

|script1 = Common

|symbols = OCR controls

|sources = ISO 2033

|1_0_0 = 11

|note = {{cite web|url=https://www.unicode.org/ucd/|title=Unicode character database|work=The Unicode Standard|accessdate=2023-07-26}}{{cite web|url=https://www.unicode.org/versions/enumeratedversions.html|title=Enumerated Versions of The Unicode Standard|work=The Unicode Standard|accessdate=2023-07-26}}

}}

Optical Character Recognition is a Unicode block containing signal characters for optical character recognition (OCR) and magnetic ink character recognition (MICR) standards.

Block

{{Unicode chart Optical Character Recognition}}

Subheadings

The Optical Character Recognition block has three informal subheadings (groupings) within its character collection: OCR-A, MICR, and OCR.{{cite web|url=https://www.unicode.org/charts/PDF/U2440.pdf|title=Unicode Code Charts: Optical Character Recognition|work=The Unicode Standard, Version 6.3|accessdate=27 February 2014}}
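As an illustration, the block's assigned characters and their informal grouping can be listed with Python's standard `unicodedata` module. This is a minimal sketch. The `SUBHEADINGS` mapping and the `assigned_in_block` helper are hypothetical, transcribed by hand from the code chart rather than taken from any Unicode data file. The names printed depend on the Unicode version bundled with the interpreter, but all eleven characters date from Unicode 1.0.0.

```python
import unicodedata

# Hypothetical grouping transcribed from the informal subheadings in the
# Unicode code chart (not a machine-readable Unicode data file).
SUBHEADINGS = {
    "OCR-A": range(0x2440, 0x2446),  # U+2440..U+2445
    "MICR": range(0x2446, 0x244A),   # U+2446..U+2449
    "OCR": range(0x244A, 0x244B),    # U+244A
}


def assigned_in_block(start=0x2440, end=0x245F):
    """Return (code point, formal name) pairs for assigned characters in the range."""
    pairs = []
    for cp in range(start, end + 1):
        # name() returns the supplied default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            pairs.append((cp, name))
    return pairs


if __name__ == "__main__":
    names = dict(assigned_in_block())
    for heading, code_points in SUBHEADINGS.items():
        print(heading)
        for cp in code_points:
            print(f"  U+{cp:04X} {chr(cp)} {names[cp]}")
```

Only U+2440..U+244A are assigned. The remaining code points up to U+245F are reserved, so the helper skips them.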

=OCR-A=

{{further|OCR-A}}

[[File:Verrechnungsscheck, WestLB, Landeshauptkasse Düsseldorf, 2004.jpg|thumb|A 2004 WestLB cheque, showing use of ⑂, ⑀ and ⑁ in the machine-readable line]]

The OCR-A subheading contains six characters taken from the OCR-A font described in the ISO 1073-1:1976 standard: {{unichar|2440|OCR HOOK}}, {{unichar|2441|OCR CHAIR}}, {{unichar|2442|OCR FORK}}, {{unichar|2443|OCR INVERTED FORK}}, {{unichar|2444|OCR BELT BUCKLE}}, and {{unichar|2445|OCR BOW TIE}}. The OCR bow tie is given the informative alias "unique asterisk".

The hook, chair and fork, together with a long vertical bar, are included in the most basic "numeric" implementation level of OCR-A, which includes digits but excludes letters and conventional punctuation.{{cite web |url=https://ecma-international.org/wp-content/uploads/ECMA-8_2nd_edition_january_1977.pdf |title=Nominal Character Dimensions of the Numeric OCR-A Font |edition=2nd |id=ECMA-8 |year=1977 |author=European Computer Manufacturers Association |author-link=Ecma International}} By contrast, the most basic implementation level of OCR-B includes the digits, plus sign, less-than sign, greater-than sign, long vertical bar and seven capital letters;{{cite web |url=https://www.open-std.org/JTC1/SC2/WG3/docs/n470.pdf#page=12 |page=8 |title=9.1: Subset 1: Minimal alphanumeric subset |work=Proposal for Type 3 Technical Report, TR 15907, Information technology—Revision of OCR-B standard (ISO 1073-2:1976) |id=ISO/IEC JTC1/SC2/WG3 N470 |date=1998-09-28 |author=ISO/IEC JTC1/SC2/WG3 |author-link=ISO/IEC JTC 1/SC 2}} consequently, the Optical Character Recognition block contains no characters specific to OCR-B.

=MICR=

{{further|Magnetic ink character recognition}}

[[File:NIXON, Richard M (signed check).jpg|thumb|A cheque signed by Richard Nixon, showing use of ⑆, ⑇, ⑈ and ⑉ in the machine-readable line]]

The MICR subheading contains four punctuation characters for bank cheque identifiers, taken from the magnetic ink character recognition E-13B font (codified in the ISO 1004:1995 standard): {{unichar|2446|OCR BRANCH BANK IDENTIFICATION}}, {{unichar|2447|OCR AMOUNT OF CHECK}}, {{unichar|2448|OCR DASH}}, and {{unichar|2449|OCR CUSTOMER ACCOUNT NUMBER}}.

The latter two characters are misnamed: their names were inadvertently swapped in the 1993 (first) edition of ISO/IEC 10646,{{citation|mode=cs1 |url=https://www.unicode.org/wg2/docs/n4103.pdf |page=29 |section=T.3. Optical Character Recognition |title=Unconfirmed minutes of WG 2 meeting 58 |author=ISO/IEC JTC 1/SC 2/WG 2 |author-link=ISO/IEC JTC 1/SC 2 |date=2012-01-03 |id=SC2 N4188 / WG2 N4103 |quotation=These Magnetic Ink Character Recognition (MICR) symbols are used by banks on checks. The names of these characters were inadvertently mixed up in the 1993 edition of ISO/IEC 10646.}} and the same mistake was already present in Unicode 1.0.0.{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=version 1.0 |title=3.8: Block-by-Block Charts |publisher=Unicode Consortium}} Although their formal names remain unchanged due to the Unicode stability policy, each has a corrected normative alias: U+2448 ⑈ is {{sc|MICR ON US SYMBOL}}, and U+2449 ⑉ is {{sc|MICR DASH SYMBOL}}{{citation|mode=cs1 |url=https://www.unicode.org/notes/tn27/tn27-4.html |title=Known Anomalies in Unicode Character Names |publisher=Unicode Consortium |id=Unicode Technical Note #27 |first1=Asmus |last1=Freytag |first2=Rick |last2=McGowan |first3=Ken |last3=Whistler |date=2017-04-10 |edition=4}} (the standard notes that "the Unicode character names include several misnomers").
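The difference between a formal name and a normative alias can be seen with Python's standard `unicodedata` module. `unicodedata.name()` returns only the unchanged formal name. `unicodedata.lookup()` has also accepted name aliases since Python 3.3. This is a minimal sketch, assuming Python 3.7 or later. The `ALIASES` table and the `alias_report` helper are hypothetical illustrations, not part of any Unicode API.

```python
import unicodedata

# Corrected normative aliases for the two MICR characters whose formal names
# were swapped (the aliases were created for Unicode 6.1). Illustrative table.
ALIASES = {
    "MICR ON US SYMBOL": 0x2448,
    "MICR DASH SYMBOL": 0x2449,
}


def alias_report():
    """Return (code point, formal name, alias) rows, checking each alias resolves."""
    rows = []
    for alias, cp in ALIASES.items():
        ch = chr(cp)
        # lookup() resolves name aliases as well as formal names (Python 3.3+).
        if unicodedata.lookup(alias) != ch:
            raise ValueError(f"{alias} does not resolve to U+{cp:04X}")
        # name() never returns an alias, only the unchanged formal name.
        rows.append((cp, unicodedata.name(ch), alias))
    return rows


if __name__ == "__main__":
    for cp, formal, alias in alias_report():
        print(f"U+{cp:04X} formal name: {formal}; alias: {alias}")
```

Running the script should pair the formal names OCR DASH and OCR CUSTOMER ACCOUNT NUMBER with their corrected aliases. The same aliases are accepted in `\N{...}` string escapes, so `"\N{MICR ON US SYMBOL}"` evaluates to ⑈.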

These symbols had previously been encoded by the ISO-IR-98 encoding defined by ISO 2033:1983, in which they were simply named {{sc|SYMBOL ONE}} through {{sc|SYMBOL FOUR}}.{{cite iso-ir |number=98 |title=E13B Graphic Character Set |id-in-title=yes |sponsor=ISO/TC97/SC2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |date=1985-08-01}} All four characters have informative aliases in the Unicode charts: "transit", "amount", "on us", and "dash" respectively.

=OCR=

{{further|JIS X 9008}}

The OCR subheading consists of a single character: {{unichar|244A|OCR DOUBLE BACKSLASH}}.

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Optical Character Recognition block:

{{sticky header}}

{| class="wikitable sticky-header"
|-
! Version !! {{nobr|Final code points}}{{efn|Proposed code points and character names may differ from final code points and names}} !! Count !! L2 ID !! WG2 ID !! Document
|-
| rowspan="4" | 1.0.0 || rowspan="4" | U+2440..244A || rowspan="4" | 11 || colspan="3" | (to be determined)
|-
| {{nobr|[https://www.unicode.org/L2/L2010/10416.htm L2/10-416R]}} || || {{Citation|title=UTC #125 / L2 #222 Minutes|date=2010-11-09|first=Lisa|last=Moore|ref=none|section=Consensus 125-C39|quote=Create two formal aliases, U+2448 MICR ON US SYMBOL and U+2449 MICR DASH SYMBOL for Unicode 6.1.}}
|-
| || [https://www.unicode.org/wg2/docs/n4103.pdf N4103] || {{Citation|title=Unconfirmed minutes of WG 2 meeting 58|date=2012-01-03|ref=none|section=T.3. Optical Character Recognition}}
|-
| {{nobr|[https://www.unicode.org/L2/L2022/22065-edcom-rept-utc171.html L2/22-065]}} || || {{Citation|title=Editorial Committee Report and Recommendations for UTC #171 Meeting|date=2022-04-13|first=Ken|last=Whistler|ref=none|section=Opt Subject: Unicode 14.0 "Optical Character Recognition" code chart [Affects U+2447]}}
|- class="sortbottom"
| colspan="6" | {{reflist|group=lower-alpha}}
|}

References