Base32

{{short description|Binary-to-text encoding scheme using 32 symbols}}

Base32 is an encoding method based on the base-32 numeral system. It uses an alphabet of 32 digits, each of which represents a different combination of 5 bits (2⁵). Since base32 is not very widely adopted, the question of notation—which characters to use to represent the 32 digits—is not as settled as in the case of more well-known numeral systems (such as hexadecimal), though RFCs and unofficial and de-facto standards exist. One way to represent Base32 numbers in human-readable form is using digits 0–9 followed by the twenty-two upper-case letters A–V. However, many other variations are used in different contexts. Historically, Baudot code could be considered a modified (stateful) base32 code. Base32 is often used to represent byte strings.

RFC 4648 encodings

The October 2006 proposed Internet standard{{cite web | url=https://www.rfc-editor.org/standards | title=Official Internet Protocol Standards » RFC Editor }} {{IETF RFC|4648}} documents base16, base32 and base64 encodings. It includes two schemes for base32, but recommends one over the other. It further recommends that regardless of precedent, only the alphabet it defines in its section 6 actually be called base32, and that the other similar alphabet in its section 7 instead be called base32hex.{{efn|For context, the proposed standard also documents two base64 encodings, and here too expresses a preference for one, though for different reasons. Only one base16 encoding is documented – long universally adopted even prior to the publication of RFC 4648 or its predecessor RFC 3548.}} Agreement with those recommendations is not universal. Care is needed with any system described as base32: it could be base32 per RFC 4648 §6, or per §7 (possibly disregarding that RFC's deprecation of the simpler name for the latter), or yet another encoding variant (see below).

= Base 32 Encoding per §6 =

The most widely used{{cn|date=December 2023}} base32 alphabet is defined in RFC [https://datatracker.ietf.org/doc/html/rfc4648#section-6 4648 §6] and the earlier {{IETF RFC|3548}} (2003). The scheme was originally designed in 2000 by John Myers for SASL/GSSAPI.{{Cite IETF |draft=draft-ietf-cat-sasl-gssapi-01 |last=Myers |first=J. |date=May 23, 2000 |title=SASL GSSAPI mechanisms |access-date=2023-06-24 }} It uses an alphabet of A–Z, followed by 2–7. The digits 0, 1 and 8 are skipped due to their similarity with the letters O, I and B (thus "2" has a decimal value of 26).

In some circumstances padding is not required or used, because the amount of padding can be inferred from the length of the string modulo 8. RFC 4648 states that padding must be used unless the specification referring to the RFC explicitly states otherwise. Omitting padding is useful when Base32-encoded data appears in URL tokens or file names, where the padding character could pose a problem.
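The padding rule can be illustrated with a minimal Python sketch built on the standard library's <code>base64</code> module. The helper names <code>b32encode_nopad</code> and <code>b32decode_nopad</code> are hypothetical and chosen only for this illustration; the sketch assumes the input to the decoder was produced by stripping padding from a valid RFC 4648 §6 string.

```python
import base64

def b32encode_nopad(data: bytes) -> str:
    """Encode with the RFC 4648 section 6 alphabet, then drop the '=' padding."""
    return base64.b32encode(data).decode("ascii").rstrip("=")

def b32decode_nopad(text: str) -> bytes:
    """Re-create the padding implied by the length modulo 8, then decode."""
    missing = -len(text) % 8          # characters needed to reach a multiple of 8
    return base64.b32decode(text + "=" * missing)

token = b32encode_nopad(b"foobar")
print(token)                    # MZXW6YTBOI
print(b32decode_nopad(token))   # b'foobar'
```

The decoder only has to restore as many padding characters as are needed to bring the length back to a multiple of 8, which is why the padding carries no information of its own.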

class="wikitable" style="width:40ex; text-align: center; margin: 0 auto 0 auto;" \|+align="top"\|The RFC 4648 Base32 alphabet !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol
0	A	8	I	16	Q	24	Y
1	B	9	J	17	R	25	Z
2	C	10	K	18	S	26	2
3	D	11	L	19	T	27	3
4	E	12	M	20	U	28	4
5	F	13	N	21	V	29	5
6	G	14	O	22	W	30	6
7	H	15	P	23	X	31	7
colspan="2" \| \| colspan="2" \| \| colspan="2" \| \|padding \|=

This is an example of a Base32 representation using the previously described 32-character set (IPFS CIDv1 in Base32 upper-case encoding): {{Code|code=BAFYBEICZSSCDSBS7FFQZ55ASQDF3SMV6KLCW3GOFSZVWLYARCI47BGF354}}

= Base 32 Encoding with Extended Hex Alphabet per §7<span class="anchor" id="base32hex"></span> =

"Extended hex" base 32 or base32hex, another scheme for base 32 per RFC [https://datatracker.ietf.org/doc/html/rfc4648#section-7 4648 §7], extends hexadecimal in a more natural way: Its lower half is identical with hexadecimal, and beyond that, base32hex simply continues the alphabet through to the letter V.

This scheme was first proposed by Christian Lanctot, a programmer working at Sage software, in a letter to Dr. Dobb's magazine in March 1999{{cite news | url=http://www.drdobbs.com/letters/184410894 | title=A Better Date? (second letter under that heading) - Letters | work=Dr Dobb's | date=1999-03-01 | author=Lanctot, Christian}} as part of a suggested solution for the Y2K bug. Lanctot referred to it as "Double Hex". The same alphabet was described in 2000 in {{IETF RFC|2938}} under the name "Base-32". RFC 4648, while acknowledging existing use of this version in NSEC3, refers to it as base32hex and discourages referring to it as only "base32".

Since this notation uses digits 0–9 followed by consecutive letters of the alphabet, it matches the digits used by the JavaScript parseInt() function{{cite web | url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/parseInt | title=parseInt() - JavaScript | publisher=Mozilla | work=MDN Web Docs| date=29 December 2023 }} and the Python int() constructor{{cite web | url=https://docs.python.org/3/library/functions.html#int | title=Built-in Functions | publisher=Python Software Foundation | work=Python documentation | access-date=2017-08-09 | archive-url=https://web.archive.org/web/20181026035007/https://docs.python.org/3/library/functions.html#int | archive-date=2018-10-26 | url-status=dead }} when a base larger than 10 (such as 16 or 32) is specified. It also retains hexadecimal's property of preserving bitwise sort order of the represented data, unlike RFC 4648's §6 base32, or base64.{{cite journal |url=https://tools.ietf.org/html/rfc4648#ref-7 |title=7. Base 32 Encoding with Extended Hex Alphabet |journal=RFC 4648: The Base16, Base32, and Base64 Data Encodings |publisher=IETF | year=2006 | author=Josefsson, Simon|doi=10.17487/RFC4648 |doi-access=free }}
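Both properties can be checked with a short Python sketch. The helper <code>b32hex</code> is a hypothetical name: it obtains base32hex by running the standard §6 encoder and swapping alphabets, which is an illustrative shortcut rather than a reference implementation (Python 3.10 and later also provide <code>base64.b32hexencode</code>). The sample bytes are arbitrary.

```python
import base64

RFC4648_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX_ALPHABET = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
_TO_HEX = bytes.maketrans(RFC4648_ALPHABET, BASE32HEX_ALPHABET)

def b32hex(data: bytes) -> bytes:
    """base32hex via the section 6 encoder plus an alphabet swap ('=' is untouched)."""
    return base64.b32encode(data).translate(_TO_HEX)

# int() reads base32hex digits directly when given base 32.
print(int("V", 32))    # 31
print(int("10", 32))   # 32

# Arbitrary sample bytes, listed in ascending byte order.
samples = [bytes([n]) for n in (0x00, 0x41, 0x9C, 0xFF)]

hex_sorted = sorted(samples, key=b32hex)            # sort by base32hex text
std_sorted = sorted(samples, key=base64.b32encode)  # sort by section 6 text

print(hex_sorted == samples)   # True: base32hex keeps the byte order
print(std_sorted == samples)   # False: '2'-'7' sort before 'A'-'Z' in ASCII
```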

Unlike many other base 32 notation systems, base32hex digits beyond 9 are contiguous. However, its set of digits includes characters that may visually conflict. With the right font it is possible to visually distinguish between 0, O and 1, I, but other fonts may be unsuitable, as those letters could be hard for humans to tell apart, especially when the context English usually provides is not present in a notation system that is only expressing numbers.{{efn|The similarity used to be a feature, not a bug, because it allowed early typewriters to omit extra keys for the numbers 0 and 1, thus reducing mechanical complexity. When computers were introduced, it was felt desirable for early computer printers to be able to produce the same type as quality typewriters, hence typewriter-like fonts kept these letters looking alike. Many years on, it is no longer necessary to use fonts that don't clearly distinguish some letters, but the tradition persists. It is also not just typewriter-style fonts that have similar problems – many influential fonts do, e.g. Helvetica.}} The choice of font is not controlled by notation or encoding, yet base32hex makes no attempt to compensate for the shortcomings of affected fonts.{{efn|The design of many base32 variants is driven by the view that it is risky to assume a distinguishable font will be used. On the other hand, the logic of a scheme not trying to compensate for quirks outside its remit may be more straightforward.}}

class="wikitable" style="width:40ex; text-align: center; margin: 0 auto 0 auto;" \|+align="top"\|The "Extended Hex" Base 32 Alphabet !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol \| rowspan="10" \| !width="12%"\|Value !width="12%"\|Symbol
0	0	8	8	16	G	24	O
1	1	9	9	17	H	25	P
2	2	10	A	18	I	26	Q
3	3	11	B	19	J	27	R
4	4	12	C	20	K	28	S
5	5	13	D	21	L	29	T
6	6	14	E	22	M	30	U
7	7	15	F	23	N	31	V
colspan="2" \| \| colspan="2" \| \| colspan="2" \| \|padding \|=

Alternative encoding schemes

The alternative schemes below change the Base32 alphabet, but all of them use similar combinations of alphanumeric symbols.

= z-base-32 =

z-base-32{{cite web |url=http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt |title=Human-oriented base-32 encoding |last=O'Whielacronx |first=Zooko |author-link=Zooko Wilcox-O'Hearn |date=2009}} is a Base32 encoding designed by Zooko Wilcox-O'Hearn to be easier for human use and more compact. It includes 1, 8 and 9 but excludes l, v, 0 and 2. It also permutes the alphabet so that the easier characters are the ones that occur more frequently.{{clarification needed|reason=Claim is senseless without a presumed integer value distribution; perhaps something like Ziff's law was presumed in the code's design, but there's no point echoing this here unsupported, as it only confuses matters.|date=October 2023}} It compactly encodes bitstrings whose length in bits is not a multiple of 8{{clarification needed|date=October 2023|reason=Upgraded by different editor from "how" flag, basically extending my argument about frequency efficiency at my newly added flag to this one as well}} and omits trailing padding characters. z-base-32 was used in the Mnet open source project, and is currently used in Phil Zimmermann's ZRTP protocol, and in the Tahoe-LAFS open source project.

class="wikitable" style="width:40ex; text-align: center; margin: 0 auto 0 auto;" \|+align="top"\|z-base-32 alphabet !width="12%"\|Value !width="12%"\|Symbol \|rowspan=9\| !width="12%"\|Value !width="12%"\|Symbol \|rowspan=9\| !width="12%"\|Value !width="12%"\|Symbol \|rowspan=9\| !width="12%"\|Value !width="12%"\|Symbol
0	y	8	e	16	o	24	a
1	b	9	j	17	t	25	3
2	n	10	k	18	1	26	4
3	d	11	m	19	u	27	5
4	r	12	c	20	w	28	h
5	f	13	p	21	i	29	7
6	g	14	q	22	s	30	6
7	8	15	x	23	z	31	9
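For input that is a whole number of bytes, the table above can be applied by encoding with the RFC 4648 §6 alphabet, dropping the padding, and substituting symbols position by position. The following Python sketch takes that shortcut; the function names are hypothetical, and the sketch does not cover bit strings whose length is not a multiple of 8.

```python
import base64

RFC4648_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
ZBASE32_ALPHABET = b"ybndrfg8ejkmcpqxot1uwisza345h769"  # values 0..31 from the table
_TO_Z = bytes.maketrans(RFC4648_ALPHABET, ZBASE32_ALPHABET)
_FROM_Z = bytes.maketrans(ZBASE32_ALPHABET, RFC4648_ALPHABET)

def zbase32_encode(data: bytes) -> str:
    """Encode whole bytes: standard Base32, no padding, z-base-32 symbols."""
    return base64.b32encode(data).rstrip(b"=").translate(_TO_Z).decode("ascii")

def zbase32_decode(text: str) -> bytes:
    """Map symbols back to the RFC alphabet, restore padding, and decode."""
    raw = text.encode("ascii").translate(_FROM_Z)
    return base64.b32decode(raw + b"=" * (-len(raw) % 8))

print(zbase32_encode(b"hi"))     # pbwo
print(zbase32_decode("pbwo"))    # b'hi'
```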

= Crockford's Base32 =

Another alternative design for Base32 was created by Douglas Crockford, who proposes using additional characters for a mod-37 checksum.{{cite web |author1=Douglas Crockford |title=Base 32 |url=http://www.crockford.com/wrmg/base32.html |archive-url=https://web.archive.org/web/20021223012947/http://www.crockford.com/wrmg/base32.html |archive-date=2002-12-23}} It excludes the letters I, L, and O to avoid confusion with digits. It also excludes the letter U to reduce the likelihood of accidental obscenity.

Libraries to encode binary data in Crockford's Base32 are available in a variety of languages.

class="wikitable" style="width:80ex; text-align: center; margin: 0 auto 0 auto;" \|+ Crockford's Base32 alphabet !width="16%"\|Value !width="16%"\|Encode Digit !width="16%"\|Decode Digit \|rowspan=17\| !width="16%"\|Value !width="16%"\|Encode Digit !width="16%"\|Decode Digit
0	0	0 o O	16	G	g G
1	1	1 i I l L	17	H	h H
2	2	2	18	J	j J
3	3	3	19	K	k K
4	4	4	20	M	m M
5	5	5	21	N	n N
6	6	6	22	P	p P
7	7	7	23	Q	q Q
8	8	8	24	R	r R
9	9	9	25	S	s S
10	A	a A	26	T	t T
11	B	b B	27	V	v V
12	C	c C	28	W	w W
13	D	d D	29	X	x X
14	E	e E	30	Y	y Y
15	F	f F	31	Z	z Z
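The encode and decode columns above can be sketched in Python as a pair of integer conversions. The function names are hypothetical, the optional mod-37 check symbol is deliberately left out, and only the aliases shown in the table are accepted.

```python
ENCODE = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"   # values 0..31, no I, L, O, U

# Decoding accepts either case, plus the look-alike aliases from the table.
DECODE = {}
for value, symbol in enumerate(ENCODE):
    DECODE[symbol] = value
    DECODE[symbol.lower()] = value
for alias in "oO":
    DECODE[alias] = 0
for alias in "iIlL":
    DECODE[alias] = 1

def crockford_encode(number: int) -> str:
    """Write a non-negative integer using the encode digits."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    digits = ""
    while True:
        number, remainder = divmod(number, 32)
        digits = ENCODE[remainder] + digits
        if number == 0:
            return digits

def crockford_decode(text: str) -> int:
    """Read a string, tolerating lower case and the o/i/l aliases."""
    value = 0
    for symbol in text:
        value = value * 32 + DECODE[symbol]   # KeyError on an invalid symbol
    return value

print(crockford_encode(1234))    # 16J
print(crockford_decode("16j"))   # 1234
print(crockford_decode("lO"))    # 32, read as "10"
```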

= Electrologica =

An earlier form of base 32 notation was used by programmers working on the Electrologica X1 to represent machine addresses. The "digits" were represented as decimal numbers from 0 to 31. For example, 12-16 would represent the machine address 400 (= 12 × 32 + 16).
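The arithmetic in this example can be reproduced with a small Python sketch. The function names are hypothetical, and the hyphen separator is taken from the single example above rather than from any documented X1 convention.

```python
def x1_to_address(notation: str) -> int:
    """Convert hyphen-separated base-32 'digits' (decimal 0..31) to an integer."""
    address = 0
    for part in notation.split("-"):
        digit = int(part)
        if not 0 <= digit <= 31:
            raise ValueError(f"digit out of range: {part}")
        address = address * 32 + digit
    return address

def address_to_x1(address: int) -> str:
    """Inverse conversion: split a non-negative integer into base-32 digits."""
    if address < 0:
        raise ValueError("only non-negative addresses are supported")
    digits = []
    while True:
        address, digit = divmod(address, 32)
        digits.append(str(digit))
        if address == 0:
            return "-".join(reversed(digits))

print(x1_to_address("12-16"))   # 400
print(address_to_x1(400))       # 12-16
```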

= Geohash =

See Geohash algorithm, used to represent latitude and longitude values in one (bit-interlaced) positive integer.{{Cite web|url=http://geohash.org/site/tips.html|title=Tips & Tricks - geohash.org|website=geohash.org|access-date=2020-04-03}} The base32 representation of Geohash uses all decimal digits (0–9) and almost all of the lower case alphabet, except letters "a", "i", "l", "o", as shown by the following character map:

class="wikitable" style="text-align:center"
Decimal \| 0 \|\| 1 \|\| 2 \|\| 3 \|\| 4 \|\| 5 \|\| 6 \|\| 7 \|\| 8 \|\| 9 \|\| 10 \|\| 11 \|\| 12 \|\| 13 \|\| 14 \|\| 15
Base 32 \| 0 \|\| 1 \|\| 2 \|\| 3 \|\| 4 \|\| 5 \|\| 6 \|\| 7 \|\| 8 \|\| 9 \|\| b \|\| c \|\| d \|\| e \|\| f \|\| g
style="font: 0.5em/0.5em serif;" colspan="20" \|
Decimal \| 16 \|\| 17 \|\| 18 \|\| 19 \|\| 20 \|\| 21 \|\| 22 \|\| 23 \|\| 24 \|\| 25 \|\| 26 \|\| 27 \|\| 28 \|\| 29 \|\| 30 \|\| 31
Base 32 \| h \|\| j \|\| k \|\| m \|\| n \|\| p \|\| q \|\| r \|\| s \|\| t \|\| u \|\| v \|\| w \|\| x \|\| y \|\| z
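The character map can be applied with a few lines of Python. This sketch only converts a Geohash string into the interlaced integer it stands for; it does not separate the bits into latitude and longitude. The function names and the sample string <code>ezs42</code> are chosen for illustration.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"   # values 0..31, no a, i, l, o

def geohash_to_int(code: str) -> int:
    """Concatenate the 5-bit values of each character into one integer."""
    value = 0
    for symbol in code:
        value = (value << 5) | GEOHASH_ALPHABET.index(symbol)  # ValueError if invalid
    return value

def geohash_bits(code: str) -> str:
    """The same value as a bit string, keeping leading zeros."""
    return format(geohash_to_int(code), f"0{5 * len(code)}b")

print(geohash_to_int("ezs42"))   # 14672002
print(geohash_bits("ezs42"))     # 0110111111110000010000010
```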

= Turing's encoding =

In approximately 1950,{{cite web |title=Alan M. Turing (1912 - 1954) |url=https://curation.cs.manchester.ac.uk/computer50/www.computer50.org/mark1/turing.html |website=Computer 50 |publisher=The University of Manchester |access-date=17 April 2025}} Alan Turing wrote software requirements for the Manchester Mark I computing system.{{cite web |title=Alan M. Turing (1912 - 1954) |url=https://curation.cs.manchester.ac.uk/digital60/www.digital60.org/about/biographies/alanturing/index-2.html#mark1 |website=Digital 60 |publisher=The University of Manchester |access-date=17 April 2025}}

A transcription of Turing's [https://web.archive.org/web/20110607044851/http://www.computer50.org/kgill/mark1/RobertTau/turing.pdf manual] for the Mark I is available on archive.org.{{cite web |author1=Alan M. Turing, transcribed by Robert S. Thau |title=Alan Turing’s Manual for the Ferranti Mk. I |url=http://www.computer50.org/kgill/mark1/RobertTau/turing.pdf |website=Computer 50 |publisher=The University of Manchester |access-date=17 April 2025 |archive-url=https://web.archive.org/web/20110607044851/http://www.computer50.org/kgill/mark1/RobertTau/turing.pdf |archive-date=7 June 2011 |format=PDF |date=13 February 2000}}

The University of Manchester's archive site commemorating 60 years of computing{{cite web |title=Programming on the Ferranti Mark 1 |url=https://curation.cs.manchester.ac.uk/digital60/www.digital60.org/birth/manchestercomputers/mark1/program.html#table32 |website=Digital 60 |publisher=The University of Manchester |access-date=17 April 2025}} has a [https://curation.cs.manchester.ac.uk/digital60/www.digital60.org/birth/manchestercomputers/mark1/program.html#table32 table] of the base 32 encoding that Turing used. The table and the accompanying explanation also appear in the manual.

Another account of this period in Turing's life appears on his biography page under Early computers and the Turing test.

= Video games =

Before NVRAM became universal, several video games for Nintendo platforms used base 31 numbers for passwords.

These systems omit vowels (except Y) to prevent the game from accidentally giving a profane password.

Thus, the characters are generally some minor variation of the following set: 0–9, B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z, and some punctuation marks.

Games known to use such a system include Mario Is Missing!, Mario's Time Machine, Tetris Blast, and The Lord of the Rings (Super NES).

= Word-safe alphabet =

The word-safe Base32 alphabet is an extension of the Open Location Code Base20 alphabet. That alphabet uses 8 numeric digits and 12 case-sensitive letter digits chosen to avoid accidentally forming words. Treating the alphabet as case-sensitive produces a 32 (8+12+12) digit set.

class="wikitable" style="text-align:center"
Decimal \| 0 \|\| 1 \|\| 2 \|\| 3 \|\| 4 \|\| 5 \|\| 6 \|\| 7 \|\| 8 \|\| 9 \|\| 10 \|\| 11 \|\| 12 \|\| 13 \|\| 14 \|\| 15
Base 32 \| 2 \|\| 3 \|\| 4 \|\| 5 \|\| 6 \|\| 7 \|\| 8 \|\| 9 \|\| C \|\| F \|\| G \|\| H \|\| J \|\| M \|\| P \|\| Q
style="font: 0.5em/0.5em serif;" colspan="20" \|
Decimal \| 16 \|\| 17 \|\| 18 \|\| 19 \|\| 20 \|\| 21 \|\| 22 \|\| 23 \|\| 24 \|\| 25 \|\| 26 \|\| 27 \|\| 28 \|\| 29 \|\| 30 \|\| 31
Base 32 \| R \|\| V \|\| W \|\| X \|\| c \|\| f \|\| g \|\| h \|\| j \|\| m \|\| p \|\| q \|\| r \|\| v \|\| w \|\| x

Comparisons with other systems

= Advantages =

Base32 has a number of advantages over Base64:

* The resulting character set is all one case, which can often be beneficial when using a case-insensitive filesystem, DNS names, spoken language, or human memory.
* The result can be used as a file name because it cannot possibly contain the '/' symbol, which is the Unix path separator.
* The alphabet can be selected to avoid similar-looking pairs of different symbols, so the strings can be accurately transcribed by hand. (For example, the {{IETF RFC|4648}} §6 symbol set omits the digits for one, eight and zero, since they could be confused with the letters 'I', 'B', and 'O'.)
* A result excluding padding can be included in a URL without encoding any characters.

Base32 has advantages over hexadecimal/Base16:

Base32 representation takes 20% less space. (1000 bits takes 200 characters, compared with 250 for Base16.)

Compared with 8-bit-based encodings, 5-bit systems might also have advantages when used for character transmission:

Because they contain the complete alphabet, the RFC 4648 §6 Base32 scheme and similar alphabets allow six characters to be packed into a 32-bit integer (with 2 bits to spare) instead of the four that fit at 8 bits per character, saving bandwidth in constrained domains such as radio mesh networks.
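The packing described here can be sketched in Python as follows. The function names <code>pack6</code> and <code>unpack6</code> are hypothetical, and the bit layout (first character in the most significant position, top two bits of the 32-bit word unused) is an assumption made for the example rather than an established convention.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # RFC 4648 section 6, values 0..31

def pack6(text: str) -> int:
    """Pack six 5-bit characters into 30 bits of a 32-bit word."""
    if len(text) != 6:
        raise ValueError("exactly six characters are expected")
    word = 0
    for symbol in text:
        word = (word << 5) | ALPHABET.index(symbol)   # ValueError if not in alphabet
    return word

def unpack6(word: int) -> str:
    """Recover the six characters, most significant group first."""
    return "".join(ALPHABET[(word >> shift) & 31] for shift in range(25, -1, -5))

word = pack6("BASE32")
print(word < 2**32)     # True: the value fits in a 32-bit integer
print(unpack6(word))    # BASE32
```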

= Disadvantages =

Base32 representation takes roughly 20% more space than Base64. Also, because it encodes five 8-bit bytes (40 bits) to eight 5-bit base32 characters rather than three 8-bit bytes (24 bits) to four 6-bit base64 characters, padding to an 8-character boundary is a greater burden on short messages (which may be a reason to elide padding, which is an option in {{IETF RFC|4648}}).

class="wikitable" style="width:40ex; text-align: center; margin: 0 auto 0 auto;" \|+align="top"\|Length of notations as percentage of binary data !width="33%"\| !width="33%"\|Base64 !width="33%"\|Base32 !width="33%"\|Hexadecimal
8-bit	133%	160%	200%
7-bit	117%	140%	175%
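The percentages in the table can be recomputed with a short Python sketch, assuming that each row label gives the number of bits used to store one output character and that each output character carries 6, 5 or 4 bits of data. The function name <code>length_percent</code> is hypothetical. The last line illustrates the padding burden on a one-byte message.

```python
import base64
from fractions import Fraction

def length_percent(char_bits: int, payload_bits: int) -> int:
    """Encoded size as a percentage of the raw data, rounded to a whole number."""
    return round(Fraction(100 * char_bits, payload_bits))

for char_bits in (8, 7):
    # Base64, Base32 and hexadecimal carry 6, 5 and 4 data bits per character.
    print(char_bits, [length_percent(char_bits, bits) for bits in (6, 5, 4)])
# 8 [133, 160, 200]
# 7 [117, 140, 175]

one_byte = b"a"
print(len(base64.b64encode(one_byte)),   # 4 characters with padding
      len(base64.b32encode(one_byte)),   # 8 characters with padding
      len(one_byte.hex()))               # 2 characters
```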

Even though Base32 takes roughly 20% less space than hexadecimal, it is much less widely used. Hexadecimal maps easily to bytes because two hexadecimal digits represent exactly one byte, whereas Base32 digits do not align with individual bytes. However, two Base32 digits correspond to ten bits, which can encode (32 × 32 =) 1,024 values, with obvious applications for orders of magnitude of multiple-byte units in terms of powers of 1,024.

Hexadecimal is easier to learn and remember, since that only entails memorising the numerical values of six additional symbols (A–F), and even if those are not instantly recalled, it is easier to count through just over a handful of values.

Software implementations<span class="anchor" id="Software"></span>

Base32 programs are suitable for encoding arbitrary byte data using a restricted set of symbols that can both be conveniently used by humans and processed by computers.

Base32 implementations use a symbol set made up of at least 32 different characters (sometimes a 33rd for padding), as well as an algorithm for encoding arbitrary sequences of 8-bit bytes into a Base32 alphabet. Because more than one 5-bit Base32 character is needed to represent each 8-bit input byte, if the input is not a multiple of 5 bytes (40 bits), then it doesn't fit exactly in 5-bit Base32 characters. In that case, some specifications require padding characters to be added while some require extra zero bits to make a multiple of 5 bits. The closely related Base64 system, in contrast, uses a set of 64 symbols (or 65 symbols when padding is used).
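The encoding step described above can be sketched from scratch in Python. This is an illustrative implementation for the RFC 4648 §6 alphabet, not the code of any particular library; the function name <code>base32_encode</code> and its <code>pad</code> parameter are hypothetical. It shows both conventions mentioned above: zero bits fill the last 5-bit group, and padding characters are optionally appended.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"   # RFC 4648 section 6

def base32_encode(data: bytes, pad: bool = True) -> str:
    """Encode bytes by emitting one symbol for every 5 bits of input."""
    buffer = 0      # bits read from the input but not yet emitted
    n_bits = 0      # number of valid bits currently in the buffer
    out = []
    for byte in data:
        buffer = (buffer << 8) | byte
        n_bits += 8
        while n_bits >= 5:
            n_bits -= 5
            out.append(ALPHABET[(buffer >> n_bits) & 31])
        buffer &= (1 << n_bits) - 1          # keep only the leftover bits
    if n_bits:
        # Fill the final group with zero bits on the right.
        out.append(ALPHABET[(buffer << (5 - n_bits)) & 31])
    if pad:
        # Extend the output to a multiple of 8 characters.
        out.append("=" * (-len(out) % 8))
    return "".join(out)

print(base32_encode(b"f"))               # MY======
print(base32_encode(b"f", pad=False))    # MY
print(base32_encode(b"foobar") == base64.b32encode(b"foobar").decode("ascii"))  # True
```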

Base32 implementations in C/C++,{{Cite web|url=https://sourceforge.net/projects/cyoencode/|title=CyoEncode|website=SourceForge|date=24 June 2023 }}{{Cite web|url=https://www.gnu.org/software/gnulib/gnulib.html|title=Gnulib - GNU Portability Library - GNU Project - Free Software Foundation|website=www.gnu.org}} Perl,{{cite web|url=https://metacpan.org/release/MIME-Base32|title=MIME-Base32 - Base32 encoder and decoder|access-date=2018-07-29|website=MetaCPAN}} Java,{{Cite web|url=https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base32.html|title=Base32 (Apache Commons Codec 1.15 API)|website=commons.apache.org}} JavaScript,{{Cite web|url=https://www.npmjs.com/package/base32|title=base32|website=npm|date=27 September 2022 }} Python,{{Cite web|url=https://docs.python.org/3/library/base64.html|title=base64 — Base16, Base32, Base64, Base85 Data Encodings|website=Python documentation}} Go{{Cite web|url=https://golang.org/pkg/encoding/base32|title = Base32 package - encoding/Base32 - PKG.go.dev}} and Ruby{{Cite web|url=https://rubygems.org/gems/base32|title=base32 | RubyGems.org | your community gem host|website=rubygems.org}} are available.


Notes

References

{{IETF RFC|4648}}

Category:Binary-to-text encoding formats

Category:Power-of-two numeral systems
