Zero-width space

{{Short description|Special character in text processing}}

The zero-width space (rendered: {{Not a typo|{{kbd|​}}}}; HTML entity: {{kbd|&amp;#8203;}} or {{kbd|&amp;ZeroWidthSpace;}}), abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate word boundaries without displaying a visible space in the rendered text. This lets text-processing systems for scripts that do not use explicit spacing recognize word boundaries and handle line breaks appropriately.

The zero-width space is Unicode character U+200B, located in the Unicode General Punctuation block. In HTML, it can be represented by the character entity reference {{As written|&amp;ZeroWidthSpace;}}.

Purpose

The zero-width space marks a potential line break without hyphenation. Its semantics and HTML implementation are similar to those of the soft hyphen, except that a soft hyphen displays a hyphen character at the point where the line is broken.
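The difference from the soft hyphen can be sketched as a toy greedy line breaker in Python. This is an illustration under simplifying assumptions (one column per character, and only U+200B and U+00AD count as break opportunities); it is not the Unicode Line Breaking Algorithm that real rendering engines implement, and the `wrap` function is hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: break opportunity, never drawn
SHY = "\u00ad"   # SOFT HYPHEN: break opportunity, drawn as "-" only at a break


def wrap(text: str, width: int) -> list[str]:
    """Greedily wrap text, breaking only at ZWSP or soft hyphen.

    Toy model: widths are character counts, ordinary spaces are not break
    points, and a visible hyphen may push a line one column past width.
    """
    # Split into (segment, separator) pairs; separator is ZWSP, SHY or "".
    segments: list[tuple[str, str]] = []
    current = ""
    for ch in text:
        if ch in (ZWSP, SHY):
            segments.append((current, ch))
            current = ""
        else:
            current += ch
    segments.append((current, ""))

    lines: list[str] = []
    line = ""
    prev_sep = ""
    for seg, sep in segments:
        # Break at the previous opportunity if this segment does not fit.
        if line and len(line) + len(seg) > width:
            # A soft hyphen becomes visible at the break; a ZWSP leaves no trace.
            lines.append(line + ("-" if prev_sep == SHY else ""))
            line = ""
        line += seg
        prev_sep = sep
    lines.append(line)
    return lines


print(wrap("Lorem\u200bIpsum\u200bDolor", 11))  # ['LoremIpsum', 'Dolor']
print(wrap("hyph\u00adenation", 5))             # ['hyph-', 'enation']
print(wrap("LoremIpsumDolor", 5))               # ['LoremIpsumDolor']
```

The last call mirrors the unseparated example text: with no break opportunity, the whole string stays on one line regardless of the width.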

The zero-width space can be used to mark word breaks in languages without visible space between words, such as Thai, Myanmar, Khmer, and Japanese.{{cite book |title=The Unicode® Standard Version 15.0 – Core Specification |date=September 2022 |chapter=23.2 Layout Controls |publisher=The Unicode Consortium |isbn=978-1-936213-32-0 |url=https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf#page=944 |page=918}}
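Marking word breaks this way amounts to joining already-segmented words with U+200B. The sketch below assumes a separate word segmenter has already produced the word list (the segmentation step itself is out of scope), and `mark_word_boundaries` is a hypothetical helper.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE


def mark_word_boundaries(words: list[str]) -> str:
    """Join already-segmented words with ZWSP so a renderer may break between them."""
    return ZWSP.join(words)


# Hypothetical segmenter output for the Japanese phrase 日本語の文章 ("Japanese text").
marked = mark_word_boundaries(["日本語", "の", "文章"])
print(marked == "日本語の文章")                    # False: the string now contains two ZWSPs
print(marked.replace(ZWSP, "") == "日本語の文章")  # True: the visible text is unchanged
```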

In justified text, the rendering engine may add inter-character spacing (letter spacing) between letters separated by a zero-width space, which it does not do around fixed-width spaces.

= Example =

To show the effect of the zero-width space in text, the following words have been separated with zero-width spaces:

{{Zwsp|Lorem|Ipsum|Dolor|Sit|Amet|Consectetur|Adipiscing|Elit|Sed|Do|Eiusmod|Tempor|Incididunt|Ut|Labore|Et|Dolore|Magna|Aliqua|Ut|Enim|Ad|Minim|Veniam|Quis|Nostrud|Exercitation|Ullamco|Laboris|Nisi|Ut|Aliquip|Ex|Ea|Commodo|Consequat|Duis|Aute|Irure|Dolor|In|Reprehenderit|In|Voluptate|Velit|Esse|Cillum|Dolore|Eu|Fugiat|Nulla|Pariatur|Excepteur|Sint|Occaecat|Cupidatat|Non|Proident|Sunt|In|Culpa|Qui|Officia|Deserunt|Mollit|Anim|Id|Est|Laborum}}

By contrast, the following words have not been separated:

{{Not a typo|LoremIpsumDolorSitAmetConsecteturAdipiscingElitSedDoEiusmodTemporIncididuntUtLaboreEtDoloreMagnaAliquaUtEnimAdMinimVeniamQuisNostrudExercitationUllamcoLaborisNisiUtAliquipExEaCommodoConsequatDuisAuteIrureDolorInReprehenderitInVoluptateVelitEsseCillumDoloreEuFugiatNullaPariaturExcepteurSintOccaecatCupidatatNonProidentSuntInCulpaQuiOfficiaDeseruntMollitAnimIdEstLaborum}}

The first text is broken into lines only at word boundaries, and resizing the browser window re-breaks it accordingly; the second text, which contains no break opportunities, is not broken at all.

Usage

= HTML =

In HTML pages, the {{Tag|wbr|o}} element functions as a zero-width space, marking a line break opportunity without displaying anything. In Internet Explorer 6, the zero-width space character was not supported in some fonts.{{cite web|url=http://dunae.ca/2009/better-web-typography-with-spaces-and-hyphens/ |first=Alex|last=Dunae |title=Better Web Typography with Spaces and Hyphens |work=dunae.ca |access-date=December 3, 2009 |archive-url=https://web.archive.org/web/20101214223741/http://dunae.ca/2009/better-web-typography-with-spaces-and-hyphens/ |archive-date=December 14, 2010}}
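As an illustration, a page generator could insert either `<wbr>` or the character reference `&#8203;` into a long URL so that browsers may wrap it after slashes. The helper `break_after_slashes` and the choice of break points are hypothetical; only Python's standard `html` and `re` modules are used.

```python
import html
import re


def break_after_slashes(url: str, marker: str = "<wbr>") -> str:
    """Return HTML for url with a break opportunity after each slash not followed by another slash.

    marker is "<wbr>" or "&#8203;" (the zero-width space as a character reference).
    """
    escaped = html.escape(url)  # escape &, <, >, quotes; "/" is left alone
    # Insert the marker after a "/" only when a non-slash follows,
    # so "https://" is never split between its two slashes.
    return re.sub(r"/(?=[^/])", lambda m: "/" + marker, escaped)


print(break_after_slashes("https://example.org/a/b"))
# https://<wbr>example.org/<wbr>a/<wbr>b
print(break_after_slashes("https://example.org/a/b", "&#8203;"))
# https://&#8203;example.org/&#8203;a/&#8203;b
```

Both outputs give the browser the same break points; they differ only in whether the break opportunity is an element or a character in the text.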

= Unspecific use =

The zero-width space should not be used to prevent the automatic conversion of certain character sequences into emoji, because it marks a line break opportunity.{{refn|group=note|Because the zero-width space marks a line break opportunity, using it to stop an ASCII sequence from being parsed and converted to the corresponding emoji allows the line to be broken between the parts of that sequence. For example, if a zero-width space is inserted between the colon and the parenthesis of {{code|:)}}, the {{code|:}} could end up at the end of one line and the {{code|)}} at the start of the next. This makes the zero-width space unsuitable for this purpose.}} To prevent systems from converting sequences like {{code|:)}} into emoji like ☺ or 🙂, the zero-width non-joiner or another invisible character that does not allow a line break, such as the word joiner, should be used instead.{{cite web |url=https://stackoverflow.com/questions/55033436/how-to-display-the-fraction-15-16-nicely-in-unicode |title=How to display the fraction 15/16 nicely in Unicode? |access-date=2024-09-07}}{{refn|group=note|This reference does not address emoji directly, but it describes an analogous case, and its recommendation carries over: there, the purpose is to prevent a sequence of digits from being parsed as a single numerator rather than as a whole part and a numerator; here, the purpose is to prevent an ASCII sequence from being parsed as an emoji. The reference states that "Any zero-width, invisible character will do the trick." The character must, however, also be non-breaking, and the quotation ends: "something like U+200C ZERO WIDTH NON-JOINER or U+2060 WORD JOINER will also work." Neither of these characters allows a line break, and the reference does not say that U+200B ZERO WIDTH SPACE would also work. The zero-width space does prevent the conversion, but it additionally marks a line break opportunity, which is not part of the intended behavior.}}
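The sketch below shows the idea with a hypothetical auto-converter (`naive_emoji`) that simply replaces the literal sequence `:)`. Real platforms use their own matching rules, so whether a particular converter is defeated by an invisible separator is an assumption here. A zero-width space would also stop this toy converter, but unlike the word joiner it would allow a line break between the colon and the parenthesis.

```python
import unicodedata

WJ = "\u2060"    # WORD JOINER: invisible and prohibits a line break
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: invisible; suggested in the cited answer


def naive_emoji(text: str) -> str:
    """Hypothetical auto-converter that replaces the ASCII sequence ":)" with an emoji."""
    return text.replace(":)", "\U0001F642")  # U+1F642 SLIGHTLY SMILING FACE


def defuse(seq: str, sep: str = WJ) -> str:
    """Insert an invisible separator between every pair of characters in seq."""
    return sep.join(seq)


print(naive_emoji("hi :)"))                    # the ":)" is converted to an emoji
guarded = defuse(":)")
print(naive_emoji(guarded) == guarded)         # True: no literal ":)" left to match
print([unicodedata.name(c) for c in guarded])  # ['COLON', 'WORD JOINER', 'RIGHT PARENTHESIS']
print(naive_emoji(defuse(":)", ZWNJ)) == defuse(":)", ZWNJ))  # True
```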

= Prohibition in domain names =

ICANN rules prohibit domain names from containing non-displayed characters, including the zero-width space. Most browsers also prohibit such characters in domain names, because they can be used in a homograph attack, in which a malicious URL is visually indistinguishable from a legitimate one.{{cite web |title=Network.IDN.blacklist_chars |work=mozillaZine |url=http://kb.mozillazine.org/Network.IDN.blacklist_chars |access-date=2018-02-07}}{{cite web |title=Unicode Character 'Zero Width Space' |work=FileFormat.Info |url=https://www.fileformat.info/info/unicode/char/200b/index.htm |access-date=2018-02-07}}
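A minimal sketch of why this matters: a hostname containing a zero-width space looks identical to the legitimate one but is a different string. The check below is hypothetical; it flags every invisible format character (Unicode general category Cf) and is not ICANN's rule set or any browser's implementation. It is also stricter than IDNA, which permits U+200C and U+200D in certain contexts.

```python
import unicodedata


def hidden_format_chars(hostname: str) -> list[str]:
    """List invisible format characters (Unicode category Cf) found in hostname."""
    return [
        unicodedata.name(ch, f"U+{ord(ch):04X}")
        for ch in hostname
        if unicodedata.category(ch) == "Cf"
    ]


spoof = "exa\u200bmple.com"                # renders like "example.com"
print(spoof == "example.com")              # False: the strings differ
print(hidden_format_chars(spoof))          # ['ZERO WIDTH SPACE']
print(hidden_format_chars("example.com"))  # []
```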

Encoding

The zero-width space character is encoded in Unicode as {{unichar|200B|ZERO WIDTH SPACE}}.{{cite web| title=General Punctuation – Unicode| url=https://www.unicode.org/charts/PDF/U2000.pdf| access-date=2013-07-20}}

In HTML, it can be referenced as {{Not a typo|&amp;#8203;}}, {{Not a typo|&amp;#x200B;}} or {{Not a typo|&amp;ZeroWidthSpace;}}. Additionally, the character entities &amp;NegativeMediumSpace;, &amp;NegativeThickSpace;, &amp;NegativeThinSpace;, and &amp;NegativeVeryThinSpace; also refer to the zero-width space, contrary to what their names suggest.[https://zvon.org/comp/r/ref-MathML_2.html#Entities~ZeroWidthSpace Entities/ZeroWidthSpace] in MathML Version 2.0

In mailto: URLs, the percent-encoded sequence %E2%80%8B (the UTF-8 bytes of U+200B) yields a zero-width space, but it may interfere with correctly copying the email link.{{citation needed|date=May 2025}}
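The encodings above can be checked with Python's standard library. This sketch prints the character name, the UTF-8 and UTF-16 bytes, and the percent-encoded form used in URLs, and confirms that the numeric and named HTML references all decode to U+200B; the helper `representations` is hypothetical.

```python
import html
import unicodedata
from urllib.parse import quote, unquote

ZWSP = "\u200b"

# Numeric and named HTML references that decode to U+200B.
HTML_REFS = [
    "&#8203;", "&#x200B;", "&ZeroWidthSpace;",
    "&NegativeMediumSpace;", "&NegativeThickSpace;",
    "&NegativeThinSpace;", "&NegativeVeryThinSpace;",
]


def representations(ch: str) -> dict[str, str]:
    """Collect common encodings of a single character."""
    return {
        "name": unicodedata.name(ch),
        "utf8": ch.encode("utf-8").hex(" "),
        "utf16be": ch.encode("utf-16-be").hex(" "),
        "percent": quote(ch),  # UTF-8 bytes, percent-encoded as in a URL
    }


for key, value in representations(ZWSP).items():
    print(f"{key}: {value}")
# name: ZERO WIDTH SPACE
# utf8: e2 80 8b
# utf16be: 20 0b
# percent: %E2%80%8B

print(all(html.unescape(ref) == ZWSP for ref in HTML_REFS))  # True
print(unquote("%E2%80%8B") == ZWSP)                         # True
```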

The TeX representation is {{Not a typo|\hskip0pt}}; the LaTeX representation is \hspace{0pt};{{cite web| title=The LaTeX Companion. Chapter 3: Basic Formatting Tools| url=https://www.latex-project.org/help/books/bookpart_tlc2-ch3.pdf| access-date=2019-07-16}} and the groff representation is \:.{{cite web| title=groff(7) – Linux manual page| url=http://man7.org/linux/man-pages/man7/groff.7.html| access-date=2014-02-08}}

See also

References

= Note =

{{reflist|group=note}}

= Citations =

{{reflist}}

= Sources =

{{refbegin}}

  • {{citation | first1 = Victor H. | last1 = Mair | authorlink1 = Victor H. Mair | first2 = Yongquan | last2 = Liu | title = Characters and computers | publisher = IOS Press | date = 1991 }}

{{refend}}

{{-}}

{{Unicode navigation}}

Category:Control characters

Category:Typography

Category:Unicode formatting code points

Category:Whitespace