Bush hid the facts
{{short description|Bug in Microsoft Windows}}
{{About|the software bug|events occurring during the presidency of George W. Bush|Presidency of George W. Bush}}
{{Multiple issues|
{{Self-published|date=July 2023}}
{{More sources needed|date=March 2024}}
}}
"Bush hid the facts" is a common name for a bug present in Microsoft Windows which causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts", without quotes, was put in a Notepad document and saved, closed, and reopened, the nonsensical sequence of the Chinese characters "{{lang|zh|{{linktext|畂|桳|栠|摩|琠|敨|映|捡|獴}}}}" would appear instead.
While "Bush hid the facts" is the sentence most commonly presented to induce the error, the bug can also be triggered by other strings such as {{nowrap|"hhhh hhh hhh hhhhh"}},{{cite web|url=http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html|title= Bush Hid The Facts - Notepad Conspiracy Claim|website=Hoax Slayer|first=Brett M. |last=Christensen|date=November 2, 2009|url-status=dead|archive-url=https://web.archive.org/web/20100315222317/http://www.hoax-slayer.com/bush-hid-the-facts-notepad.html|archive-date=2010-03-15}} {{Nowrap|"this app can break"}},{{Cite web |last=Kaplan |first=Michael S. |date=14 June 2006 |title=Behind 'How to break Windows Notepad' |url=http://blogs.msdn.com/b/michkap/archive/2006/06/14/631016.aspx |url-status=dead |archive-url=https://web.archive.org/web/20131025131556/http://blogs.msdn.com/b/michkap/archive/2006/06/14/631016.aspx |archive-date=25 October 2013 |access-date=2022-07-12 |website=blogs.msdn.com}} and even {{nowrap|"a "}} or {{nowrap|"z!"}}.{{Citation |title="Bush hid the facts" Bug EXPLAINED | date=4 July 2023 |url=https://www.youtube.com/watch?v=sPShnuBSvBg |access-date=2024-09-04 |language=en}}
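The misreading can be reproduced outside Notepad. The following is a minimal Python sketch, not Notepad's own code: it takes the ASCII bytes of the sentence and decodes them as UTF-16LE, which pairs every two bytes into one 16-bit code unit. The variable names are illustrative only.

```python
# ASCII bytes of the trigger sentence, as an "ANSI" save would store them.
raw = "Bush hid the facts".encode("ascii")

# Decoding the same bytes as UTF-16LE fuses each pair of bytes into one
# code unit (second byte becomes the high half), giving 9 characters from 18.
garbled = raw.decode("utf-16-le")
print(garbled)

# No information is lost: re-encoding as UTF-16LE and decoding as ASCII
# gives back the original sentence.
restored = garbled.encode("utf-16-le").decode("ascii")
print(restored)
```

The first line printed is the nine-character sequence quoted above, and the second is the original sentence, which shows that the file contents are intact and only the interpretation is wrong.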
Cause
File:Windows "Bush hid the facts" bug explained.svg
When a text file is opened in Notepad, Windows checks whether the text is encoded in UTF-16 using the Win32 charset detection function {{tt|IsTextUnicode}}. {{tt|IsTextUnicode}} guesses that the text is Unicode if the total change in the "low bytes" (the even indexes, starting at 0) is three times greater than the total change in the "high bytes" (the odd indexes). If so, it returns {{tt|true}}, causing the application to incorrectly interpret the text as UTF-16LE.{{cite web |last=Chen |first=Raymond |date=March 24, 2004 |title=Some files come up strange in Notepad |url=https://devblogs.microsoft.com/oldnewthing/20040324-00/?p=40093 |access-date=2022-07-12 |website=The Old New Thing |publisher=Microsoft}} As a result, Notepad renders the text as Chinese characters. It is commonly believed that spaces at even indexes trigger the bug; this is because the space (32) is farther away from the lower-case letters (97 to 122) than the letters are from each other.
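The statistical test described above can be approximated in a few lines. The following Python sketch is a hypothetical re-creation based only on that description, not the actual {{tt|IsTextUnicode}} implementation, which performs additional checks. The function name <code>looks_like_utf16le</code> is invented for illustration, "three times greater" is assumed to mean a strict comparison against three times the total, and the rejection of odd-length input is an assumption made because an odd number of bytes cannot form whole 16-bit code units.

```python
def looks_like_utf16le(data: bytes) -> bool:
    """Hypothetical approximation of the statistical guess described above."""
    # Assumption: an odd byte count cannot be whole 16-bit code units.
    if len(data) % 2 != 0:
        return False

    low = data[0::2]   # bytes at even indexes: low half of each code unit
    high = data[1::2]  # bytes at odd indexes: high half of each code unit

    # Total change between consecutive bytes in each half.
    low_change = sum(abs(b - a) for a, b in zip(low, low[1:]))
    high_change = sum(abs(b - a) for a, b in zip(high, high[1:]))

    # Guess "Unicode" when the low bytes vary far more than the high bytes.
    return low_change > 3 * high_change


for sample in (b"Bush hid the facts", b"this app can break", b"Hello, world"):
    print(sample, looks_like_utf16le(sample))
```

For "Bush hid the facts" the even-indexed bytes include four spaces, so their total change is 506, while the odd-indexed bytes are all lower-case letters and change by only 68 in total. Under these assumptions the sketch therefore reports the sentence as UTF-16LE, whereas "Hello, world" (200 against 165) is not. Very short triggers such as "a " are not reproduced by this simplified test.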
The bug had existed since {{tt|IsTextUnicode}} was introduced with {{nowrap|Windows NT 3.5}} in 1994, but was not discovered until early 2004.{{cite web | url = http://weblogs.asp.net/cumpsd/archive/2004/02/27/81098.aspx | title = Notepad bug? Encoding issue? | first = David | last = Cumps | date = February 27, 2004 }}
Workarounds
{{Unreferenced section|date=March 2024}}
Several workarounds exist for this bug:
- Add a character so the string is an odd number of bytes long.
- Save the file as "UTF-8" (before 2018) or "UTF-8 with BOM" (after 2018) rather than "ANSI". This prepends a UTF-8 byte order mark which avoids the bug.{{cn|reason=According to ref it is possible to make a string starting with BOM that will fail|date=March 2024}} UTF-8 without the byte order mark would still trigger the bug, as it is identical to the "ANSI" file.
- Save the file as "Unicode", which in Microsoft Windows means UTF-16LE. When this text is loaded, {{tt|IsTextUnicode}} should (and does) return {{tt|true}}, and the text is displayed correctly.
- To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, Notepad lacks the "Encoding" list box. WordPad appears to load the text correctly without choosing the encoding, since it uses its own encoding detection.
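The byte-level effect of the first three workarounds can be illustrated with a short Python sketch. It only builds the byte sequences each option would produce and does not call any Windows API. The variable names are illustrative, and it is assumed here that the "Unicode" option writes a little-endian byte order mark before the text.

```python
import codecs

text = "Bush hid the facts"

# Plain "ANSI" save: 18 bytes, no marker, so the encoding has to be guessed.
ansi = text.encode("ascii")

# Workaround 1: one extra character makes the byte count odd.
odd = (text + ".").encode("ascii")

# Workaround 2: "utf-8-sig" prepends the UTF-8 byte order mark EF BB BF.
utf8_bom = text.encode("utf-8-sig")

# Workaround 3: UTF-16LE preceded by its byte order mark FF FE
# (assumption: this matches what the "Unicode" option writes).
utf16le_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

for name, data in [("ansi", ansi), ("odd", odd),
                   ("utf8_bom", utf8_bom), ("utf16le_bom", utf16le_bom)]:
    print(f"{name:12} {len(data):3} bytes  starts with {data[:3].hex(' ')}")
```

Apart from the three marker bytes, the UTF-8 output with a byte order mark is identical to the "ANSI" bytes, which is why saving as UTF-8 without the mark does not help.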
References
{{Reflist}}
External links
- [https://devblogs.microsoft.com/oldnewthing/20070417-00/?p=27223 The Notepad file encoding problem, redux] – Raymond Chen
- [https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode IsTextUnicode] – Microsoft Docs