Trojan Source

{{short description|Software vulnerability in source code}}

{{Infobox bug

|name=Trojan Source

|CVE={{unbulleted list| {{CVE|2021-42574}}|{{CVE|2021-42694}} }}

|discovered={{Start date and age|2021|09|09|df=no}}

|discoverer=Nicholas Boucher, Ross Anderson

|affected software=Unicode, source code

|website={{URL|https://trojansource.codes}}

}}

Trojan Source is a software vulnerability that abuses Unicode's bidirectional characters to make source code appear different from the code that is actually executed.{{Cite web |title='Trojan Source' Bug Threatens the Security of All Code – Krebs on Security |date=November 2021 |url=https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/ |url-status=live |archive-url=https://web.archive.org/web/20220114074926/https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/ |archive-date=2022-01-14 |access-date=2022-01-17 |language=en-US}} The exploit takes advantage of how text in scripts with different reading directions is encoded and displayed on computers. It was discovered by Nicholas Boucher and Ross Anderson at the University of Cambridge in late 2021.{{Cite web |title=VU#999008 - Compilers permit Unicode control and homoglyph characters |url=https://kb.cert.org/vuls/id/999008 |url-status=live |archive-url=https://web.archive.org/web/20220121052409/https://www.kb.cert.org/vuls/ |archive-date=2022-01-21 |access-date=2022-01-17 |website=www.kb.cert.org}}

Background

{{Main article|Unicode|Bidirectional text}}

Unicode is an encoding standard for representing text, symbols, and glyphs. It is the dominant character encoding on computers, used in over 98% of websites {{as of|September 2023|lc=y}}.{{Cite web |title=Usage Survey of Character Encodings broken down by Ranking |url=https://w3techs.com/technologies/cross/character_encoding/ranking |url-status=live |archive-url=https://web.archive.org/web/20220121052401/https://w3techs.com/technologies/cross/character_encoding/ranking |archive-date=2022-01-21 |access-date=2022-01-17 |website=w3techs.com}} Because it supports many languages, it must accommodate different writing directions: left-to-right scripts, such as those used for English and Russian, and right-to-left scripts, such as those used for Hebrew and Arabic. Because a single text can mix more than one writing system, Unicode must also be able to combine scripts with different display orders and resolve conflicts between them. As a solution, Unicode defines bidirectional (Bidi) formatting characters that control the order in which text is displayed. Because these characters are usually invisible, they can be abused to make text display in an order that differs from the order in which it is stored and processed, with no visible sign of the manipulation.{{Cite web |title=UAX #9: Unicode Bidirectional Algorithm |url=https://www.unicode.org/reports/tr9/ |url-status=live |archive-url=https://web.archive.org/web/20190502143500/http://www.unicode.org/reports/tr9/ |archive-date=2019-05-02 |access-date=2022-01-17 |website=www.unicode.org}}

{| class="wikitable mw-collapsible"
|+Relevant Unicode bidirectional formatting characters
!Abbreviation
!Name
!Description
|-
|LRE
|{{unichar|202A|LEFT-TO-RIGHT EMBEDDING}}
|Try treating the following text as left-to-right.
|-
|RLE
|{{unichar|202B|RIGHT-TO-LEFT EMBEDDING}}
|Try treating the following text as right-to-left.
|-
|LRO
|{{unichar|202D|LEFT-TO-RIGHT OVERRIDE}}
|Force treating the following text as left-to-right.
|-
|RLO
|{{unichar|202E|RIGHT-TO-LEFT OVERRIDE}}
|Force treating the following text as right-to-left.
|-
|LRI
|{{unichar|2066|LEFT-TO-RIGHT ISOLATE}}
|Force treating the following text as left-to-right without affecting adjacent text.
|-
|RLI
|{{unichar|2067|RIGHT-TO-LEFT ISOLATE}}
|Force treating the following text as right-to-left without affecting adjacent text.
|-
|FSI
|{{unichar|2068|FIRST STRONG ISOLATE}}
|Force treating the following text in the direction indicated by its first strongly directional character, without affecting adjacent text.
|-
|PDF
|{{unichar|202C|POP DIRECTIONAL FORMATTING}}
|Terminate the nearest LRE, RLE, LRO, or RLO.
|-
|PDI
|{{unichar|2069|POP DIRECTIONAL ISOLATE}}
|Terminate the nearest LRI, RLI, or FSI.
|}
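As a rough illustration of the characters listed above, the following Python sketch locates them in a string. It relies on the standard library's unicodedata.bidirectional, which reports a character's Bidi class using the same abbreviations as the table. The function name find_bidi_controls and the sample string are hypothetical and exist only for this example.

```python
import unicodedata

# Bidi classes of the nine formatting characters listed in the table.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "LRI", "RLI", "FSI", "PDF", "PDI"}


def find_bidi_controls(text):
    """Return (index, abbreviation, code point) for each Bidi formatting character.

    Hypothetical helper for illustration; not part of any tool named in the article.
    """
    hits = []
    for index, char in enumerate(text):
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in BIDI_CONTROL_CLASSES:
            hits.append((index, bidi_class, f"U+{ord(char):04X}"))
    return hits


# Hypothetical sample: an RLI/PDI pair wrapped around ordinary text.
sample = "abc \u2067def\u2069 ghi"
for hit in find_bidi_controls(sample):
    print(hit)
```

Matching on the Bidi class rather than on a hard-coded list of code points keeps the check aligned with the abbreviations used in the table, at the cost of depending on the Unicode data shipped with the Python interpreter.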

Methodology

In the exploit, bidirectional characters are abused to visually reorder text in source code, so that the order in which a reviewer reads the code differs from the order in which a compiler or interpreter processes it.

Bidirectional characters can be inserted wherever a language permits arbitrary Unicode text in source code. In practice, this usually means string literals, comments, and documentation.

{| class="wikitable"
|+Vulnerable Python code
!Source code with hints
!Source code displayed visually
!Source code interpreted
|-
|nowrap|<syntaxhighlight lang="python">
def sum(num1, num2):
    '''Add num1 and num2, and [RLI]''' ;return
    return num1 + num2
</syntaxhighlight>
|nowrap|<syntaxhighlight lang="python">
def sum(num1, num2):
    '''Add num1 and num2, and return; '''
    return num1 + num2
</syntaxhighlight>
|nowrap|<syntaxhighlight lang="python">
def sum(num1, num2):
    '''Add num1 and num2, and ''';
    return
    return num1 + num2
</syntaxhighlight>
|}

In the above example, the RLI mark (right-to-left isolate) causes the text that follows it to be displayed in a different order than it is interpreted: in the stored order, the triple-quote comes first (ending the string), followed by a semicolon (starting a new statement), and finally the premature return (returning {{mono|None}} and skipping any code below it). The line break terminates the RLI mark, preventing it from affecting the code below. Because of the Bidi character, some source code editors and IDEs rearrange the code for display without any visual indication that it has been rearranged, so a human code reviewer would not normally detect the change. However, when the code is passed to a compiler or interpreter, the compiler may ignore the Bidi character and process the characters in their stored order rather than the order displayed. The resulting program could then execute code that visually appeared to be non-executable.{{Cite web |last=Edge|first=Jake|date=2021-11-03|title=Trojan Source: tricks (no treats) with Unicode [LWN.net] |url=https://lwn.net/Articles/874951/ |access-date=2022-03-12 |website=lwn.net}} Formatting marks can be combined multiple times to create complex attacks.{{Cite web |last=Stockley |first=Mark |date=2021-11-03 |title=Trojan Source: Hiding malicious code in plain sight |url=https://blog.malwarebytes.com/exploits-and-vulnerabilities/2021/11/trojan-source-hiding-malicious-code-in-plain-sight/ |access-date=2022-03-12 |website=Malwarebytes Labs |language=en-US}}
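The effect described above can be reproduced with a short Python sketch. It assumes, as the description states, that the RLI character sits inside the docstring just before the closing triple-quote. The function is named add rather than sum only to avoid shadowing Python's built-in, and the names source and namespace are illustrative.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE

# Logical (stored) order of the source, which is what Python parses.
# An editor that applies the Bidi algorithm may instead show the second
# line as a docstring that appears to end in "return; '''".
source = (
    "def add(num1, num2):\n"
    f"    '''Add num1 and num2, and {RLI}''' ;return\n"
    "    return num1 + num2\n"
)

namespace = {}
exec(source, namespace)  # the parser works on the stored character order

# The early "return" runs first, so the addition below it is never reached.
result = namespace["add"](1, 2)
print(result)

# The RLI character itself is part of the docstring.
print(RLI in namespace["add"].__doc__)
```

Running the sketch shows the function returning None for any arguments, even though a Bidi-aware viewer may render its body as an ordinary docstring followed by the addition.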

Impact and mitigation

Programming languages that accept Unicode text in source code are vulnerable to the exploit when that code is displayed by tools that follow Unicode's Bidi algorithm. Affected languages include Java, Go, C, C++, C#, Python, and JavaScript.{{Cite web |last=Tung |first=Liam |title=Programming languages: This sneaky trick could allow attackers to hide 'invisible' vulnerabilities in code |url=https://www.zdnet.com/article/this-sneaky-trick-could-allow-attackers-to-hide-invisible-vulnerabilities-in-code/ |url-status=live |archive-url=https://web.archive.org/web/20211221093531/https://www.zdnet.com/article/this-sneaky-trick-could-allow-attackers-to-hide-invisible-vulnerabilities-in-code/ |archive-date=2021-12-21 |access-date=2022-01-21 |website=ZDNet |language=en}}

Although the behavior is not strictly a bug, many compilers, interpreters, and websites added warnings or mitigations for the exploit. Both GCC and LLVM received requests to address it.{{Cite web |title=GCC & LLVM Patches Pending To Fend Off Trojan Source Attacks |url=https://www.phoronix.com/scan.php?page=news_item&px=GCC-LLVM-Trojan-Source |url-status=live |archive-url=https://web.archive.org/web/20211201090848/https://www.phoronix.com/scan.php?page=news_item&px=GCC-LLVM-Trojan-Source |archive-date=2021-12-01 |access-date=2022-01-17 |website=www.phoronix.com |language=en}} Shortly after the exploit was published, Marek Polacek submitted a patch to GCC that implemented a warning for potentially unsafe directional characters; this functionality was merged for GCC 12 under the -Wbidi-chars flag.{{Cite web |last=Malcolm |first=David |date=2022-01-12 |title=Prevent Trojan Source attacks with GCC 12 |url=https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12 |url-status=live |archive-url=https://web.archive.org/web/20220117090828/https://developers.redhat.com/articles/2022/01/12/prevent-trojan-source-attacks-gcc-12 |archive-date=2022-01-17 |access-date=2022-01-17 |website=Red Hat Developer |language=en}}{{Cite web |title=Warning Options (Using the GNU Compiler Collection (GCC)) |url=https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wbidi-chars_003d |url-status=live |archive-url=https://web.archive.org/web/20181205035654/http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wbidi-chars_003d |archive-date=2018-12-05 |access-date=2022-01-17 |website=gcc.gnu.org}} LLVM also merged similar patches.
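A warning of this general kind can be sketched in a few lines of Python. The example below is not GCC's or LLVM's implementation; it is a minimal illustration, under the assumption that a line is suspicious when a directional embedding, override, or isolate is still open at the end of that line. The function name lines_with_unterminated_bidi and the sample source are hypothetical.

```python
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def lines_with_unterminated_bidi(source):
    """Return 1-based numbers of lines that end with a Bidi context still open.

    Hypothetical checker for illustration only.
    """
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stack = []
        for char in line:
            if char in EMBEDDING_OPENERS:
                stack.append("embedding")
            elif char in ISOLATE_OPENERS:
                stack.append("isolate")
            elif char == PDF:
                # PDF closes the most recent embedding or override, if one is open.
                if stack and stack[-1] == "embedding":
                    stack.pop()
            elif char == PDI and "isolate" in stack:
                # PDI closes the most recent isolate and anything opened inside it.
                while stack.pop() != "isolate":
                    pass
        if stack:
            flagged.append(lineno)
    return flagged


# Line 2 leaves an RLI open; line 3 contains a balanced RLI/PDI pair.
example = (
    "def add(num1, num2):\n"
    "    '''Add num1 and num2, and \u2067''' ;return\n"
    "    return num1 + num2  # \u2067note\u2069\n"
)
print(lines_with_unterminated_bidi(example))
```

Flagging only unterminated contexts, rather than every Bidi character, is one possible design choice: it leaves room for legitimate right-to-left text in comments and strings while still catching the pattern used in the example attack.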

Rust addressed the exploit in version 1.56.1, which by default rejects source code that contains these characters. The developers of Rust found no vulnerable packages published prior to the fix.{{Cite web |title=Security advisory for rustc (CVE-2021-42574) {{!}} Rust Blog |url=https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html |url-status=live |archive-url=https://web.archive.org/web/20211130021819/https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html |archive-date=2021-11-30 |access-date=2022-01-21 |website=blog.rust-lang.org |language=en}}

Red Hat issued an advisory on its website, rating the severity of the exploit as "moderate".{{Cite web |title=RHSB-2021-007 Trojan source attacks (CVE-2021-42574,CVE-2021-42694) |url=https://access.redhat.com/security/vulnerabilities/RHSB-2021-007 |url-status=live |archive-url=https://web.archive.org/web/20220117090914/https://access.redhat.com/security/vulnerabilities/RHSB-2021-007 |archive-date=2022-01-17 |access-date=2022-01-21 |website=Red Hat Customer Portal |language=en}} GitHub published a warning on its blog and updated its website to show a dialog box when Bidi characters are detected in a repository's code.{{Cite web |title=Warning about bidirectional Unicode text {{!}} GitHub Changelog |url=https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/ |url-status=live |archive-url=https://web.archive.org/web/20220115005709/https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/ |archive-date=2022-01-15 |access-date=2022-01-21 |website=The GitHub Blog |date=31 October 2021 |language=en-US}}

References

{{reflist}}