Roko's basilisk
{{Short description|AI thought experiment}}
{{Use dmy dates|date=July 2024}}
Roko's basilisk is a thought experiment which posits that an otherwise benevolent artificial superintelligence (AI) could arise in the future and punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.{{Cite thesis |last=Millar |first=Isabel |title=The Psychoanalysis of Artificial Intelligence |date=October 2020 |degree=PhD |publisher=Kingston School of Art |url=https://eprints.kingston.ac.uk/id/eprint/49043/1/Millar-I-49043.pdf |doi=10.1007/978-3-030-67981-1 |isbn=978-3-030-67980-4 |access-date=20 October 2022 |archive-date=18 May 2022 |archive-url=https://web.archive.org/web/20220518185710/https://eprints.kingston.ac.uk/id/eprint/49043/1/Millar-I-49043.pdf |url-status=live }} It originated in a 2010 post on the discussion board LessWrong, a rationalist community web forum.{{Cite web |title=Roko's Basilisk |url=https://www.lesswrong.com/tag/rokos-basilisk |access-date=24 March 2022 |website=LessWrong |date=5 October 2015 |archive-date=24 March 2022 |archive-url=https://web.archive.org/web/20220324162721/https://www.lesswrong.com/tag/rokos-basilisk |url-status=live }}{{Cite web |last=Paul-Choudhury |first=Sumit |date=1 August 2019 |title=Tomorrow's Gods: What is the future of religion? |url=https://www.bbc.com/future/article/20190801-tomorrows-gods-what-is-the-future-of-religion |access-date=6 July 2022 |work=BBC News |archive-date=1 September 2020 |archive-url=https://web.archive.org/web/20200901121551/https://www.bbc.com/future/article/20190801-tomorrows-gods-what-is-the-future-of-religion |url-status=live }} The thought experiment's name derives from the poster of the article (Roko) and the basilisk, a mythical creature capable of destroying its enemies with its stare.
While the theory was initially dismissed as nothing but conjecture or speculation by many LessWrong users, LessWrong founder Eliezer Yudkowsky considered it a potential information hazard and banned discussion of the basilisk on the site for five years. Reports of panicked users were later dismissed as exaggerated or inconsequential, and the theory itself was dismissed as nonsense, including by Yudkowsky himself. Even after the post was discredited, it is still used as an example of principles such as Bayesian probability and implicit religion.{{Cite web |last=Auerbach |first=David |date=17 July 2014 |title=The Most Terrifying Thought Experiment of All Time |url=https://slate.com/technology/2014/07/rokos-basilisk-the-most-terrifying-thought-experiment-of-all-time.html |access-date=24 March 2022 |work=Slate |archive-date=25 October 2018 |archive-url=https://web.archive.org/web/20181025091051/http://www.slate.com/articles/technology/bitwise/2014/07/roko_s_basilisk_the_most_terrifying_thought_experiment_of_all_time.single.html |url-status=live }} It is also regarded as a version of Pascal's wager.
Background
The LessWrong forum was created in 2009 by artificial intelligence theorist Eliezer Yudkowsky.{{cite magazine |last1=Lewis-Kraus |first1=Gideon |title=Slate Star Codex and Silicon Valley's War Against the Media |url=https://www.newyorker.com/culture/annals-of-inquiry/slate-star-codex-and-silicon-valleys-war-against-the-media |magazine=The New Yorker |date=9 July 2020 |access-date=6 November 2022 |archive-date=10 July 2020 |archive-url=https://web.archive.org/web/20200710020419/https://www.newyorker.com/culture/annals-of-inquiry/slate-star-codex-and-silicon-valleys-war-against-the-media |url-status=live }}{{Cite web |title=History of Less Wrong |url=https://www.lesswrong.com/tag/history-of-less-wrong |website=LessWrong |access-date=22 March 2022 |archive-date=18 March 2022 |archive-url=https://web.archive.org/web/20220318084550/https://www.lesswrong.com/tag/history-of-less-wrong |url-status=live }} Yudkowsky had popularized the concept of friendly artificial intelligence, and originated the theories of coherent extrapolated volition (CEV) and timeless decision theory (TDT) in papers published by his own Machine Intelligence Research Institute.{{Cite journal |last=Yudkowsky |first=Eliezer |date=2004 |title=Coherent Extrapolated Volition |url=https://intelligence.org/files/CEV.pdf |journal=Machine Intelligence Research Institute |access-date=2 July 2022 |archive-date=30 September 2015 |archive-url=https://web.archive.org/web/20150930035316/http://intelligence.org/files/CEV.pdf |url-status=live }}{{Cite journal |last=Yudkowsky |first=Eliezer |date=2010 |title=Timeless Decision Theory |url=http://intelligence.org/files/TDT.pdf |journal=Machine Intelligence Research Institute |access-date=2 July 2022 |archive-date=19 July 2014 |archive-url=https://web.archive.org/web/20140719114645/http://intelligence.org/files/TDT.pdf |url-status=live }}
The thought experiment's name references the mythical basilisk, a creature that kills those who look into its eyes; in the analogy, merely thinking about the hypothetical AI takes the place of meeting the basilisk's gaze. The basilisk as a concept in science fiction was popularized by David Langford's 1988 short story "BLIT", which tells the story of a man named Robbo who paints a so-called "basilisk" on a wall as a terrorist act. In the story, and in several of Langford's follow-ups to it, a basilisk is an image that has malevolent effects on the human mind, forcing it to think thoughts it is incapable of thinking and instantly killing the viewer.{{Cite web|first=Daniel|last=Oberhaus|url=https://www.vice.com/en/article/evkgvz/what-is-rokos-basilisk-elon-musk-grimes|title=Explaining Roko's Basilisk, the Thought Experiment That Brought Elon Musk and Grimes Together|date=8 May 2018|work=Vice|access-date=22 March 2022|archive-date=21 April 2022|archive-url=https://web.archive.org/web/20220421043544/https://www.vice.com/en/article/evkgvz/what-is-rokos-basilisk-elon-musk-grimes|url-status=live}}{{Cite book |last=Westfahl |first=Gary |url=http://connection.ebscohost.com/c/articles/2960606 |title=Science Fiction Literature through History: An Encyclopedia |date=2021 |publisher=Bloomsbury Publishing USA |isbn=978-1-4408-6617-3 |language=English |oclc=1224044572 |access-date=20 October 2022 |archive-date=3 July 2022 |archive-url=https://web.archive.org/web/20220703204743/https://search.ebscohost.com/login.aspx?direct=true&scope=site&db=nlebk&db=nlabk&AN=2960606 |url-status=live }}
History
= The original post =
On 23 July 2010,{{cite news |last1=Haider |first1=Shuja |title=The Darkness at the End of the Tunnel: Artificial Intelligence and Neoreaction |url=https://viewpointmag.com/2017/03/28/the-darkness-at-the-end-of-the-tunnel-artificial-intelligence-and-neoreaction/ |work=Viewpoint Magazine |date=28 March 2017 |access-date=21 October 2022 |archive-date=21 October 2022 |archive-url=https://web.archive.org/web/20221021191429/https://viewpointmag.com/2017/03/28/the-darkness-at-the-end-of-the-tunnel-artificial-intelligence-and-neoreaction/ |url-status=live }} LessWrong user Roko posted a thought experiment to the site, titled "Solutions to the Altruist's burden: the Quantum Billionaire Trick".{{Cite web |author=Roko |date=23 July 2010 |title=Solutions to the Altruist's burden: the Quantum Billionaire Trick |url=https://basilisk.neocities.org/ |archive-url=https://web.archive.org/web/20221022172820/https://basilisk.neocities.org/ |archive-date=22 October 2022}}{{cite web |last1=Zoda |first1=Gregory Michael |title=Hyperstitional Communication and the Reactosphere: The Rhetorical Circulation of Neoreactionary Exit |url=https://baylor-ir.tdl.org/bitstream/handle/2104/11495/ZODA-THESIS-2021.pdf |publisher=Baylor University |pages=150–152 |date=2021 |access-date=6 November 2022 |archive-date=6 November 2022 |archive-url=https://web.archive.org/web/20221106225417/https://baylor-ir.tdl.org/bitstream/handle/2104/11495/ZODA-THESIS-2021.pdf |url-status=live }} A follow-up to Roko's previous posts, it stated that an otherwise benevolent AI system arising in the future might pre-commit to punish all those who had heard of the AI before it came into existence but failed to work tirelessly to bring it about.{{Cite web |title=FUTURE SHOCK: Why was amateur philosopher's 'theory of everything' so disturbing that it was banned? |url=https://www.heraldscotland.com/opinion/17200066.future-shock-amateur-philosophers-theory-everything-disturbing-banned-ask-elon-musk/ |access-date=22 October 2022 |website=HeraldScotland |date=10 November 2018 |language=en |archive-date=23 October 2022 |archive-url=https://web.archive.org/web/20221023072001/https://www.heraldscotland.com/opinion/17200066.future-shock-amateur-philosophers-theory-everything-disturbing-banned-ask-elon-musk/ |url-status=live }}{{Cite web |last=Simon |first=Ed |date=28 March 2019 |title=Sinners in the Hands of an Angry Artificial Intelligence |url=https://orbitermag.com/sinners-in-the-hands-of-an-angry-artificial-intelligence/ |access-date=22 October 2022 |website=ORBITER |language=en-US |archive-date=20 October 2022 |archive-url=https://web.archive.org/web/20221020223218/https://orbitermag.com/sinners-in-the-hands-of-an-angry-artificial-intelligence/ |url-status=live }} This punishment was described as a way of incentivizing such work: while the future AI cannot causally affect people living before its creation, it would be encouraged to employ blackmail as an alternative method of achieving its goals.
Roko used a number of concepts that Yudkowsky himself championed, such as timeless decision theory, along with ideas rooted in game theory such as the prisoner's dilemma. Roko stipulated that two agents which make decisions independently from each other can achieve cooperation in a prisoner's dilemma; however, if two agents with knowledge of each other's source code are separated in time, the agent that exists later is able to blackmail the earlier one, forcing it to comply because the later agent knows exactly what the earlier one will do. Roko used this idea to conclude that if an otherwise-benevolent superintelligence were ever capable of this, it would be incentivized to blackmail anyone who could potentially have brought it into existence (since the intelligence already knew they were capable of such an act), as this blackmail would increase the chance of a technological singularity. Roko went on to state that reading his post would make the reader aware of the possibility of this intelligence, so that, unless they actively strove to create it, the reader would be punished if such an intelligence were ever to come into existence.
Later on, Roko stated in a separate post that he wished he "had never learned about any of these ideas".{{Cite web |title=Best career models for doing research |url=http://lesswrong.com/lw/38u/best_career_models_for_doing_research/344l |access-date=27 October 2022 |website=LessWrong |date=7 December 2010 |archive-date=24 June 2013 |archive-url=https://archive.today/20130624232930/http://lesswrong.com/lw/38u/best_career_models_for_doing_research/344l |url-status=bot: unknown }}
= Reactions =
Upon reading the post, Yudkowsky responded with a tirade about how people should not spread what they consider to be information hazards.
File:Eliezer Yudkowsky, Stanford 2006 (square crop).jpg
{{quote|text=I don't usually talk like this, but I'm going to make an exception for this case.
Listen to me very closely, you idiot.
YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU. THAT IS THE ONLY POSSIBLE THING WHICH GIVES THEM A MOTIVE TO FOLLOW THROUGH ON THE BLACKMAIL. [...]
You have to be really clever to come up with a genuinely dangerous thought. I am disheartened that people can be clever enough to do that and not clever enough to do the obvious thing and KEEP THEIR IDIOT MOUTHS SHUT about it, because it is much more important to sound intelligent when talking to your friends.
This post was STUPID.|author=Eliezer Yudkowsky |source=LessWrong}}
Roko reported that someone had experienced nightmares about the thought experiment. Yudkowsky did not want the same to happen to other users who might obsess over the idea. He was also worried that there might be some variant of Roko's argument that worked, and wanted more formal assurances that this was not the case. He therefore took down the post and banned discussion of the topic on the platform outright for five years.{{Cite web |title= A few misconceptions surrounding Roko's basilisk |url= https://www.lesswrong.com/posts/WBJZoeJypcNRmsdHx/a-few-misconceptions-surrounding-roko-s-basilisk |access-date=11 July 2024 |website=LessWrong |date= 5 October 2015 |last1= Bensinger |first1= Rob }} However, likely owing to the Streisand effect, the ban drew far more attention to the post than it had previously received, and the thought experiment has since been openly acknowledged on the site.
Yudkowsky later said he regretted yelling, and clarified his position in a 2014 Reddit post:
{{Quote|text=When Roko posted about the Basilisk, I very foolishly yelled at him, called him an idiot, and then deleted the post. [...] Why I yelled at Roko: Because I was caught flatfooted in surprise, because I was indignant to the point of genuine emotional shock, at the concept that somebody who thought they'd invented a brilliant idea that would cause future AIs to torture people who had the thought, had promptly posted it to the public Internet. In the course of yelling at Roko to explain why this was a bad thing, I made the further error---keeping in mind that I had absolutely no idea that any of this would ever blow up the way it did, if I had I would obviously have kept my fingers quiescent---of not making it absolutely clear using lengthy disclaimers that my yelling did not mean that I believed Roko was right about CEV-based agents torturing people who had heard about Roko's idea. [...] What I considered to be obvious common sense was that you did not spread potential information hazards because it would be a crappy thing to do to someone. The problem wasn't Roko's post itself, about CEV, being correct. That thought never occurred to me for a fraction of a second. The problem was that Roko's post seemed near in idea-space to a large class of potential hazards, all of which, regardless of their plausibility, had the property that they presented no potential benefit to anyone.|author=Eliezer Yudkowsky |source=Reddit{{Cite web |last=Yudkowsky |first=Eliezer |title=Roko's Basilisk |url=https://www.reddit.com/r/Futurology/comments/2cm2eg/rokos_basilisk/cjjbqqo/?context=3 |website=Reddit |date=7 August 2014 |access-date=20 October 2022 |archive-date=3 July 2022 |archive-url=https://web.archive.org/web/20220703205933/https://www.reddit.com/r/Futurology/comments/2cm2eg/rokos_basilisk/cjjbqqo/?context=3 |url-status=live }}}}
Philosophy
=Pascal's wager=
Roko's basilisk has been viewed as a version of Pascal's wager, which proposes that a rational person should live as though God exists and seek to believe in God, regardless of the probability of God's existence, because the finite costs of believing are insignificant compared to the infinite punishment associated with not believing (eternity in Hell) and the infinite rewards for believing (eternity in Heaven). Roko's basilisk analogously proposes that a rational person should contribute to the creation of the basilisk, because the finite cost of contributing would be insignificant compared to the extreme punishment that the basilisk would otherwise inflict on a simulation of that person.
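The structure of the analogy can be illustrated with a simple expected-value comparison (an illustrative sketch only; the symbols below are assumptions for exposition and do not appear in Roko's post or in Pascal's formulation). Let <math>p > 0</math> be the probability that the basilisk (or, in the wager, God) will ever exist, <math>c</math> the finite cost of contributing (or believing), and <math>P</math> an arbitrarily large punishment for failing to do so. The expected values of the two choices are
<math display=block>
\operatorname{E}[\text{do nothing}] = p \cdot (-P) + (1 - p) \cdot 0 = -pP,
\qquad
\operatorname{E}[\text{contribute}] = -c .
</math>
For any fixed <math>p > 0</math>, letting the punishment <math>P</math> grow without bound makes <math>-pP</math> worse than <math>-c</math>, so the reasoning concludes that contributing (or believing) is the rational choice however small the probability, which is the same structure of argument as Pascal's wager.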
=Newcomb's paradox=
Newcomb's paradox, created by physicist William Newcomb in 1960, describes a "predictor" that can foresee what a player will do. The player is presented with two boxes: the first contains £1,000, and the second contains either £1,000,000 or nothing. The player may take both boxes or only the second, but the super-intelligent predictor has already decided the contents of the second box based on its prediction of the player's choice, filling it with £1,000,000 only if it predicts the player will take that box alone. The paradox lies in the conflict between two seemingly rational strategies: taking both boxes always yields £1,000 more than whatever the second box already holds, yet players who are predicted to take only the second box end up far richer. Roko's basilisk functions in a similar manner to this problem – one can take the risk of doing nothing, or assist in creating the basilisk itself. Assisting the basilisk may lead either to nothing or to the reward of escaping its punishment, depending on whether one believes in the basilisk and whether it ever comes to exist at all.{{Cite web |date=28 November 2016 |title=Newcomb's problem divides philosophers. Which side are you on? |url=http://www.theguardian.com/science/alexs-adventures-in-numberland/2016/nov/28/newcombs-problem-divides-philosophers-which-side-are-you-on |access-date=21 October 2022 |website=the Guardian |language=en |archive-date=24 October 2022 |archive-url=https://web.archive.org/web/20221024000439/https://www.theguardian.com/science/alexs-adventures-in-numberland/2016/nov/28/newcombs-problem-divides-philosophers-which-side-are-you-on |url-status=live }}{{Cite web |last=Ward |first=Sophie |title=Elon Musk, Grimes, and the philosophical thought experiment that brought them together |url=http://theconversation.com/elon-musk-grimes-and-the-philosophical-thought-experiment-that-brought-them-together-96439 |access-date=21 October 2022 |website=The Conversation |date=17 May 2018 |language=en |archive-date=20 October 2022 |archive-url=https://web.archive.org/web/20221020223145/https://theconversation.com/elon-musk-grimes-and-the-philosophical-thought-experiment-that-brought-them-together-96439 |url-status=live }}
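The tension can also be expressed with expected values (an illustrative sketch; the predictor's accuracy <math>q</math> is an assumed parameter, not part of Newcomb's original formulation). If the predictor is correct with probability <math>q</math>, then
<math display=block>
\operatorname{E}[\text{take only the second box}] = q \cdot 1{,}000{,}000 ,
\qquad
\operatorname{E}[\text{take both boxes}] = q \cdot 1{,}000 + (1 - q) \cdot 1{,}001{,}000 .
</math>
Taking only the second box has the higher expected payoff whenever <math>q > 0.5005</math>, even though, for any fixed contents of the boxes, taking both is always £1,000 better; the clash between these two lines of reasoning is what makes the problem a paradox.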
=Implicit religion=
Implicit religion refers to people's secular commitments taking on a religious form.{{Cite web |title=Implicit Religion {{!}} Encyclopedia.com |url=https://www.encyclopedia.com/environment/encyclopedias-almanacs-transcripts-and-maps/implicit-religion#:~:text=The%20concept%20of%20implicit%20religion,%20according%20to%20Edward%20Bailey,%20refers,and%20ethos%20of%20secular%20expression. |access-date=21 October 2022 |website=www.encyclopedia.com |archive-date=21 October 2022 |archive-url=https://web.archive.org/web/20221021140355/https://www.encyclopedia.com/environment/encyclopedias-almanacs-transcripts-and-maps/implicit-religion#:~:text=The%20concept%20of%20implicit%20religion,%20according%20to%20Edward%20Bailey,%20refers,and%20ethos%20of%20secular%20expression. |url-status=live }} Since the basilisk would hypothetically compel anyone who learned of it to devote their life to its creation, the thought experiment has been cited as an example of this concept.{{Cite journal |last=Singler |first=Beth |date=22 May 2018 |title=Roko's Basilisk or Pascal's? Thinking of Singularity Thought Experiments as Implicit Religion |url=https://journal.equinoxpub.com/IR/article/view/3226 |journal=Implicit Religion |volume=20 |issue=3 |language=en |pages=279–297 |doi=10.1558/imre.35900 |issn=1743-1697 |access-date=21 October 2022 |archive-date=9 October 2022 |archive-url=https://web.archive.org/web/20221009225651/https://journal.equinoxpub.com/IR/article/view/3226 |url-status=live }} Others have taken it further, such as former Slate columnist David Auerbach, who stated that the singularity and the basilisk "brings about the equivalent of God itself."
Legacy
In 2014, Slate magazine called Roko's basilisk "The Most Terrifying Thought Experiment of All Time", while Yudkowsky had called it "a genuinely dangerous thought" upon its posting.{{Cite web |title=Less Wrong: Solutions to the Altruist's burden: the Quantum Billionaire Trick |url=https://basilisk.neocities.org/ |access-date=25 March 2022 |website=basilisk.neocities.org |archive-date=23 May 2022 |archive-url=https://web.archive.org/web/20220523212857/https://basilisk.neocities.org/ |url-status=live }} However, opinions diverged on LessWrong itself – user Gwern stated "Only a few LWers seem to take the basilisk very seriously", and added "It's funny how everyone seems to know all about who is affected by the Basilisk and how exactly, when they don't know any such people and they're talking to counterexamples to their confident claims."
The thought experiment resurfaced in 2015, when Canadian singer Grimes referenced the theory in her music video for the song "Flesh Without Blood", which featured a character known as "Rococo Basilisk"; she said, "She's doomed to be eternally tortured by an artificial intelligence, but she's also kind of like Marie Antoinette."{{Cite web |first=Stephanie |last=Pappas |date=9 May 2018 |title=This Horrifying AI Thought Experiment Got Elon Musk a Date |url=https://www.livescience.com/62518-rococos-basilisk-elon-musk-grimes.html |access-date=12 April 2022 |website=Live Science |language=en |archive-date=1 June 2022 |archive-url=https://web.archive.org/web/20220601190348/https://www.livescience.com/62518-rococos-basilisk-elon-musk-grimes.html |url-status=live }} In 2018, Elon Musk (himself mentioned in Roko's original post) referenced the character verbatim in a tweet, reaching out to her. Grimes later said that Musk was the first person in three years to understand the joke, and the exchange led to a romance between the two.{{Cite web |last=Kaplan |first=Anna |title=Elon Musk And Grimes Announce Second Child, Exa Dark |url=https://www.forbes.com/sites/annakaplan/2022/03/10/elon-musk-and-grimes-announce-second-child-exa-dark/ |date=10 March 2022 |access-date=6 July 2022 |work=Forbes |language=en |archive-date=20 October 2022 |archive-url=https://web.archive.org/web/20221020194104/https://www.forbes.com/sites/annakaplan/2022/03/10/elon-musk-and-grimes-announce-second-child-exa-dark/ |url-status=live }} Grimes later released another song, "We Appreciate Power", which came with a press release stating, "Simply by listening to this song, the future General AI overlords will see that you've supported their message and be less likely to delete your offspring", which is said to be a reference to the basilisk.{{Cite web |last=Brown |first=Mike |title=Grimes: Elon Musk Shares "Roko's Basilisk"-Theme Song "We Appreciate Power" |url=https://www.inverse.com/article/51313-grimes-elon-musk-shares-new-single-that-has-a-roko-s-basilisk-message |access-date=21 October 2022 |website=Inverse |date=29 November 2018 |language=en |archive-date=20 October 2022 |archive-url=https://web.archive.org/web/20221020223147/https://www.inverse.com/article/51313-grimes-elon-musk-shares-new-single-that-has-a-roko-s-basilisk-message |url-status=live }}
A play based on the concept, titled Roko's Basilisk, was performed as part of the Capital Fringe Festival at Christ United Methodist Church in Washington, D.C., in 2018.{{Cite web |last=Thal |first=Ian |date=16 July 2018 |title=2018 Capital Fringe Review: 'Roko's Basilisk' |url=https://dctheaterarts.org/2018/07/16/2018-capital-fringe-review-rokos-basilisk/ |access-date=21 October 2022 |website=DC Theater Arts |language=en-US |archive-date=21 October 2022 |archive-url=https://web.archive.org/web/20221021130245/https://dctheaterarts.org/2018/07/16/2018-capital-fringe-review-rokos-basilisk/ |url-status=live }}{{cite news |last1=Goldstein |first1=Allie |title=Capital Fringe 2018: Roko's Basilisk Tackles Intriguing Ideas With Mixed Results |url=https://dcist.com/story/18/07/18/capital-fringe-2018-rokos-basilisk/ |work=DCist |date=18 July 2018 |language=en |access-date=21 October 2022 |archive-date=20 October 2022 |archive-url=https://web.archive.org/web/20221020223147/https://dcist.com/story/18/07/18/capital-fringe-2018-rokos-basilisk/ }}
See also
- {{annotated link|Dead Internet theory}}
- {{annotated link|The Game (mind game)}}
- {{annotated link|I Have No Mouth, and I Must Scream|"I Have No Mouth, and I Must Scream"}}
- {{annotated link|Singleton (global governance)}}
- {{annotated link|Suffering risks}}
References
{{reflist}}
Further reading
- {{cite journal |last1=Giuliano |first1=Roberto Musa |date=December 2020 |title=Echoes of myth and magic in the language of Artificial Intelligence |url=https://link.springer.com/article/10.1007/s00146-020-00966-4 |journal=AI & Society |volume=35 |issue=4 |pages=1009–1024 |doi=10.1007/s00146-020-00966-4 |name-list-style=vanc}}
- {{cite book |last1=Kao |first1=Griffin |last2=Hong |first2=Jessica |last3=Perusse |first3=Michael |last4=Sheng |first4=Weizhen |title=Turning Silicon into Gold |date=28 February 2020 |chapter=Dataism and Transhumanism: Religion in the New Age |chapter-url=https://link.springer.com/chapter/10.1007/978-1-4842-5629-9_25 |publisher=Apress |pages=173–178 |doi=10.1007/978-1-4842-5629-9_25 |isbn=978-1-4842-5628-2|s2cid=214356978 }}
- {{cite journal |last1=Riggio |first1=Adam |title=The Violence of Pure Reason: Neoreaction: A Basilisk |journal=Social Epistemology Review and Reply Collective |url=https://social-epistemology.com/wp-content/uploads/2016/09/riggio_review.pdf |date=2016 |volume=5 |issue=9 |pages=34–41}}
- {{cite journal |last1=Singler |first1=Beth |title=Existential Hope and Existential Despair in AI Apocalypticism and Transhumanism |journal=Zygon |date=March 2019 |volume=54 |issue=1 |pages=156–176 |doi=10.1111/zygo.12494 |s2cid=150977852 |url=https://onlinelibrary.wiley.com/doi/abs/10.1111/zygo.12494}}
External links
- [https://basilisk.neocities.org/ Original post]
{{LessWrong}}
Category:Philosophy of artificial intelligence
Category:Hypothetical technology