AIXI
{{Short description|Mathematical formalism for artificial general intelligence}}
{{independent sources|date=September 2023}}
AIXI {{IPAc-en|'|ai|k|s|i}} is a theoretical mathematical formalism for artificial general intelligence.
It combines Solomonoff induction with sequential decision theory.
AIXI was first proposed by Marcus Hutter in 2000{{cite book |author=Marcus Hutter |title=A Theory of Universal Artificial Intelligence based on Algorithmic Complexity |url=https://archive.org/details/arxiv-cs0004001 |arxiv=cs.AI/0004001 |year=2000 |bibcode=2000cs........4001H }} and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence.{{cite book |author=Marcus Hutter |title=Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability |series=Texts in Theoretical Computer Science an EATCS Series |url=https://books.google.com/books?id=NP53iZGt4KUC |year=2005 |publisher=Springer |isbn=978-3-540-22139-5 |doi=10.1007/b138233 |s2cid=33352850 |author-mask=1}}
AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
Etymology
According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations.{{Cite web |last=Hutter |first=Marcus |title=Universal Artificial Intelligence |url=http://www.hutter1.net/ai/uaibook.htm |access-date=2024-09-21 |website=www.hutter1.net |language=en}}
Definition
AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment . The interaction proceeds in time steps, from to , where is the lifespan of the AIXI agent. At time step t, the agent chooses an action (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" , which consists of an "observation" (e.g., a camera image) and a reward , distributed according to the conditional probability , where is the "history" of actions, observations and rewards. The environment is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that is computable, that is, the observations and rewards received by the agent from the environment can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.{{cite arXiv |last1=Veness |first1=Joel |author2=Kee Siong Ng |last3=Hutter |first3=Marcus |last4=Uther |first4=William |last5=Silver |first5=David |eprint=0909.0801 |title=A Monte Carlo AIXI Approximation |year=2009 |class=cs.AI}}
The only goal of the AIXI agent is to maximize , that is, the sum of rewards from time step 1 to m.
The AIXI agent is associated with a stochastic policy , which is the function it uses to choose actions at every time step, where is the space of all possible actions that AIXI can take and is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) can also be thought of as a stochastic policy (which is a function): , where the is the Kleene star operation.
In general, at time step (which ranges from 1 to m), AIXI, having previously executed actions (which is often abbreviated in the literature as
:
a_t := \arg \max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}
or, using parentheses, to disambiguate the precedences
:
a_t := \arg \max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} \right) \right) \right)
Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to
Let us break this definition down in order to attempt to fully understand it.
Let us now put all these components together in order to understand this equation or definition.
At time step t, AIXI chooses the action
= Parameters =
The parameters to AIXI are the universal Turing machine U and the agent's lifetime m, which need to be chosen. The latter parameter can be removed by the use of discounting.
Optimality
AIXI's performance is measured by the expected total number of rewards it receives.
AIXI has been proven to be optimal in the following ways.
- Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment.{{citation needed|date=June 2014}}
- Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
- Self-optimizing: a policy p is called self-optimizing for an environment
\mu if the performance of p approaches the theoretical maximum for\mu when the length of the agent's lifetime (not time) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing.
It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.{{cite conference|conference=Proceedings of the 28th Conference on Learning Theory|last1=Leike|first1=Jan|last2=Hutter|first2=Marcus|title=Bad Universal Priors and Notions of Optimality|date=2015|url=http://proceedings.mlr.press/v40/Leike15.pdf}}
However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.{{cite web|last1=Soares|first1=Nate|title=Formalizing Two Problems of Realistic World-Models|url=https://intelligence.org/files/RealisticWorldModels.pdf|website=Intelligence.org|access-date=2015-07-19|ref=MIRI}}
Computational aspects
Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXItl, which performs at least as well as the provably best time t and space l limited agent. Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.{{cite arXiv |last1=Veness |first1=Joel |author2=Kee Siong Ng |last3=Hutter |first3=Marcus |last4=Uther |first4=William |last5=Silver |first5=David |eprint=0909.0801 |title=A Monte Carlo AIXI Approximation |year=2009 |class=cs.AI}}[https://www.youtube.com/watch?v=yfsMHtmGDKE Playing Pacman using AIXI Approximation – YouTube]
See also
References
{{reflist}}
- "Universal Algorithmic Intelligence: A mathematical top->down approach", Marcus Hutter, {{arXiv|cs/0701125}}; also in Artificial General Intelligence, eds. B. Goertzel and C. Pennachin, Springer, 2007, {{ISBN|9783540237334}}, pp. 227–290, {{doi|10.1007/978-3-540-68677-4_8}}.