junction tree algorithm
File:Junction-tree-example.gif
The junction tree algorithm (also known as 'Clique Tree') is a method used in machine learning to perform exact marginalization in general graphs. In essence, it entails performing belief propagation on a modified graph called a junction tree. The graph is called a tree because it contains no cycles; its nodes are clusters of variables rather than single variables.{{Cite web|url=https://ai.stanford.edu/~paskin/gm-short-course/lec3.pdf|title=A Short Course on Graphical Models|last=Paskin|first=Mark|website=Stanford}} The basic premise is to eliminate cycles by clustering them into single nodes. Once the tree has been built and calibrated, many different queries can be answered from the same structure. There are different versions of the algorithm, depending on what needs to be calculated. Inference algorithms incorporate newly observed data and update the calculated probabilities based on the new information provided.{{Cite web|url=http://www.dfki.de/~neumann/publications/diss/node58.html|title=The Inference Algorithm|website=www.dfki.de|access-date=2018-10-25}}
=Hugin algorithm=
- If the graph is directed, moralize it to make it undirected.
- Introduce the evidence.
- Triangulate the graph to make it chordal.
- Construct a junction tree from the triangulated graph (we will call the vertices of the junction tree "supernodes").
- Propagate the probabilities along the junction tree (via belief propagation).
Note that this last step is inefficient for graphs of large treewidth. Computing the messages to pass between supernodes involves doing exact marginalization over the variables in both supernodes, so performing this algorithm for a graph with treewidth k will involve at least one computation which takes time exponential in k. Hugin is a message passing algorithm.{{Cite web|url=http://www.gatsby.ucl.ac.uk/teaching/courses/ml1-2007/lect5-handout.pdf|title=Recap on Graphical Models}} It takes fewer computations to find a solution than Shafer-Shenoy, at the price of storing the separator potentials.
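For illustration, here is a minimal Python sketch of a single Hugin message pass over discrete potential tables. The representation (dictionaries mapping full assignment tuples to values) and the helper names are illustrative, not taken from the cited sources.

<syntaxhighlight lang="python">
def marginalize(potential, scope, keep):
    """Sum a potential table (dict: assignment tuple -> value) down to
    the variables in `keep`; `scope` names the variables of each tuple."""
    idx = [scope.index(v) for v in keep]
    out = {}
    for assignment, value in potential.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return out

def hugin_pass(psi_i, scope_i, psi_j, scope_j, phi_S, separator):
    """One Hugin message from supernode i to supernode j.

    Marginalizing psi_i onto the separator is exponential in the size
    of supernode i, which is where the treewidth cost shows up.  The
    division by the old separator potential (initially all ones) is
    the characteristic Hugin update."""
    phi_S_new = marginalize(psi_i, scope_i, separator)
    idx = [scope_j.index(v) for v in separator]
    for assignment in psi_j:
        key = tuple(assignment[i] for i in idx)
        old = phi_S[key]
        psi_j[assignment] *= phi_S_new[key] / old if old else 0.0
    return phi_S_new, psi_j
</syntaxhighlight>

For example, for cliques {A,B} and {B,C} over binary variables, one would call hugin_pass with scope_i = ["A", "B"], separator = ["B"], and phi_S = {(0,): 1.0, (1,): 1.0} on the first pass.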
=Shafer-Shenoy algorithm=
- Messages are computed recursively
- Multiple recursions of the Shafer-Shenoy algorithm result in the Hugin algorithm{{Cite web|url=https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-438-algorithms-for-inference-fall-2014/lecture-notes/MIT6_438F14_Lec14.pdf|title=Algorithms|date=2014|website=Massachusetts Institute of Technology}}
- Messages are found by the message passing equation (given below)
- Separator potentials are not stored{{Cite web|url=https://cs.nyu.edu/~roweis/csc412-2004/notes/lec20x.pdf|title=Hugin Inference Algorithm|last=Roweis|first=Sam|date=2004|website=NYU}}
The Shafer-Shenoy algorithm is the sum-product algorithm run on a junction tree.{{Cite web|url=https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-438-algorithms-for-inference-fall-2014/recitations/MIT6_438F14_rec8.pdf|title=Algorithms for Inference|date=2014|website=Massachusetts Institute of Technology}} It is used because it answers queries with less storage than the Hugin algorithm, recomputing messages instead of keeping separator potentials. The algorithm makes calculations of conditionals for belief functions possible.{{cite arXiv|last=Kłopotek|first=Mieczysław A.|date=2018-06-06|title=Dempsterian-Shaferian Belief Network From Data|eprint=1806.02373|class=cs.AI}} Local computations only ever handle the joint distributions of the small clusters of variables, never the full joint distribution.
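In a common formulation (notation introduced here for illustration), the message from clique <math>C_i</math> to an adjacent clique <math>C_j</math> over their separator <math>S_{ij} = C_i \cap C_j</math> is

:<math>\mu_{i \to j}(S_{ij}) = \sum_{C_i \setminus S_{ij}} \psi_{C_i} \prod_{k \in N(i) \setminus \{j\}} \mu_{k \to i},</math>

where <math>\psi_{C_i}</math> is the potential attached to clique <math>C_i</math> and <math>N(i)</math> are the neighbours of <math>i</math> in the junction tree. Because every outgoing message is recomputed on demand from the clique potential and the other incoming messages, no separator potentials have to be stored.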
=Underlying theory=
File:Hmm temporal bayesian net.svg
The first step concerns only Bayesian networks, and is a procedure to turn a directed graph into an undirected one, known as moralization: the parents of each node are connected ("married") and the directions of all edges are then dropped. We do this because it allows for the universal applicability of the algorithm, regardless of direction.
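A minimal sketch of this moralization step in Python with the networkx library (the function name moralize is ours; recent versions of networkx also ship a built-in helper for this):

<syntaxhighlight lang="python">
from itertools import combinations
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Moralize a DAG: marry all parents of each node, drop directions."""
    moral = nx.Graph()
    moral.add_nodes_from(dag.nodes())
    moral.add_edges_from(dag.edges())  # adding directed edges to an
                                       # undirected graph drops direction
    for node in dag.nodes():
        parents = list(dag.predecessors(node))
        moral.add_edges_from(combinations(parents, 2))  # marry co-parents
    return moral
</syntaxhighlight>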
The second step is setting variables to their observed value. This is usually needed when we want to calculate conditional probabilities, so we fix the value of the random variables we condition on. Those variables are also said to be clamped to their particular value.
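Using the same potential-table representation as in the Hugin sketch above, clamping a variable amounts to zeroing every row of the table that disagrees with the observation (again, the helper name is illustrative):

<syntaxhighlight lang="python">
def clamp(potential, scope, variable, observed):
    """Clamp `variable` to its observed value by zeroing all
    inconsistent rows of the potential table."""
    i = scope.index(variable)
    return {assignment: (value if assignment[i] == observed else 0.0)
            for assignment, value in potential.items()}
</syntaxhighlight>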
The third step is to make the graph chordal if it is not already chordal. This is the first essential step of the algorithm. It makes use of the following theorem:{{cite web |url=https://people.eecs.berkeley.edu/~wainwrig/Talks/Wainwright_PartI.pdf |title= Graphical models, message-passing algorithms, and variational methods: Part I|last= Wainwright|first=Martin |date= 31 March 2008|website=Berkeley EECS |access-date=16 November 2016}}
Theorem: For an undirected graph, G, the following properties are equivalent:
- Graph G is triangulated.
- The clique graph of G has a junction tree.
- There is an elimination ordering for G that does not lead to any added edges.
Thus, by triangulating a graph, we make sure that the corresponding junction tree exists. A usual way to do this is to decide on an elimination ordering for its nodes and then run the variable elimination algorithm, which adds edges to the initial graph in such a way that the output is a chordal graph. (Note that variable elimination by itself must be rerun for each different query, which is one motivation for constructing the junction tree once and reusing it.)
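A minimal sketch of this triangulation step, again using networkx (the elimination ordering is supplied by the caller; heuristics such as min-fill are a common way to choose it):

<syntaxhighlight lang="python">
from itertools import combinations
import networkx as nx

def triangulate(graph: nx.Graph, order) -> nx.Graph:
    """Chordalize `graph` by simulating variable elimination along
    `order`: eliminating a node fully connects its remaining
    neighbours, and those fill-in edges are added to the result."""
    work = graph.copy()
    chordal = graph.copy()
    for node in order:
        fill = list(combinations(work.neighbors(node), 2))
        work.add_edges_from(fill)
        chordal.add_edges_from(fill)
        work.remove_node(node)
    return chordal
</syntaxhighlight>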
All chordal graphs have a junction tree. The next step is to construct the junction tree. To do so, we use the graph from the previous step, and form its corresponding clique graph.{{cite web |url=http://mathworld.wolfram.com/CliqueGraph.html |title=Clique Graph|access-date=16 November 2016}} Now the next theorem gives us a way to find a junction tree:
Theorem: Given a triangulated graph, weight each edge of the clique graph by the cardinality |A∩B| of the intersection of the adjacent cliques A and B. Then any maximum-weight spanning tree of the clique graph is a junction tree.
So, to construct a junction tree we just have to extract a maximum-weight spanning tree out of the clique graph. This can be done efficiently by, for example, modifying Kruskal's algorithm.
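A sketch of this construction, where networkx's maximum_spanning_tree plays the role of the modified Kruskal's algorithm (the function name junction_tree is ours):

<syntaxhighlight lang="python">
from itertools import combinations
import networkx as nx

def junction_tree(cliques):
    """Form the clique graph, weight each edge by |A ∩ B|, and
    extract a maximum-weight spanning tree."""
    csets = [frozenset(c) for c in cliques]
    clique_graph = nx.Graph()
    clique_graph.add_nodes_from(csets)
    for a, b in combinations(csets, 2):
        weight = len(a & b)
        if weight > 0:
            clique_graph.add_edge(a, b, weight=weight)
    return nx.maximum_spanning_tree(clique_graph)
</syntaxhighlight>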
The last step is to apply belief propagation to the obtained junction tree.{{cite web |url=https://www.cs.helsinki.fi/u/bmmalone/probabilistic-models-spring-2014/JunctionTreeBarber.pdf |title= Probabilistic Modelling and Reasoning, The Junction Tree Algorithm |last= Barber|first=David |date= 28 January 2014|website= University of Helsinki |access-date=16 November 2016}}
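Putting the sketches above together on a toy example (every helper used here is one of the illustrative functions defined earlier in this article, not a library API):

<syntaxhighlight lang="python">
import networkx as nx

# Toy Bayesian network with the v-structure A -> C <- B;
# moralization marries the co-parents A and B.
dag = nx.DiGraph([("A", "C"), ("B", "C")])

moral = moralize(dag)                                # step 1: moralize
chordal = triangulate(moral, order=["A", "B", "C"])  # step 3: triangulate
cliques = [sorted(c) for c in nx.find_cliques(chordal)]
tree = junction_tree(cliques)                        # step 4: junction tree
# Step 2 (clamp) and step 5 (hugin_pass, or Shafer-Shenoy messages)
# operate on the potential tables attached to the cliques of `tree`.
</syntaxhighlight>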
Usage: A junction tree graph is used to visualize the probabilities of the problem, and the tree can be turned into a binary tree during its construction.{{cite conference
| last1 = Ramirez | first1 = Julio C.
| last2 = Munoz | first2 = Guillermina
| last3 = Gutierrez | first3 = Ludivina
| contribution = Fault Diagnosis in an Industrial Process Using Bayesian Networks: Application of the Junction Tree Algorithm
| date = September 2009
| doi = 10.1109/cerma.2009.28
| publisher = IEEE
| title = 2009 Electronics, Robotics and Automotive Mechanics Conference (CERMA)| pages = 301–306
| isbn = 978-0-7695-3799-3
}} A specific use can be found in autoencoders, which combine the junction tree with a graph message passing network to generate molecular graphs automatically.{{cite arXiv|last=Jin|first=Wengong|date=Feb 2018|title=Junction Tree Variational Autoencoder for Molecular Graph Generation|eprint=1802.04364|class=cs.LG}}
=Inference algorithms=
Loopy belief propagation: Runs belief propagation on the original graph even though it contains cycles. It is used when an approximate solution is acceptable instead of the exact one.{{Cite book|title=CERMA 2009 : proceedings : 2009 Electronics, Robotics and Automotive Mechanics Conference : 22-25 September 2009 : Cuernavaca, Morelos, Mexico|date=2009|publisher=IEEE Computer Society|others=Institute of Electrical and Electronics Engineers.|isbn=9780769537993|location=Los Alamitos, Calif.|oclc=613519385}} It is a form of approximate inference.
Cutset conditioning: Used with smaller sets of variables. Conditioning on a cutset breaks the graph into simpler pieces that are easier to handle; when only some cutset assignments are explored, the result is approximate rather than exact.
=References=
{{reflist}}
=Further reading=
- {{cite journal
| last = Lauritzen
| first = Steffen L.
|author2=Spiegelhalter, David J. |author-link2=David Spiegelhalter
| title = Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems
| journal = Journal of the Royal Statistical Society. Series B (Methodological)
| volume = 50 |issue=2
| pages = 157–224
| date = 1988
| mr = 0964177
| jstor = 2345762 | doi = 10.1111/j.2517-6161.1988.tb01721.x
}}
- {{cite journal
| last = Dawid
| first = A. P.
| author-link = Philip Dawid
| title = Applications of a general propagation algorithm for probabilistic expert systems
| journal = Statistics and Computing
| volume = 2
| issue = 1
| pages = 25–26
| date = 1992
| doi = 10.1007/BF01890546| s2cid = 61247712
}}
- {{cite journal
| last = Huang
| first = Cecil
|author2=Darwiche, Adnan
| title = Inference in Belief Networks: A Procedural Guide
| journal = International Journal of Approximate Reasoning
| volume = 15
| issue = 3
| pages = 225–263
| date = 1996
| url = http://citeseer.ist.psu.edu/huang94inference.html
| doi = 10.1016/S0888-613X(96)00069-2| citeseerx = 10.1.1.47.3279
}}
- {{cite conference |last1=Lepar |first1=V. |last2=Shenoy |first2=P. |date=1998 |title=A Comparison of Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer Architectures for Computing Marginals of Probability Distributions |url=https://arxiv.org/ftp/arxiv/papers/1301/1301.7394.pdf}}
{{DEFAULTSORT:Junction Tree Algorithm}}