Smooth maximum
{{Short description|Mathematical approximation}}
In mathematics, a smooth maximum of an indexed family {{tmath|x_1, \ldots, x_n}} of numbers is a smooth approximation to the maximum function {{tmath|\max(x_1, \ldots, x_n)}}, meaning a parametric family of functions {{tmath|m_\alpha(x_1, \ldots, x_n)}} such that for every {{mvar|α}}, the function {{tmath|m_\alpha}} is smooth, and the family converges to the maximum function, {{tmath|m_\alpha \to \max}}, as {{tmath|\alpha\to\infty}}. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: the maximum as the parameter goes to positive infinity and the minimum as the parameter goes to negative infinity; in symbols, {{tmath|m_\alpha \to \max}} as {{tmath|\alpha \to \infty}} and {{tmath|m_\alpha \to \min}} as {{tmath|\alpha \to -\infty}}. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.
Examples
= Boltzmann operator =
For large positive values of the parameter {{mvar|α}}, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.
:<math>
\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}
</math>
{{tmath|\mathcal{S}_\alpha}} has the following properties:
- {{tmath|\mathcal{S}_\alpha \to \max}} as {{tmath|\alpha \to \infty}}
- {{tmath|\mathcal{S}_0}} is the arithmetic mean of its inputs
- {{tmath|\mathcal{S}_\alpha \to \min}} as {{tmath|\alpha \to -\infty}}
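For example, for two inputs <math>(x_1, x_2) = (0, 1)</math>,
:<math>\mathcal{S}_{5}(0, 1) = \frac{e^{5}}{1 + e^{5}} \approx 0.993 \qquad \text{and} \qquad \mathcal{S}_{-5}(0, 1) = \frac{e^{-5}}{1 + e^{-5}} \approx 0.007,</math>
close to the maximum {{tmath|1}} and the minimum {{tmath|0}}, respectively.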
The gradient of {{tmath|\mathcal{S}_\alpha}} is closely related to softmax and is given by
:<math>
\nabla_{x_i}\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} [1 + \alpha(x_i - \mathcal{S}_\alpha (x_1,\ldots,x_n))].
</math>
This makes the softmax function useful for optimization techniques that use gradient descent.
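As a check, at <math>\alpha = 0</math> each softmax weight <math>e^{\alpha x_i} / \sum_j e^{\alpha x_j}</math> equals <math>1/n</math> and the bracketed factor equals 1, so every partial derivative is <math>1/n</math>, consistent with <math>\mathcal{S}_0</math> being the arithmetic mean of its inputs.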
This operator is sometimes called the Boltzmann operator,{{cite journal |last1=Asadi |first1=Kavosh |last2=Littman |first2=Michael L. |author-link2=Michael L. Littman |date=2017 |title=An Alternative Softmax Operator for Reinforcement Learning |url=https://proceedings.mlr.press/v70/asadi17a.html |journal=PMLR |volume=70 |pages=243–252 |arxiv=1612.05628 |access-date=January 6, 2023}} after the Boltzmann distribution.
= LogSumExp =
{{main|LogSumExp}}
Another smooth maximum is LogSumExp:
:<math>\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log\left( \exp(\alpha x_1) + \cdots + \exp(\alpha x_n) \right)</math>
This can also be normalized if the {{tmath|x_i}} are all non-negative, yielding a function with domain <math>[0, \infty)^n</math> and range <math>[0, \infty)</math>:
:<math>g(x_1, \ldots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) - (n - 1) \right)</math>
The <math>(n - 1)</math> term corrects for the fact that <math>\exp(0) = 1</math> by canceling out all but one zero exponential, and <math>\log 1 = 0</math> if all {{tmath|x_i}} are zero.
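For two non-negative inputs, for instance, <math>g(0, x) = \log\left(1 + e^{x} - 1\right) = x</math>, so the normalized form returns the other input exactly when one input is zero; in particular <math>g(0, 0) = 0</math>, whereas the unnormalized value is <math>\mathrm{LSE}_1(0, 0) = \log 2 \approx 0.693</math>.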
= Mellowmax =
The mellowmax operator is defined as follows:
:<math>\mathrm{mm}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log\left( \frac{1}{n} \sum_{i=1}^n e^{\alpha x_i} \right)</math>
It is a non-expansive operator. As {{tmath|\alpha \to \infty}}, it acts like a maximum. As {{tmath|\alpha \to 0}}, it acts like an arithmetic mean. As {{tmath|\alpha \to -\infty}}, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.{{cite journal |last1=Safak |first1=Aysel |date=February 1993 |title=Statistical analysis of the power sum of multiple correlated log-normal components |url=https://ieeexplore.ieee.org/document/192387 |journal=IEEE Transactions on Vehicular Technology |volume=42 |issue=1 |pages=58–61 |doi=10.1109/25.192387 |access-date=January 6, 2023}}
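With the definition above, mellowmax can be written in terms of LogSumExp as <math>\mathrm{mm}_\alpha(x_1, \ldots, x_n) = \mathrm{LSE}_\alpha(x_1, \ldots, x_n) - \tfrac{\log n}{\alpha}</math>, so the two operators differ only by a shift that vanishes as <math>\alpha \to \pm\infty</math>.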
= p-Norm =
{{main|P-norm}}
Another smooth maximum is the p-norm:
:<math>
\| (x_1, \ldots, x_n) \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^\frac{1}{p}
</math>
which converges to <math>\| (x_1, \ldots, x_n) \|_\infty = \max_i |x_i|</math> as <math>p \to \infty</math>.
An advantage of the p-norm is that it is a norm. As such it is scale invariant (homogeneous): <math>\| (\lambda x_1, \ldots, \lambda x_n) \|_p = |\lambda| \, \| (x_1, \ldots, x_n) \|_p</math>, and it satisfies the triangle inequality.
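For example, for <math>(x_1, x_2) = (3, 4)</math> the values <math>\|(3, 4)\|_2 = 5</math>, <math>\|(3, 4)\|_4 = (3^4 + 4^4)^{1/4} \approx 4.28</math> and <math>\|(3, 4)\|_{10} \approx 4.02</math> already approach <math>\max(3, 4) = 4</math> from above.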
= Smooth maximum unit =
The following binary operator is called the Smooth Maximum Unit (SMU):{{Cite arXiv|eprint = 2111.04682|last1 = Biswas|first1 = Koushik|last2 = Kumar|first2 = Sandeep|last3 = Banerjee|first3 = Shilpak|author4 = Ashish Kumar Pandey|title = SMU: Smooth activation function for deep networks using smoothing maximum technique|year = 2021| class=cs.LG }}
:<math>
\begin{align}
\textstyle\max_\varepsilon(a, b)
&= \frac{a + b + |a - b|_\varepsilon}{2} \\
&= \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}
\end{align}
</math>
where <math>\varepsilon \ge 0</math> is a parameter. As <math>\varepsilon \to 0</math>, <math>|a - b|_\varepsilon \to |a - b|</math>, and thus <math>\textstyle\max_\varepsilon(a, b) \to \max(a, b)</math>.
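For example, with <math>a = 0</math>, <math>b = 1</math> and <math>\varepsilon = 0.01</math>, <math>\textstyle\max_{0.01}(0, 1) = \tfrac{1 + \sqrt{1.01}}{2} \approx 1.0025</math>, while setting <math>\varepsilon = 0</math> recovers <math>\max(0, 1) = 1</math> exactly, at the cost of smoothness at <math>a = b</math>.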
See also
References
{{Reflist}}
[[Category:Mathematical notation]]
[[Category:Basic concepts in set theory]]
* [https://www.johndcook.com/soft_maximum.pdf Soft maximum] (johndcook.com)
* M. Lange, D. Zühlke, O. Holz, and T. Villmann, "Applications of ''l<sub>p</sub>''-norms and their smooth approximations for gradient based learning vector quantization", in ''Proc. ESANN'', Apr. 2014, pp. 271–276. ([https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-153.pdf PDF])