random coordinate descent
The randomized (block) coordinate descent method is an optimization algorithm popularized by Nesterov (2010) and by Richtárik and Takáč (2011). Nesterov (2010) gave the first analysis of the method for the problem of minimizing a smooth convex function.{{Citation
| last=Nesterov
| first=Yurii
| year=2010
| title=Efficiency of coordinate descent methods on huge-scale optimization problems
| doi=10.1137/100802001
| volume=22
| issue=2
| journal=SIAM Journal on Optimization
| pages=341–362
| citeseerx=10.1.1.332.3336
}} In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) provide iteration complexity bounds that do not require this assumption, meaning the method is applied directly to the objective function. Additionally, they generalize the framework to the problem of minimizing a composite function, specifically the sum of a smooth convex function and a (possibly nonsmooth) convex block-separable function.
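:<math>F(x) = f(x) + \Psi(x),</math>
where <math>\Psi(x) = \sum_{i=1}^n \Psi_i(x^{(i)})</math>, <math>x \in \mathbb{R}^N</math> is decomposed into <math>n</math> blocks of variables/coordinates, <math>x = (x^{(1)}, \dots, x^{(n)})</math>, and <math>\Psi_1, \dots, \Psi_n</math> are (simple) convex functions.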
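Example (block decomposition): If <math>x = (x_1, x_2, \dots, x_5) \in \mathbb{R}^5</math> and <math>n = 3</math>, one may choose <math>x^{(1)} = (x_1, x_3)</math>, <math>x^{(2)} = (x_2, x_5)</math> and <math>x^{(3)} = x_4</math>.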
Example (block-separable regularizers):
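- <math>N = N_1 + N_2 + \dots + N_n</math>; <math>\Psi(x) = \sum_{i=1}^n \|x^{(i)}\|_2</math>, where <math>x^{(i)} \in \mathbb{R}^{N_i}</math> and <math>\|\cdot\|_2</math> is the standard Euclidean norm.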
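Block separability can be illustrated with a minimal Python sketch that evaluates a sum of Euclidean norms over a partition of the coordinates. The function name `block_l2_regularizer`, the sample vector and the particular partition (0-based indices) are assumptions made for illustration and are not taken from the cited papers.

```python
import math

def block_l2_regularizer(x, blocks):
    """Sum of the Euclidean norms of the sub-vectors of x selected by each block."""
    return sum(math.sqrt(sum(x[j] ** 2 for j in block)) for block in blocks)

# Hypothetical partition of 5 coordinates into 3 blocks (0-based indices).
blocks = [(0, 2), (1, 4), (3,)]
x = [3.0, 0.0, 4.0, -2.0, 0.0]

# The three block norms are 5, 0 and 2; the regularizer is their sum.
psi = block_l2_regularizer(x, blocks)
```

Because each term depends on a single block, changing one block of variables changes only one term of the sum, which is what keeps a block update cheap.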
Algorithm
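Consider the optimization problem
:<math>\min_{x \in \mathbb{R}^n} f(x),</math>
where <math>f</math> is a convex and smooth function.
Smoothness: By smoothness we mean that the gradient of <math>f</math> is coordinate-wise Lipschitz continuous with constants <math>L_1, L_2, \dots, L_n</math>. That is, we assume that
:<math>|\nabla_i f(x + h e_i) - \nabla_i f(x)| \leq L_i |h|</math>
for all <math>x \in \mathbb{R}^n</math> and <math>h \in \mathbb{R}</math>, where <math>\nabla_i</math> denotes the partial derivative with respect to variable <math>x^{(i)}</math> and <math>e_i</math> is the <math>i</math>-th coordinate direction.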
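As a sketch of what the coordinate-wise Lipschitz condition means in practice, consider a quadratic of the form 0.5·xᵀAx − bᵀx with a symmetric positive semidefinite matrix A. Its i-th partial derivative changes by exactly A[i][i]·h when x moves by h along the i-th coordinate, so the diagonal entries can serve as the constants. The snippet below checks this numerically; the matrix entries are those of this article's example, while the helper names and the random test ranges are assumptions made for illustration.

```python
import random

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]

def partial_derivative(x, i):
    """i-th partial derivative of f(x) = 0.5 * x^T A x - b^T x."""
    return sum(A[i][j] * x[j] for j in range(len(x))) - b[i]

# For this quadratic the i-th partial derivative changes by A[i][i] * h
# along e_i, so L_i = A[i][i] satisfies the inequality (with equality).
L = [A[i][i] for i in range(len(A))]

def check_coordinate_lipschitz(trials=1000, seed=0):
    """Test the inequality at randomly drawn points, steps and coordinates."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(2)]
        h = rng.uniform(-10, 10)
        i = rng.randrange(2)
        shifted = list(x)
        shifted[i] += h
        lhs = abs(partial_derivative(shifted, i) - partial_derivative(x, i))
        if lhs > L[i] * abs(h) + 1e-9:  # small tolerance for rounding
            return False
    return True
```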
Nesterov, and Richtárik and Takáč, showed that the following algorithm converges to the optimal point:
{{algorithm-begin|name=Random Coordinate Descent Method}}
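Input: <math>x_0 \in \mathbb{R}^n</math> //starting point
Output: <math>x</math>
set x := x_0
for k := 1, ... do
choose coordinate <math>i \in \{1, 2, \dots, n\}</math>, uniformly at random
update <math>x^{(i)} = x^{(i)} - \frac{1}{L_i} \nabla_i f(x)</math>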
end for
{{algorithm-end}}
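The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `random_coordinate_descent`, the fixed iteration budget `num_iters` (the pseudocode loops indefinitely), the callable `grad_i` and the separable test problem are all choices made here. The update is the standard step of length 1/L_i along the selected coordinate.

```python
import random

def random_coordinate_descent(grad_i, lipschitz, x0, num_iters, seed=0):
    """Repeat x_i <- x_i - grad_i(x, i) / L_i with i drawn uniformly at random."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(num_iters):
        i = rng.randrange(n)                 # coordinate chosen uniformly at random
        x[i] -= grad_i(x, i) / lipschitz[i]  # step of length 1/L_i along e_i
    return x

# Illustrative problem (an assumption): f(x) = sum_i 0.5 * c_i * (x_i - t_i)^2,
# whose i-th partial derivative is c_i * (x_i - t_i), so that L_i = c_i.
c = [1.0, 4.0, 9.0]
t = [2.0, -1.0, 0.5]

def grad_i(x, i):
    return c[i] * (x[i] - t[i])

x_final = random_coordinate_descent(grad_i, c, [0.0, 0.0, 0.0], num_iters=100)
```

For this separable test function a single visit to a coordinate already places it at its optimal value; for functions with coupled variables many visits are needed.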
Convergence rate
Since the iterates of this algorithm are random vectors, a complexity result would give a bound on the number of iterations needed for the method to output an approximate solution with high probability. It was shown in {{Citation
| last=Richtárik
| first=Peter
| last2=Takáč
| first2=Martin
| year=2011
| title=Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
| journal=Mathematical Programming, Series A
| doi=10.1007/s10107-012-0614-z
| volume=144
| issue=1–2
| pages=1–38
| arxiv=1107.2848
}} that if
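:<math>k \geq \frac{2 n R_L^2(x_0)}{\epsilon} \log\left(\frac{f(x_0) - f^*}{\epsilon \rho}\right),</math>
where <math>R_L(x) = \max_y \max_{x^* \in X^*} \{ \|y - x^*\|_L : f(y) \leq f(x) \}</math>,
<math>f^*</math> is the optimal value (<math>f^* = \min_{x \in \mathbb{R}^n} f(x)</math>),
<math>\rho \in (0,1)</math> is a confidence level and <math>\epsilon > 0</math> is the target accuracy,
then <math>\text{Prob}(f(x_k) - f^* > \epsilon) \leq \rho</math>.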
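The probabilistic nature of such a guarantee can be illustrated empirically: run the method many times with independent random seeds and count how often the objective gap after k iterations still exceeds the target accuracy. The sketch below does this for a two-dimensional quadratic with the matrix entries of this article's example, assuming the objective has the form 0.5·xᵀAx − bᵀx (so that the minimizer is (1, 1) and the optimal value is −1.5). The helper names, the number of runs and the choices of k and ε are assumptions; the result is a Monte Carlo frequency, not the bound from the cited paper.

```python
import random

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]
L = [A[0][0], A[1][1]]   # coordinate-wise Lipschitz constants of this quadratic
F_STAR = -1.5            # value of f at its minimizer (1, 1)

def f(x):
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2))

def run(k, seed):
    """k iterations of random coordinate descent started from the origin."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(k):
        i = rng.randrange(2)
        g = sum(A[i][j] * x[j] for j in range(2)) - b[i]
        x[i] -= g / L[i]
    return x

def failure_frequency(k, eps, runs=2000):
    """Fraction of independent runs with f(x_k) - f* > eps."""
    failures = sum(1 for seed in range(runs) if f(run(k, seed)) - F_STAR > eps)
    return failures / runs

for k in (10, 25, 60):
    print(k, failure_frequency(k, 1e-6))
```

The printed frequencies shrink as k grows, mirroring the role of the confidence level in the statement above.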
Example on particular function
The following figure shows how <math>x_k</math> develops during the iterations, in principle.
The problem is
:<math>f(x) = \tfrac{1}{2} x^T \left(\begin{array}{cc} 1 & 0.5 \\ 0.5 & 1 \end{array}\right) x - \left(\begin{array}{cc} 1.5 & 1.5 \end{array}\right) x, \quad x_0 = \left(\begin{array}{cc} 0 & 0 \end{array}\right)^T.</math>
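The path of the iterates on this problem can be traced with a short Python sketch. It assumes the objective is 0.5·xᵀAx − bᵀx with the matrix and vector shown above, in which case the coordinate-wise Lipschitz constants are the diagonal entries of A and the minimizer solves Ax = b, giving (1, 1). The function name `trajectory`, the seed and the iteration count are illustrative choices.

```python
import random

A = [[1.0, 0.5], [0.5, 1.0]]
b = [1.5, 1.5]

def trajectory(num_iters, seed=0):
    """Return the iterates x_0, x_1, ..., x_k of random coordinate descent."""
    rng = random.Random(seed)
    x = [0.0, 0.0]                    # x_0 from the problem statement
    path = [tuple(x)]
    for _ in range(num_iters):
        i = rng.randrange(2)
        grad_i = A[i][0] * x[0] + A[i][1] * x[1] - b[i]
        x[i] -= grad_i / A[i][i]      # L_i = A[i][i] for this quadratic
        path.append(tuple(x))
    return path

for k, point in enumerate(trajectory(20)):
    print(k, point)
```

Each step changes a single coordinate, so the path consists of axis-parallel segments; when the same coordinate is drawn twice in a row, the second step leaves the point unchanged.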
Extension to block coordinate setting
One can naturally extend this algorithm from single coordinates to blocks of coordinates. Assume that we have the space <math>\mathbb{R}^5</math>. This space has 5 coordinate directions, concretely
:<math>e_1 = (1,0,0,0,0)^T, \quad e_2 = (0,1,0,0,0)^T, \quad e_3 = (0,0,1,0,0)^T, \quad e_4 = (0,0,0,1,0)^T, \quad e_5 = (0,0,0,0,1)^T,</math>
in which the random coordinate descent method can move. However, one can group some coordinate directions into blocks and have, instead of those 5 coordinate directions, 3 block coordinate directions (see image).
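A block variant can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the 5×5 test matrix, the grouping into three blocks (0-based indices), the helper names, and the use of the maximum absolute row sum of each block submatrix as a conservative block Lipschitz constant (for a symmetric matrix it is an upper bound on the largest eigenvalue). Each iteration picks one block uniformly at random and takes a gradient step in all of its coordinates at once.

```python
import random

N = 5
# Illustrative symmetric, strictly diagonally dominant (hence positive definite) matrix.
A = [[2.0 if i == j else 0.5 if abs(i - j) == 1 else 0.0 for j in range(N)]
     for i in range(N)]
b = [1.0] * N
blocks = [(0, 1), (2, 3), (4,)]   # hypothetical grouping into 3 blocks

def block_gradient(x, block):
    """Partial gradient of f(x) = 0.5 * x^T A x - b^T x restricted to one block."""
    return [sum(A[i][j] * x[j] for j in range(N)) - b[i] for i in block]

def block_lipschitz(block):
    """Max absolute row sum of the block submatrix (bounds its largest eigenvalue)."""
    return max(sum(abs(A[i][j]) for j in block) for i in block)

def block_coordinate_descent(num_iters, seed=0):
    rng = random.Random(seed)
    x = [0.0] * N
    constants = [block_lipschitz(blk) for blk in blocks]
    for _ in range(num_iters):
        idx = rng.randrange(len(blocks))     # block chosen uniformly at random
        grad = block_gradient(x, blocks[idx])
        for i, g in zip(blocks[idx], grad):  # update the whole block at once
            x[i] -= g / constants[idx]
    return x

def gradient_norm(x):
    """Euclidean norm of the full gradient, used as a convergence check."""
    return sum(g * g for g in block_gradient(x, range(N))) ** 0.5

x = block_coordinate_descent(2000)
```

Choosing every block to contain a single coordinate recovers the coordinate method described earlier.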
References
{{reflist}}