Regularized gradient descent

I recently attended a talk on solving inverse problems with plug-and-play algorithms. During the talk, the speaker drew a quick and clear path from regularized gradient descent to plug-and-play approaches.

Here it is. When you want to constrain the solutions of your optimization problem, you can classically add a penalty term, a so-called regularization term, to the loss function.

For example, consider a regularized least-squares problem with a radial basis function (RBF) representation of your input. In the expression below, \(y\) denotes the vector of all the outputs to be predicted and \(w\) the amplitudes of the Gaussians of the RBF expansion. We do not specify it further but, to fix ideas, \(\Phi\) is the function whose output components are the individual RBF functions evaluated on the data points \(x\).

\[\min_w \|y - w^T \Phi(x)\|^2 + \lambda \|w\|_1\]
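To make the notation concrete, here is a minimal NumPy sketch of this setup. All the names (`rbf_features`, `objective`, `centers`, `gamma`) and the toy data are hypothetical choices for illustration; in the code, \(\Phi(x)\) is arranged as a matrix with one row per data point, so the model evaluates as `Phi @ w`.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Gaussian RBFs centered at `centers`, evaluated on the 1D points `x`.
    Returns a matrix Phi of shape (n_samples, n_centers)."""
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

def objective(w, Phi, y, lam):
    """Regularized least squares: data fidelity plus an L1 penalty on the amplitudes."""
    residual = y - Phi @ w
    return np.sum(residual ** 2) + lam * np.sum(np.abs(w))

# Toy data: noisy samples of a smooth 1D function (purely illustrative).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

centers = np.linspace(0.0, 1.0, 20)          # one Gaussian per center
Phi = rbf_features(x, centers, gamma=50.0)   # shape (50, 20)
print(objective(np.zeros(centers.size), Phi, y, lam=0.1))
```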

This is of the general form \(f(x) + g(x)\), where \(g\) contains our regularization term (from here on, \(x\) denotes the optimization variable). Here, the regularization is the L1 norm, which promotes sparsity. With classical gradient descent, you compute the gradient of this compound cost and iterate with whichever algorithm you like (stochastic gradient descent, Adam, etc.).

Going a step further, you could alternate one step of minimizing the data-fidelity term (minimizing the MSE can be seen as maximizing the likelihood of the data under the assumption of a normal distribution whose mean is given by our linear model) and one step of minimizing the regularization term. Proximal gradient descent formalizes this idea and, moreover, handles functions \(g(x)\) that are not differentiable.

Proximal gradient descent iteratively updates the estimate of the solution by:

\[x_{k+1} = \text{prox}_{\alpha g}(x_k - \alpha \nabla f(x_k))\]

with

\[\text{prox}_{\alpha g}(z) = \text{argmin}_y \left(\frac{1}{2\alpha}\|y - z\|^2 + g(y)\right)\]

The operator \(\text{prox}\) is called the proximal operator (also known as the proximity operator).
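For the L1 regularizer used above, the proximity operator has a closed form, the soft-thresholding operator, so the whole iteration (often called ISTA in this setting) fits in a few lines. Below is a minimal sketch reusing the hypothetical `Phi`, `y` and `lam` from the previous snippet; setting the step size from the spectral norm of \(\Phi\) is my own choice to keep the gradient step stable, not something prescribed by the talk.

```python
def soft_threshold(z, tau):
    """Proximity operator of tau * ||.||_1: shrink every coordinate towards zero."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def proximal_gradient(Phi, y, lam, alpha, n_iter=500):
    """Proximal gradient descent (ISTA) for min_w ||y - Phi w||^2 + lam * ||w||_1."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Phi.T @ (Phi @ w - y)     # gradient of the data-fidelity term f
        w = soft_threshold(w - alpha * grad,   # gradient step on f ...
                           alpha * lam)        # ... followed by the prox step on g
    return w

alpha = 1.0 / (2.0 * np.linalg.norm(Phi, 2) ** 2)  # 1/L, with L the Lipschitz constant of grad f
w_hat = proximal_gradient(Phi, y, lam=0.1, alpha=alpha)
print(np.count_nonzero(np.abs(w_hat) > 1e-8), "non-zero amplitudes")
```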

In general, evaluating the proximal operator requires solving a minimization problem, and that problem can itself be read as a denoising step: it looks for a point close to \(z\) while keeping the regularizer \(g\) small. In the plug-and-play approach (Venkatakrishnan et al., 2013), you replace that minimization by an off-the-shelf denoiser, which can be implemented with a denoising neural network.
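Schematically, a plug-and-play iteration keeps the gradient step on the data-fidelity term and hands the prox step to a denoiser. The sketch below is a proximal-gradient-style variant of that idea, not the ADMM formulation of the original paper, and it uses a plain Gaussian smoothing from SciPy as a stand-in denoiser; in practice this is where a pre-trained denoising network would be plugged in. It reuses `Phi`, `y` and `alpha` from the previous snippets.

```python
from scipy.ndimage import gaussian_filter1d

def simple_denoiser(z, sigma=1.0):
    """Stand-in denoiser: Gaussian smoothing of the coefficient vector.
    In a real plug-and-play setup, this would be a learned denoising network."""
    return gaussian_filter1d(z, sigma)

def pnp_proximal_gradient(Phi, y, denoiser, alpha, n_iter=500):
    """Plug-and-play variant: the prox step is replaced by a call to `denoiser`."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Phi.T @ (Phi @ w - y)   # gradient step on the data-fidelity term
        w = denoiser(w - alpha * grad)       # "prox" step handled by the denoiser
    return w

w_pnp = pnp_proximal_gradient(Phi, y, simple_denoiser, alpha=alpha)
```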

References

  1. Plug-and-Play priors for model based reconstruction
    Singanallur V. Venkatakrishnan, Charles A. Bouman, and Brendt Wohlberg
    In 2013 IEEE Global Conference on Signal and Information Processing, 2013


