Regularized gradient descent

I recently attended a talk on solving inverse problems with plug-and-play algorithms. During the talk, the speaker drew a quick and clear path from regularized gradient descent to plug-and-play approaches.

Here it is. When you want to constrain the solutions of an optimization problem, the classical approach is to add a penalty term, the so-called regularization term, to the loss function.

For example, consider a regularized least squares problem with a radial basis function (RBF) representation of the input. In the expression below, $y$ denotes the vector of all the outputs to be predicted and $w$ the amplitudes of the Gaussians of the RBF expansion. Without specifying it further, to fix ideas, $\Phi$ is the function whose output components are the individual RBFs evaluated at the data points $x$.

\[\min_w \|y - w^T \Phi(x)\|^2 + \lambda \|w\|_1\]

This is of the general form \(f(x) + g(x)\), where \(g\) contains the regularization term. Here, the regularization is the L1 norm, which promotes sparsity. With classical gradient descent, you compute the gradient of this compound cost and iterate with whichever algorithm you like (stochastic gradient descent, Adam, etc.).
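To make this concrete, here is a minimal numpy sketch of this setup. The toy data, the Gaussian centers and width, the step size, and the subgradient treatment of the L1 term (which is not differentiable at zero, precisely the issue addressed below) are all my own illustrative assumptions, not anything prescribed by the talk.

```python
import numpy as np

def rbf_features(x, centers, width=0.1):
    """Phi(x): one Gaussian bump per center, evaluated at each data point."""
    # shape (n_centers, n_points)
    return np.exp(-((x[None, :] - centers[:, None]) ** 2) / (2 * width ** 2))

def loss(w, Phi, y, lam):
    """Compound cost f(w) + g(w): squared error plus L1 penalty."""
    residual = y - w @ Phi
    return np.sum(residual ** 2) + lam * np.sum(np.abs(w))

# Toy data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
centers = np.linspace(0, 1, 20)
Phi = rbf_features(x, centers)

# Plain (sub)gradient descent on the compound cost.
w = np.zeros(centers.size)
lam, step = 0.1, 1e-3
for _ in range(2000):
    grad_f = -2 * Phi @ (y - w @ Phi)   # gradient of the quadratic term
    grad_g = lam * np.sign(w)           # a subgradient of the L1 term
    w -= step * (grad_f + grad_g)

print(loss(w, Phi, y, lam))
```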

Going a step further, you could alternate one step of minimizing the likelihood (minimizing the MSE can be seen as maximizing the likelihood of the data under a normal distribution whose mean is given by our linear model) and one step of minimizing the constraint. Proximal gradient descent pushes this further still, as it can handle functions \(g(x)\) that are not differentiable.

Proximal gradient descent (Combettes & Pesquet, 2010; Parikh & Boyd, 2014) iteratively updates the estimate of the solution by:

\[x_{k+1} = \text{prox}_\alpha(x_k - \alpha \nabla f(x_k))\]

with

\[\text{prox}_\alpha(z) = \text{argmin}_y (\frac{1}{2\alpha}\|y - z\|^2 + g(y))\]

The operator \(\text{prox}\) is called the proximity operator.
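For the L1 regularizer of the example above, \(g(y) = \lambda \|y\|_1\), the proximity operator has a well-known closed form, componentwise soft-thresholding:

\[\text{prox}_\alpha(z)_i = \text{sign}(z_i)\,\max(|z_i| - \alpha\lambda, 0)\]

A minimal numpy sketch of the resulting proximal gradient iteration (often called ISTA in this setting) could look like the following; the function names, step size, and iteration count are mine, for illustration only.

```python
import numpy as np

def soft_threshold(z, threshold):
    """Closed-form prox of threshold * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)

def proximal_gradient(Phi, y, lam, step, n_iter=500):
    """Minimize ||y - w @ Phi||^2 + lam * ||w||_1 by proximal gradient descent."""
    w = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        grad_f = -2 * Phi @ (y - w @ Phi)        # gradient step on the smooth part f
        w = soft_threshold(w - step * grad_f,    # prox step on the non-smooth part g
                           step * lam)
    return w

# Reusing Phi and y from the sketch above.
w_ista = proximal_gradient(Phi, y, lam=0.1, step=1e-3)
```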

Evaluating the proximity operator requires solving a minimization problem. In the plug-and-play approach (Venkatakrishnan et al., 2013), you replace that minimization with a denoising step, which can be implemented with a denoising neural network.
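To make the substitution concrete, here is the same loop with the prox call swapped for a generic `denoise` callable. This is only a rough sketch of the idea, not the exact algorithm from the references: in actual plug-and-play reconstruction the denoiser acts on the image estimate itself, whereas here I simply reuse the toy setup from above.

```python
import numpy as np

def plug_and_play(Phi, y, denoise, step, n_iter=500):
    """Same iteration as above, but the prox of g is replaced by a denoiser.

    `denoise` is a placeholder for any off-the-shelf denoiser (e.g. a trained
    denoising network); everything else is illustrative only.
    """
    w = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        grad_f = -2 * Phi @ (y - w @ Phi)   # gradient step on the data-fit term
        w = denoise(w - step * grad_f)      # the denoiser plays the role of the prox
    return w
```

Plugging the soft-thresholding from the previous sketch back in as `denoise` recovers plain proximal gradient descent; plugging in a learned denoiser gives the plug-and-play variant.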

In his work, Samuel Hurault (recipient of a best PhD award) was interested, among other things, in recovering the convergence guarantees of plug-and-play algorithms, which are lost if you blindly implement the regularization with a denoising network.

References

  1. Proximal Splitting Methods in Signal Processing
    Patrick L. Combettes and Jean-Christophe Pesquet
    2010
  2. Proximal Algorithms
    Neal Parikh and Stephen Boyd
    Foundations and Trends in Optimization, 2014
  3. Plug-and-Play Priors for Model Based Reconstruction
    Singanallur V. Venkatakrishnan, Charles A. Bouman, and Brendt Wohlberg
    In 2013 IEEE Global Conference on Signal and Information Processing, 2013


