Numerical Optimization: Understanding L-BFGS

aria42.com

34 points by aria 11 years ago · 14 comments

jamessb 11 years ago

This is a nice introduction, but it looks like there is a problem with the formatting: some of the math is not rendered properly (in the source, some expressions are wrapped in dollar signs rather than script tags).

  • ariaOP 11 years ago

    What browser are you using? It's just using MathJax, which appears to work in WebKit and Firefox.

    • jamessb 11 years ago

      Firefox 33.1.1 on Mac OS 10.10.1. Initially it would get stuck looking like: http://imgur.com/DhXAmI7

      After refreshing the page a few times, all the formulae rendered correctly. In Safari things rendered correctly immediately.

howeman 11 years ago

Gonum has an LBFGS implementation https://github.com/gonum/optimize/blob/master/lbfgs.go

ariaOP 11 years ago

Author here, happy to answer any questions.

  • stephencanon 11 years ago

    I hate to be that guy, but "Raphson", not "Rhapson" (http://en.wikipedia.org/wiki/Joseph_Raphson). It's also worth noting that no one ever actually forms the inverse of H, even if H is dense. At worst you would compute some factorization of H and use that to solve for the update d.

    • ariaOP 11 years ago

      Typo is fixed, thanks!

      I think the Quasi-Newton section explicitly mentions that you only need to multiply by the inverse implicitly, never form the matrix.
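      The point about never forming H^{-1} can be sketched in a few lines of numpy. This is a hypothetical example (the matrix and gradient are made up): the Newton direction is obtained by factoring H and solving H d = g, which gives the same result as multiplying by the explicit inverse but is cheaper and more numerically stable.

```python
import numpy as np

# Hypothetical small SPD Hessian H and gradient g.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
g = np.array([1.0, 2.0])

# Preferred: factor H (Cholesky, since H is SPD) and solve H d = g,
# rather than forming H^{-1} explicitly.
L = np.linalg.cholesky(H)
d = np.linalg.solve(L.T, np.linalg.solve(L, g))

# Forming the inverse gives the same direction, but costs an extra
# cubic-time inversion and is less numerically stable.
d_via_inverse = np.linalg.inv(H) @ g
assert np.allclose(d, d_via_inverse)
```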

  • lliiffee 11 years ago

    Is it typical to interface to Newton's method via a function that computes the inverse Hessian? I've never seen that. Typically, people claim it is numerically unstable to explicitly invert the Hessian, and that it would be better for the interface to take the Hessian itself, and then call a subroutine to do a linear solve.

    • ariaOP 11 years ago

      In practice you only ever need $H^{-1} g$; L-BFGS stores the {s_k} and {y_k} vectors, which let you compute $H^{-1} g$ directly without ever forming the Hessian or its inverse. There are techniques that aren't BFGS-based which approximate the Hessian rather than its inverse, and in that case you'd be better off solving.
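      The "compute $H^{-1} g$ directly from the stored pairs" step is the standard L-BFGS two-loop recursion. A minimal sketch (the function name and the choice of initial scaling $H_0 = \gamma I$ with $\gamma = s^T y / y^T y$ are the conventional ones, not taken from the post):

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: compute H^{-1} g implicitly from the stored
    (s_k, y_k) pairs, without ever forming the Hessian or its inverse.
    Assumes at least one stored pair with y^T s > 0."""
    q = g.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by the standard initial approximation H_0 = gamma * I.
    s, y = s_list[-1], y_list[-1]
    gamma = np.dot(s, y) / np.dot(y, y)
    r = gamma * q
    # Second loop: oldest pair to newest.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return r  # approximates H^{-1} g
```

      With a single stored pair, the result agrees exactly with applying the closed-form BFGS inverse update $(I - \rho s y^T)\,\gamma I\,(I - \rho y s^T) + \rho s s^T$ to $g$, which is a handy correctness check.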

      • lliiffee 11 years ago

        What I mean is that, if just doing regular Newton, I'm not sure I've ever seen an example using the interface you show. I think the vast majority of Newton's method implementations pass H rather than H^-1, and get the search direction via "solve_H_g(H,g)" rather than "solve_iH_g(iH,g)". It takes cubic time to compute H^-1 from H (after which you can compute (H^-1)g), and it takes cubic time to solve for x such that H*x=g, but the latter is said to be more stable, and so preferred.

        I know this is irrelevant to the main part of the post, which is to explain LBFGS; I'm just genuinely interested in whether there are applications where it's better to pass the inverse of H rather than H itself for some reason.

        • ariaOP 11 years ago

          The interesting part of BFGS is that it directly approximates H^{-1} rather than H.
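          That property is easy to check numerically. In the sketch below (notation assumed, not from the post: B denotes the inverse-Hessian approximation and rho = 1/(y^T s)), the BFGS update to B is applied directly, and the result satisfies the secant equation B_new y = s by construction:

```python
import numpy as np

# BFGS maintains B ≈ H^{-1} directly via
#   B_new = (I - rho s y^T) B (I - rho y s^T) + rho s s^T
s = np.array([1.0, 2.0])   # step:            x_{k+1} - x_k
y = np.array([3.0, 1.0])   # gradient change: g_{k+1} - g_k
B = np.eye(2)              # previous inverse-Hessian approximation
rho = 1.0 / np.dot(y, s)
I = np.eye(2)
B_new = ((I - rho * np.outer(s, y)) @ B @ (I - rho * np.outer(y, s))
         + rho * np.outer(s, s))

# The updated inverse satisfies the secant equation B_new y = s,
# with no linear solve required anywhere.
assert np.allclose(B_new @ y, s)
```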
