# Diff of /branches/doubleplusgood/doc/inversion/Minimization.tex

revision 4344 by jfenwick, Wed Feb 27 03:42:40 2013 UTC → revision 4345 by jfenwick, Fri Mar 29 07:09:41 2013 UTC
\chapter{Minimization Algorithms}\label{chapter:ref:Minimization}
We need to find the level set function $m$ minimizing the cost function $J$ as
defined in Equation~\ref{REF:EQU:INTRO 1}. The physical parameters $p^f$ and
the data defects are linked through the state variables $u^f$, which are given
as the solution of a partial differential equation (PDE) with coefficients
depending on $p^f$. This PDE (or -- in case of several forward models -- this
set of PDEs) defines a constraint in the minimization problem.
In the context of our applications it can be assumed that the PDE is of
`maximum rank', i.e. for a given value of the level set function $m$ there is a
unique value for the state variables $u^f$ as the solution of the forward model.
So from a mathematical point of view the state variable $u^f$ can be eliminated
from the problem, and the minimization problem becomes in fact a minimization
problem for the level set function $m$ alone, where the physical parameters
which are of most interest in applications can easily be derived from the
solution. However, one needs to keep in mind that each evaluation of the cost
function requires the solution of a PDE (an additional PDE solution is required
for the gradient).

In some application cases the optimization problem to be solved defines a
quadratic programming problem, which would allow the use of a special class of
solvers. However, for the general scenarios we are interested in here we cannot
assume this simplification and need to be able to minimize a general cost
function.
The method used here is the limited-memory Broyden-Fletcher-Goldfarb-Shanno
(\emph{L-BFGS}) method, see~\cite{Nocedal1980}.
This method is a quasi-Newton method.
To implement the method one needs to provide the evaluation of the cost
function $J$, its gradient $\nabla J$, and a dual product
$<.,.>$\footnote{A dual product $<.,.>$ defines a function which assigns to a
pair $(p,g)$ of a level set increment $p$ and a gradient $g$ a real number
$<p,g>$. It is linear in the first and linear in the second argument.}
such that for $\alpha \to 0$
\begin{equation}\label{EQU:MIN:1}
J(m+\alpha p) = J(m) + \alpha \cdot < p , \nabla J(m)> + o(\alpha)
\end{equation}
where $p$ is an increment to the level set function\footnote{$o$ denotes the
little-$o$ notation, see \url{http://en.wikipedia.org/wiki/Big_O_notation}.}.
Notice that if $m$ is the unknown solution one has $\nabla J(m)=0$.
Moreover, an approximation of the inverse of the Hessian operator
$\nabla \nabla J(m)$ needs to be provided for a given value of $m$.
In the implementation we do not need to provide the entire operator; rather,
for a given gradient difference $g$ an approximation $h=Hg$ of
$(\nabla \nabla J(m))^{-1}g$ needs to be calculated.
This is an approximative solution of the equation $\nabla \nabla J(m)\; h=g$
which in variational form is given as
\begin{equation}\label{EQU:MIN:2}
<p, \nabla \nabla J(m)\; h > = <p, g>
\end{equation}
for all level set increments $p$.
In the cases relevant here, a possible approach is to make the approximation
$\nabla \nabla J(m) \approx \nabla \nabla J^{reg}(m)$, which can easily be
constructed, and the resulting variational Equation~\ref{EQU:MIN:2} can easily
be solved, see Chapter~\ref{Chp:ref:regularization}.
L-BFGS is an iterative method which calculates a sequence of level set
approximations $m_{k}$ which converges towards the unknown level set function
minimizing the cost function.
This iterative process is terminated if certain stopping criteria are fulfilled.
A criterion commonly used is
\begin{equation}\label{EQU:MIN:3a}
\| m_k - m_{k-1} \| \le \|m_k\| \cdot {m\_tol}
\end{equation}
in order to terminate the iteration after $m_k$ has been calculated.
In this condition $\|.\|$ denotes a norm and $m\_tol$ defines a relative
tolerance. A typical value for $m\_tol$ is $10^{-4}$. Alternatively, one can
check for changes in the cost function:
\begin{equation}\label{EQU:MIN:3b}
\mid J(m_k) - J(m_{k-1}) \mid \le \mid J(m_k) - J(m_0) \mid \cdot {J\_tol}
\end{equation}
where ${J\_tol}$ is a given tolerance and $m_0$ is the initial guess of the
solution. As this criterion depends on the initial guess it is not recommended.

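Both stopping criteria are inexpensive to evaluate once the norm and the cost
function values are available. The following sketch shows the two tests in
plain Python with NumPy arrays; it is illustrative only, as the solver itself
uses the norm supplied by the cost function rather than a NumPy norm:

```python
import numpy as np

def m_converged(m_k, m_prev, m_tol=1e-4):
    # Criterion (EQU:MIN:3a): ||m_k - m_{k-1}|| <= ||m_k|| * m_tol
    return np.linalg.norm(m_k - m_prev) <= np.linalg.norm(m_k) * m_tol

def J_converged(J_k, J_prev, J_0, J_tol):
    # Criterion (EQU:MIN:3b): |J(m_k) - J(m_{k-1})| <= |J(m_k) - J(m_0)| * J_tol
    # Note the dependence on the initial guess m_0, which is why this
    # criterion is not recommended on its own.
    return abs(J_k - J_prev) <= abs(J_k - J_0) * J_tol

m_prev = np.array([1.0, 2.0])
m_k = np.array([1.00001, 2.00001])
print(m_converged(m_k, m_prev))          # relative change ~6e-6 -> True
print(J_converged(5.0, 5.4, 10.0, 0.1))  # |dJ| = 0.4 <= 0.5 -> True
```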
\section{Solver Classes}

\begin{classdesc}{MinimizerLBFGS}{%
\optional{J=\None}%
\optional{, m_tol=1e-4}%
\optional{, J_tol=\None}%
\optional{, imax=300}}
constructs an instance of the limited-memory Broyden-Fletcher-Goldfarb-Shanno \emph{L-BFGS} solver.
\member{J} sets the cost function to be minimized, see Section~\ref{chapter:ref:Minimization: costfunction class}.
If not present the cost function needs to be set using the \member{setCostFunction} method.
\member{m_tol} and \member{J_tol} specify the tolerances for the stopping
criteria (see the description of \member{setTolerance} below).
\member{imax} sets the maximum number of iteration steps (see
\member{setMaxIterations} below).
\end{classdesc}

\begin{methoddesc}[MinimizerLBFGS]{setTolerance}{\optional{m_tol=1e-4}\optional{, J_tol=\None}}
sets the tolerances for the stopping criteria.
\member{m_tol} sets the tolerance for the unknown level set function, see Equation~\ref{EQU:MIN:3a}.
If \member{m_tol}=\None the tolerance on the level set function is not checked.
\member{J_tol} sets the tolerance for the cost function, see Equation~\ref{EQU:MIN:3b}.
If \member{J_tol}=\None the tolerance on the cost function is not checked (recommended).
If \member{m_tol} and \member{J_tol} are both set, criterion~\ref{EQU:MIN:3a}
and criterion~\ref{EQU:MIN:3b} are both applied to terminate the iteration.
\end{methoddesc}


\begin{methoddesc}[MinimizerLBFGS]{setMaxIterations}{imax}
sets the maximum number of iterations before the minimizer terminates. If
the maximum number of iterations is reached a \exception{MinimizerMaxIterReached}
exception is thrown.
\end{methoddesc}

\begin{methoddesc}[MinimizerLBFGS]{getResult}{}
returns the solution of the optimization problem.
This method only returns something useful if the optimization process has
completed.
If the iteration failed the last available approximation of the solution is
returned.
\end{methoddesc}

\begin{methoddesc}[MinimizerLBFGS]{getOptions}{}
returns the solver options as a dictionary which contains all available option
names and their current values.
\end{methoddesc}

\begin{methoddesc}[MinimizerLBFGS]{setOptions}{%
\optional{truncation=30}%
\optional{, restart=60}%
\optional{, initialHessian=1}}
sets solver options. \member{truncation} defines the number of gradients to
keep in memory.
\member{restart} defines the number of iteration steps after which the
iteration is restarted.
\member{initialHessian} can be used to provide an estimate for the scale of
the inverse Hessian. If the cost function provides such an estimate this
option is ignored.
\end{methoddesc}

\begin{methoddesc}[MinimizerLBFGS]{run}{m0}
runs the solution algorithm and returns an approximation of the solution.
\member{m0} is the initial guess to use.
This method may throw an exception if the maximum number of iterations is
reached or a solver breakdown occurs.
\end{methoddesc}

\begin{methoddesc}[MinimizerLBFGS]{logSummary}{}
writes a summary of the solver statistics to the logger.
\end{methoddesc}

\section{\member{CostFunction} Class Template}\label{chapter:ref:Minimization: costfunction class}

\begin{classdesc}{CostFunction}{}
template of a cost function $J$
\end{classdesc}

\begin{methoddesc}[CostFunction]{resetCounters}{}
resets all statistical counters.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getDualProduct}{m, g}
returns the dual product $<m,g>$ of \member{m} and \member{g}.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getValue}{m, *args}
returns the value $J(m)$ using pre-calculated values \member{args} for $m$.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getGradient}{m, *args}
returns the gradient $\nabla J$ of $J$ at $m$ using pre-calculated values \member{args} for $m$.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getArguments}{m}
returns pre-calculated values that are shared in the calculation of $J(m)$ and
$\nabla J(m)$.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getNorm}{m}
returns the norm $\|m\|$ of \member{m}.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{updateHessian}{}
notifies the class that the Hessian operator needs to be updated in case the
class provides an approximation for the inverse of the Hessian.
\end{methoddesc}
%
\begin{methoddesc}[CostFunction]{getInverseHessianApproximation}{m, g, *args}
returns an approximative evaluation $p$ of the inverse of the Hessian operator
of the cost function for a given gradient $g$ at location $m$.
\end{methoddesc}
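As a concrete illustration of this interface, the following is a minimal,
self-contained cost function for the toy problem $J(m)=\frac{1}{2}\|m-c\|^2$
using NumPy arrays in place of escript data objects. It is a sketch only: real
cost functions solve the forward PDEs inside \member{getArguments} and use
PDE-based dual products and norms.

```python
import numpy as np

class QuadraticCostFunction:
    """Toy cost function J(m) = 0.5 * ||m - c||^2 with the documented interface."""
    def __init__(self, c):
        self.c = np.asarray(c, dtype=float)

    def getArguments(self, m):
        # Pre-calculated values shared by getValue and getGradient.
        # In a real inversion this is where the forward PDEs are solved.
        return (m - self.c,)

    def getValue(self, m, *args):
        residual = args[0] if args else m - self.c
        return 0.5 * float(residual @ residual)

    def getGradient(self, m, *args):
        return args[0] if args else m - self.c

    def getDualProduct(self, p, g):
        # Plain Euclidean pairing of an increment p with a gradient g.
        return float(p @ g)

    def getNorm(self, m):
        return float(np.linalg.norm(m))

    def getInverseHessianApproximation(self, m, g, *args):
        # The Hessian of this J is the identity, so H g = g.
        return g

f = QuadraticCostFunction([1.0, 2.0])
m = np.array([3.0, 2.0])
args = f.getArguments(m)
print(f.getValue(m, *args))     # 2.0
print(f.getGradient(m, *args))  # [2. 0.]
```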

%
%    \beta_{i+1}=\text{max}\left\{\frac{\nabla f^T(\mat{x}_{i+1})(\nabla f(\mat{x}_{i+1})-\nabla f(\mat{x}_i))}{\| \nabla f(\mat{x}_i)\|}, 0\right\}
%\end{equation*}

\section{The L-BFGS Algorithm}

%\section{Limited-Memory BFGS Method}\label{sec:LBFGS}
The L-BFGS algorithm looks as follows (see also~\cite{Nocedal1980})\index{L-BFGS}:
\begin{algorithm}
    \mbox{L-BFGS}
    \begin{program}
        p_k \leftarrow |twoLoop|(\{s, g\});
        \alpha_k \leftarrow |lineSearch|(m_k, p_k); \rcomment{See Section~\ref{sec:LineSearch}}
        m_{k+1} \leftarrow m_k+\alpha_k p_k;
        \IF k> truncation
        \THEN
            |Discard the vector pair | \{s_{k-truncation}, g_{k-truncation}\} | from storage|;
        \FI
        s_k \leftarrow m_{k+1}-m_k;
        g_k \leftarrow \nabla J_{k+1}-\nabla J_k;
        \WHERE
        \FUNCT |twoLoop|(\{s, g\}) \BODY
            \EXP q \leftarrow \nabla J_k;
            \FOR i=k-1, k-2, \ldots, k-truncation \DO
            \rho_i \leftarrow \frac{1}{<s_i, g_i>};
            \alpha_i \leftarrow \rho_i <s_i, q>;
            q \leftarrow q-\alpha_i\cdot g_i;
            \OD
            r \leftarrow H q;
            \FOR i = k-truncation, k-truncation+1, \ldots, k-1 \DO
            \beta \leftarrow \rho_i \cdot <r,g_i>;
            r \leftarrow r + s_i \cdot (\alpha_i - \beta)
            \OD
\end{algorithm}

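The |twoLoop| function above is the standard two-loop recursion: it applies
the inverse Hessian approximation implied by the stored pairs $\{s_i, g_i\}$
to the current gradient without ever forming a matrix. A compact sketch in
NumPy, with the Euclidean inner product standing in for the dual product and
the identity as the default initial operator $H$ (illustrative assumptions):

```python
import numpy as np

def two_loop(grad, pairs, H0_apply=lambda q: q):
    """Apply the L-BFGS inverse Hessian approximation to `grad`.

    `pairs` is a list of (s_i, g_i) with s_i = m_{i+1} - m_i and
    g_i = grad J_{i+1} - grad J_i, ordered oldest first."""
    q = grad.copy()
    rho, alpha = [], []
    for s, g in reversed(pairs):          # i = k-1, ..., k-truncation
        r_i = 1.0 / (s @ g)
        a_i = r_i * (s @ q)
        q -= a_i * g
        rho.append(r_i)
        alpha.append(a_i)
    r = H0_apply(q)                       # initial inverse Hessian estimate
    for (s, g), r_i, a_i in zip(pairs, reversed(rho), reversed(alpha)):
        beta = r_i * (g @ r)              # i = k-truncation, ..., k-1
        r += s * (a_i - beta)
    return r

# For J(m) = 0.5 m^T D m with D = diag(2, 4), exact curvature pairs
# spanning the space recover the Newton direction D^{-1} grad:
pairs = [(np.array([1.0, 0.0]), np.array([2.0, 0.0])),
         (np.array([0.0, 1.0]), np.array([0.0, 4.0]))]
print(two_loop(np.array([2.0, 4.0]), pairs))  # [1. 1.]
```

Note how `truncation` in the pseudocode corresponds simply to the maximum
length of `pairs` here.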
\subsection{Line Search}\label{sec:LineSearch}
The line search procedure minimizes the function $\phi(\alpha)$ for a given
level set function $m$ and search direction $p$, see \cite{Nocedal2006},
\cite{MoreThuente1992}, with
\begin{equation}\label{EQU:MIN:22}
\phi(\alpha) = J(m+ \alpha \cdot p)
\end{equation}
Notice that
\begin{equation}\label{EQU:MIN:23}
\phi'(\alpha) = < p , \nabla J(m+\alpha \cdot p)> \; .
\end{equation}
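The strong Wolfe conditions enforced by the line search can be written
directly in terms of $\phi$ and $\phi'$: sufficient decrease
$\phi(\alpha) \le \phi(0) + c_1 \alpha \phi'(0)$ and the curvature condition
$|\phi'(\alpha)| \le c_2 |\phi'(0)|$. A small sketch of the acceptance test;
the constants $c_1$, $c_2$ and the quadratic test function are illustrative
assumptions following the standard choices in \cite{Nocedal2006}:

```python
def satisfies_strong_wolfe(phi, dphi, alpha, c1=1e-4, c2=0.9):
    """Check sufficient decrease and the strong curvature condition
    for a candidate step length alpha along the search direction."""
    sufficient_decrease = phi(alpha) <= phi(0.0) + c1 * alpha * dphi(0.0)
    curvature = abs(dphi(alpha)) <= c2 * abs(dphi(0.0))
    return sufficient_decrease and curvature

# phi(a) = J(m + a p) for J(x) = 0.5 x^2, m = -1, p = 1:
phi = lambda a: 0.5 * (a - 1.0) ** 2
dphi = lambda a: a - 1.0
print(satisfies_strong_wolfe(phi, dphi, 1.0))   # exact minimizer -> True
print(satisfies_strong_wolfe(phi, dphi, 0.01))  # step too small -> False
```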

\begin{algorithm}
    \mbox{Line Search that satisfies strong Wolfe conditions}
    \begin{program}
    \end{program}
\end{algorithm}