lbfgsb.minimize_lbfgsb#
- lbfgsb.minimize_lbfgsb(*, x0: ndarray[tuple[Any, ...], dtype[float64]], fun: Callable[[ndarray[tuple[Any, ...], dtype[float64]]], float | int | floating | complex | complexfloating] | Callable[[ndarray[tuple[Any, ...], dtype[float64]]], Tuple[float | int | floating, ndarray[tuple[Any, ...], dtype[float64]]]], jac: Callable[[ndarray[tuple[Any, ...], dtype[float64]]], ndarray[tuple[Any, ...], dtype[float64]]] | bool | Literal['2-point', '3-point', 'cs'] | None, update_fun_def: Callable[[ndarray[tuple[Any, ...], dtype[float64]], float, float, ndarray[tuple[Any, ...], dtype[float64]], Deque[ndarray[tuple[Any, ...], dtype[float64]]], Deque[ndarray[tuple[Any, ...], dtype[float64]]]], Tuple[float, float, ndarray[tuple[Any, ...], dtype[float64]], Deque[ndarray[tuple[Any, ...], dtype[float64]]]]] | None = None, bounds: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None = None, checkpoint: OptimizeResult | None = None, maxcor: int = 10, ftarget: float | Callable[[], float] | None = None, ftol: float = 1e-05, gtol: float | Callable[[], float] = 1e-05, maxiter: int = 50, eps: float = 1e-08, maxfun: int = 15000, callback: Callable[[ndarray[tuple[Any, ...], dtype[float64]], OptimizeResult], bool] | None = None, maxls: int = 20, finite_diff_rel_step: float | None = None, max_steplength: float = 100000000.0, ftol_linesearch: float = 0.001, gtol_linesearch: float = 0.9, xtol_linesearch: float = 0.1, eps_SY: float = 2.2e-16, iprint: int = -1, logger: Logger | None = None, is_check_factorizations: bool = False, is_use_numba_jit: bool = True) OptimizeResult[source]#
Minimize a scalar function of one or more variables using the L-BFGS-B algorithm.
This routine solves bound-constrained optimization problems using a pure Python implementation of the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints, commonly known as L-BFGS-B.
The problem solved is
\[\min_x f(x)\]subject to
\[l_i \leq x_i \leq u_i,\quad i = 1, \ldots, n.\]The algorithm stores only a limited number of correction pairs to approximate the inverse Hessian, making it suitable for large-scale problems.
- Parameters:
x0 (ndarray, shape (n,)) – Initial guess. The array contains the starting values of the
nindependent variables. If bounds are provided,x0is clipped to the feasible box before the first evaluation.fun (callable) –
Objective function to minimize.
If
jacisNone,False, a finite-difference scheme, or a callable,funmust return only the scalar objective value:fun(x) -> float
If
jacisTrue,funmust return both the objective value and the gradient:fun(x) -> tuple[float, ndarray]
In all cases,
xis a one-dimensional array with shape(n,). If additional arguments are required by the objective function, use a closure,functools.partial, or another wrapper.jac ({callable, bool, '2-point', '3-point', 'cs'}, optional) –
Method used to compute the gradient vector.
If
jacis a callable, it must return the gradient vector:jac(x) -> ndarray, shape (n,)
If
jacisTrue,funis assumed to return a pair(f, g), wherefis the scalar objective value andgis the gradient vector. This is compatible with the convention used byscipy.optimize.minimize().If
jacis one of'2-point','3-point', or'cs', the gradient is approximated numerically using the corresponding finite-difference scheme. These finite-difference schemes obey the specified bounds.If
jacisNoneorFalse, the gradient is approximated using forward finite differences with absolute step sizeeps.Default is
None.update_fun_def (callable, optional) –
Experimental callback used to update the objective-function definition during optimization and to modify the stored gradient sequence accordingly.
This is primarily intended for problems where the objective function changes during optimization, for example regularized problems where the regularization weight is adapted on the fly. Since the L-BFGS approximation depends on past gradient differences, the stored gradient sequence must be updated to remain consistent with the new objective definition.
The callable must have signature:
update_fun_def( x, f0, f0_old, grad, x_deque, grad_deque, ) -> tuple[float, float, ndarray, deque[ndarray]]
where
xis the current point,f0is the current objective value,f0_oldis the previous objective value,gradis the current gradient,x_dequeis the stored sequence of past iterates, andgrad_dequeis the stored sequence of past gradients.bounds (sequence or scipy.optimize.Bounds, optional) –
Bounds on the variables.
Bounds may be specified in one of two forms:
An instance of
scipy.optimize.Bounds.A sequence of
(min, max)pairs for each variable.
Use
Nonefor one side of a bound to indicate no bound in that direction.checkpoint (scipy.optimize.OptimizeResult, optional) –
Previous optimization result used to restart the solver.
When provided, the solver restarts from the checkpoint without discarding the stored L-BFGS correction pairs. The previous objective value, gradient, and inverse-Hessian approximation are reused. The objective-function definition must be unchanged between the original run and the restarted run.
This is useful when an optimization was stopped early, when stopping criteria need to be modified, or when parameters such as
maxcororftolneed to be changed between runs.Added in version 1.0.
maxcor (int, optional) – Maximum number of variable-metric corrections used to approximate the inverse Hessian. The L-BFGS-B algorithm does not store the full Hessian matrix; it stores at most
maxcorcorrection pairs. Default is10.ftarget (float or callable, optional) –
Target objective-function value.
The optimization stops when
\[f^{k+1} \leq f_{\mathrm{target}}.\]If
ftargetis callable, it is called once after the first objective and gradient evaluation. This makes it possible to define a target value based on the initial objective value, for example when scaling is applied.If
None, this stopping criterion is disabled. Default isNone.ftol (float, optional) –
Relative objective-function decrease tolerance.
The optimization stops when
\[\frac{f^k - f^{k+1}} {\max(|f^k|, |f^{k+1}|, 1)} \leq \mathrm{ftol}.\]In the original Fortran implementation, this corresponds to
factr * epsmch. Typical values are:5e-3for low accuracy,5e-8for moderate accuracy,5e-14for high accuracy.
If
ftolis0, the test stops only when the objective function remains unchanged after one iteration. Default is1e-5.gtol (float or callable, optional) –
Projected-gradient tolerance.
The optimization stops when
\[\max_i |\operatorname{proj} g_i| \leq \mathrm{gtol},\]where
proj gis the projected gradient. Ifgtolis callable, it is called once after the first objective and gradient evaluation. This allows the stopping tolerance to depend on the initial objective value. Default is1e-5.eps (float, optional) – Absolute step size used for forward finite-difference approximation of the gradient when
jacisNoneorFalse. Default is1e-8.maxfun (int, optional) –
Maximum number of objective-function evaluations.
The limit may be exceeded when gradients are estimated by numerical differentiation. Interruptions due to
maxfunare postponed until the end of the current minimization iteration. Default is15000.maxiter (int, optional) – Maximum number of iterations. Default is
50.callback (callable, optional) –
Callback called after each iteration.
The callable must have signature:
callback(xk, state) -> bool
where
xkis the current parameter vector andstateis anscipy.optimize.OptimizeResultcontaining the current solver state.If the callback returns
True, the optimization stops.maxls (int, optional) – Maximum number of line-search steps per iteration. Default is
20.finite_diff_rel_step (float, optional) –
Relative step size used for numerical gradient approximation when
jacis'2-point','3-point', or'cs'.The absolute step size is computed as
\[h = \mathrm{rel\_step} \operatorname{sign}(x) \max(1, |x|),\]and is adjusted if necessary to respect the bounds. For
jac='3-point', the sign ofhis ignored. IfNone, the step size is selected automatically. Default isNone.max_steplength (float, optional) – Maximum allowed step length during the line search. Default is
1e8.ftol_linesearch (float, optional) –
Tolerance for the sufficient-decrease condition in the line search.
This is the Wolfe condition parameter \(c_1\) in
\[f(x_k + \alpha_k p_k) \leq f(x_k) + c_1 \alpha_k p_k^\mathrm{T} \nabla f(x_k).\]Usually, \(c_1\) is small and satisfies \(0 < c_1 < 1\). In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to
1e-3. Default is1e-3.gtol_linesearch (float, optional) –
Tolerance for the curvature condition in the line search.
This is the Wolfe condition parameter \(c_2\) in
\[\left| p_k^\mathrm{T} \nabla f(x_k + \alpha_k p_k) \right| \leq c_2 \left| p_k^\mathrm{T} \nabla f(x_k) \right|.\]The parameters should satisfy \(0 < c_1 < c_2 < 1\). Usually, \(c_2\) is much larger than \(c_1\). In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to
0.9. Default is0.9.xtol_linesearch (float, optional) – Relative tolerance for acceptable steps in the line-search procedure. In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to
0.1. Default is0.1.eps_SY (float, optional) – Numerical threshold used when updating the L-BFGS correction matrices. Default is
2.2e-16.iprint (int, optional) –
Verbosity level.
iprint < 0: no output.iprint == 0: print only the final summary.0 < iprint < 99: print objective value and projected-gradient norm
every
iprintiterations. -iprint >= 99: print detailed information at every iteration, except full vectors.Default is
-1.logger (logging.Logger, optional) – Logger used for textual output. If
None, no output is emitted regardless ofiprint. Default isNone.is_check_factorizations (bool, optional) – Whether to run additional consistency checks on matrix factorizations. Intended for development and debugging only. Default is
False.is_use_numba_jit (bool, optional) –
Whether to use
numbajust-in-time compilation for performance-critical parts of the algorithm.If
Truebutnumbais unavailable, a warning is raised and the solver falls back to the pure Python implementation. The expected speed-up for large problems is typically between two and five times. Default isTrue.Added in version 1.0.
- Returns:
Optimization result represented as an
scipy.optimize.OptimizeResult.Important fields include:
xndarrayFinal solution.
funfloatObjective value at the final solution.
jacndarrayGradient at the final solution.
nfevintNumber of objective-function evaluations.
njevintNumber of gradient evaluations.
nitintNumber of iterations.
statusintSolver status code.
messagestrTermination message.
successboolWhether the solver terminated successfully.
hess_invscipy.optimize.LbfgsInvHessProductLimited-memory inverse-Hessian approximation.
- Return type:
scipy.optimize.OptimizeResult
- Raises:
ValueError – Raised when the provided checkpoint is inconsistent with
x0or when restored correction vectors have incompatible dimensions.
Notes
This implementation is a Python reimplementation of the L-BFGS-B algorithm, rather than a wrapper around the original Fortran code. It exposes several parameters that are fixed in the historical implementation, including line-search Wolfe-condition parameters.
The convention
jac=Truefollowsscipy.optimize.minimize(): in that case,funmust return both the objective value and the gradient. For example:def fun(x): f = x[0] ** 2 + x[1] ** 2 g = np.array([2 * x[0], 2 * x[1]]) return f, g res = minimize_lbfgsb(x0=x0, fun=fun, jac=True)
If
jacisNoneorFalse, finite differences are used.Examples
Minimize a quadratic function with a separate gradient function:
import numpy as np from lbfgsb import minimize_lbfgsb def objective(x): return x[0] ** 2 + x[1] ** 2 def gradient(x): return np.array([2 * x[0], 2 * x[1]]) x0 = np.array([10.0, 10.0]) res = minimize_lbfgsb( x0=x0, fun=objective, jac=gradient, ) print(res.x) print(res.fun)
Use the SciPy-compatible
jac=Trueconvention:import numpy as np from lbfgsb import minimize_lbfgsb def objective_and_gradient(x): f = x[0] ** 2 + x[1] ** 2 g = np.array([2 * x[0], 2 * x[1]]) return f, g x0 = np.array([10.0, 10.0]) res = minimize_lbfgsb( x0=x0, fun=objective_and_gradient, jac=True, ) print(res.x) print(res.fun)
Use finite differences with bounds:
import numpy as np from lbfgsb import minimize_lbfgsb def objective(x): return x[0] ** 2 + x[1] ** 2 x0 = np.array([10.0, 10.0]) bounds = [(0.0, None), (0.0, None)] res = minimize_lbfgsb( x0=x0, fun=objective, jac="2-point", bounds=bounds, )
References