lbfgsb.minimize_lbfgsb#

lbfgsb.minimize_lbfgsb(*, x0: ndarray[tuple[Any, ...], dtype[float64]], fun: Callable[[ndarray[tuple[Any, ...], dtype[float64]]], float | int | floating | complex | complexfloating] | Callable[[ndarray[tuple[Any, ...], dtype[float64]]], Tuple[float | int | floating, ndarray[tuple[Any, ...], dtype[float64]]]], jac: Callable[[ndarray[tuple[Any, ...], dtype[float64]]], ndarray[tuple[Any, ...], dtype[float64]]] | bool | Literal['2-point', '3-point', 'cs'] | None, update_fun_def: Callable[[ndarray[tuple[Any, ...], dtype[float64]], float, float, ndarray[tuple[Any, ...], dtype[float64]], Deque[ndarray[tuple[Any, ...], dtype[float64]]], Deque[ndarray[tuple[Any, ...], dtype[float64]]]], Tuple[float, float, ndarray[tuple[Any, ...], dtype[float64]], Deque[ndarray[tuple[Any, ...], dtype[float64]]]]] | None = None, bounds: _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None = None, checkpoint: OptimizeResult | None = None, maxcor: int = 10, ftarget: float | Callable[[], float] | None = None, ftol: float = 1e-05, gtol: float | Callable[[], float] = 1e-05, maxiter: int = 50, eps: float = 1e-08, maxfun: int = 15000, callback: Callable[[ndarray[tuple[Any, ...], dtype[float64]], OptimizeResult], bool] | None = None, maxls: int = 20, finite_diff_rel_step: float | None = None, max_steplength: float = 100000000.0, ftol_linesearch: float = 0.001, gtol_linesearch: float = 0.9, xtol_linesearch: float = 0.1, eps_SY: float = 2.2e-16, iprint: int = -1, logger: Logger | None = None, is_check_factorizations: bool = False, is_use_numba_jit: bool = True) OptimizeResult[source]#

Minimize a scalar function of one or more variables using the L-BFGS-B algorithm.

This routine solves bound-constrained optimization problems using a pure Python implementation of the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints, commonly known as L-BFGS-B.

The problem solved is

\[\min_x f(x)\]

subject to

\[l_i \leq x_i \leq u_i,\quad i = 1, \ldots, n.\]

The algorithm stores only a limited number of correction pairs to approximate the inverse Hessian, making it suitable for large-scale problems.

Parameters:
  • x0 (ndarray, shape (n,)) – Initial guess. The array contains the starting values of the n independent variables. If bounds are provided, x0 is clipped to the feasible box before the first evaluation.

  • fun (callable) –

    Objective function to minimize.

    If jac is None, False, a finite-difference scheme, or a callable, fun must return only the scalar objective value:

    fun(x) -> float
    

    If jac is True, fun must return both the objective value and the gradient:

    fun(x) -> tuple[float, ndarray]
    

    In all cases, x is a one-dimensional array with shape (n,). If additional arguments are required by the objective function, use a closure, functools.partial, or another wrapper.

  • jac ({callable, bool, '2-point', '3-point', 'cs'}, optional) –

    Method used to compute the gradient vector.

    If jac is a callable, it must return the gradient vector:

    jac(x) -> ndarray, shape (n,)
    

    If jac is True, fun is assumed to return a pair (f, g), where f is the scalar objective value and g is the gradient vector. This is compatible with the convention used by scipy.optimize.minimize().

    If jac is one of '2-point', '3-point', or 'cs', the gradient is approximated numerically using the corresponding finite-difference scheme. These finite-difference schemes obey the specified bounds.

    If jac is None or False, the gradient is approximated using forward finite differences with absolute step size eps.

    Default is None.

  • update_fun_def (callable, optional) –

    Experimental callback used to update the objective-function definition during optimization and to modify the stored gradient sequence accordingly.

    This is primarily intended for problems where the objective function changes during optimization, for example regularized problems where the regularization weight is adapted on the fly. Since the L-BFGS approximation depends on past gradient differences, the stored gradient sequence must be updated to remain consistent with the new objective definition.

    The callable must have signature:

    update_fun_def(
        x,
        f0,
        f0_old,
        grad,
        x_deque,
        grad_deque,
    ) -> tuple[float, float, ndarray, deque[ndarray]]
    

    where x is the current point, f0 is the current objective value, f0_old is the previous objective value, grad is the current gradient, x_deque is the stored sequence of past iterates, and grad_deque is the stored sequence of past gradients.

  • bounds (sequence or scipy.optimize.Bounds, optional) –

    Bounds on the variables.

    Bounds may be specified in one of two forms:

    1. An instance of scipy.optimize.Bounds.

    2. A sequence of (min, max) pairs for each variable.

    Use None for one side of a bound to indicate no bound in that direction.

  • checkpoint (scipy.optimize.OptimizeResult, optional) –

    Previous optimization result used to restart the solver.

    When provided, the solver restarts from the checkpoint without discarding the stored L-BFGS correction pairs. The previous objective value, gradient, and inverse-Hessian approximation are reused. The objective-function definition must be unchanged between the original run and the restarted run.

    This is useful when an optimization was stopped early, when stopping criteria need to be modified, or when parameters such as maxcor or ftol need to be changed between runs.

    Added in version 1.0.

  • maxcor (int, optional) – Maximum number of variable-metric corrections used to approximate the inverse Hessian. The L-BFGS-B algorithm does not store the full Hessian matrix; it stores at most maxcor correction pairs. Default is 10.

  • ftarget (float or callable, optional) –

    Target objective-function value.

    The optimization stops when

    \[f^{k+1} \leq f_{\mathrm{target}}.\]

    If ftarget is callable, it is called once after the first objective and gradient evaluation. This makes it possible to define a target value based on the initial objective value, for example when scaling is applied.

    If None, this stopping criterion is disabled. Default is None.

  • ftol (float, optional) –

    Relative objective-function decrease tolerance.

    The optimization stops when

    \[\frac{f^k - f^{k+1}} {\max(|f^k|, |f^{k+1}|, 1)} \leq \mathrm{ftol}.\]

    In the original Fortran implementation, this corresponds to factr * epsmch. Typical values are:

    • 5e-3 for low accuracy,

    • 5e-8 for moderate accuracy,

    • 5e-14 for high accuracy.

    If ftol is 0, the test stops only when the objective function remains unchanged after one iteration. Default is 1e-5.

  • gtol (float or callable, optional) –

    Projected-gradient tolerance.

    The optimization stops when

    \[\max_i |\operatorname{proj} g_i| \leq \mathrm{gtol},\]

    where proj g is the projected gradient. If gtol is callable, it is called once after the first objective and gradient evaluation. This allows the stopping tolerance to depend on the initial objective value. Default is 1e-5.

  • eps (float, optional) – Absolute step size used for forward finite-difference approximation of the gradient when jac is None or False. Default is 1e-8.

  • maxfun (int, optional) –

    Maximum number of objective-function evaluations.

    The limit may be exceeded when gradients are estimated by numerical differentiation. Interruptions due to maxfun are postponed until the end of the current minimization iteration. Default is 15000.

  • maxiter (int, optional) – Maximum number of iterations. Default is 50.

  • callback (callable, optional) –

    Callback called after each iteration.

    The callable must have signature:

    callback(xk, state) -> bool
    

    where xk is the current parameter vector and state is an scipy.optimize.OptimizeResult containing the current solver state.

    If the callback returns True, the optimization stops.

  • maxls (int, optional) – Maximum number of line-search steps per iteration. Default is 20.

  • finite_diff_rel_step (float, optional) –

    Relative step size used for numerical gradient approximation when jac is '2-point', '3-point', or 'cs'.

    The absolute step size is computed as

    \[h = \mathrm{rel\_step} \operatorname{sign}(x) \max(1, |x|),\]

    and is adjusted if necessary to respect the bounds. For jac='3-point', the sign of h is ignored. If None, the step size is selected automatically. Default is None.

  • max_steplength (float, optional) – Maximum allowed step length during the line search. Default is 1e8.

  • ftol_linesearch (float, optional) –

    Tolerance for the sufficient-decrease condition in the line search.

    This is the Wolfe condition parameter \(c_1\) in

    \[f(x_k + \alpha_k p_k) \leq f(x_k) + c_1 \alpha_k p_k^\mathrm{T} \nabla f(x_k).\]

    Usually, \(c_1\) is small and satisfies \(0 < c_1 < 1\). In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to 1e-3. Default is 1e-3.

  • gtol_linesearch (float, optional) –

    Tolerance for the curvature condition in the line search.

    This is the Wolfe condition parameter \(c_2\) in

    \[\left| p_k^\mathrm{T} \nabla f(x_k + \alpha_k p_k) \right| \leq c_2 \left| p_k^\mathrm{T} \nabla f(x_k) \right|.\]

    The parameters should satisfy \(0 < c_1 < c_2 < 1\). Usually, \(c_2\) is much larger than \(c_1\). In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to 0.9. Default is 0.9.

  • xtol_linesearch (float, optional) – Relative tolerance for acceptable steps in the line-search procedure. In the original L-BFGS-B Fortran implementation, this parameter is hard-coded to 0.1. Default is 0.1.

  • eps_SY (float, optional) – Numerical threshold used when updating the L-BFGS correction matrices. Default is 2.2e-16.

  • iprint (int, optional) –

    Verbosity level.

    • iprint < 0: no output.

    • iprint == 0: print only the final summary.

    • 0 < iprint < 99: print objective value and projected-gradient norm

    every iprint iterations. - iprint >= 99: print detailed information at every iteration, except full vectors.

    Default is -1.

  • logger (logging.Logger, optional) – Logger used for textual output. If None, no output is emitted regardless of iprint. Default is None.

  • is_check_factorizations (bool, optional) – Whether to run additional consistency checks on matrix factorizations. Intended for development and debugging only. Default is False.

  • is_use_numba_jit (bool, optional) –

    Whether to use numba just-in-time compilation for performance-critical parts of the algorithm.

    If True but numba is unavailable, a warning is raised and the solver falls back to the pure Python implementation. The expected speed-up for large problems is typically between two and five times. Default is True.

    Added in version 1.0.

Returns:

Optimization result represented as an scipy.optimize.OptimizeResult.

Important fields include:

xndarray

Final solution.

funfloat

Objective value at the final solution.

jacndarray

Gradient at the final solution.

nfevint

Number of objective-function evaluations.

njevint

Number of gradient evaluations.

nitint

Number of iterations.

statusint

Solver status code.

messagestr

Termination message.

successbool

Whether the solver terminated successfully.

hess_invscipy.optimize.LbfgsInvHessProduct

Limited-memory inverse-Hessian approximation.

Return type:

scipy.optimize.OptimizeResult

Raises:

ValueError – Raised when the provided checkpoint is inconsistent with x0 or when restored correction vectors have incompatible dimensions.

Notes

This implementation is a Python reimplementation of the L-BFGS-B algorithm, rather than a wrapper around the original Fortran code. It exposes several parameters that are fixed in the historical implementation, including line-search Wolfe-condition parameters.

The convention jac=True follows scipy.optimize.minimize(): in that case, fun must return both the objective value and the gradient. For example:

def fun(x):
    f = x[0] ** 2 + x[1] ** 2
    g = np.array([2 * x[0], 2 * x[1]])
    return f, g

res = minimize_lbfgsb(x0=x0, fun=fun, jac=True)

If jac is None or False, finite differences are used.

Examples

Minimize a quadratic function with a separate gradient function:

import numpy as np
from lbfgsb import minimize_lbfgsb

def objective(x):
    return x[0] ** 2 + x[1] ** 2

def gradient(x):
    return np.array([2 * x[0], 2 * x[1]])

x0 = np.array([10.0, 10.0])

res = minimize_lbfgsb(
    x0=x0,
    fun=objective,
    jac=gradient,
)

print(res.x)
print(res.fun)

Use the SciPy-compatible jac=True convention:

import numpy as np
from lbfgsb import minimize_lbfgsb

def objective_and_gradient(x):
    f = x[0] ** 2 + x[1] ** 2
    g = np.array([2 * x[0], 2 * x[1]])
    return f, g

x0 = np.array([10.0, 10.0])

res = minimize_lbfgsb(
    x0=x0,
    fun=objective_and_gradient,
    jac=True,
)

print(res.x)
print(res.fun)

Use finite differences with bounds:

import numpy as np
from lbfgsb import minimize_lbfgsb

def objective(x):
    return x[0] ** 2 + x[1] ** 2

x0 = np.array([10.0, 10.0])
bounds = [(0.0, None), (0.0, None)]

res = minimize_lbfgsb(
    x0=x0,
    fun=objective,
    jac="2-point",
    bounds=bounds,
)

References