SimulatedAnnealingOptimizer

class SimulatedAnnealingOptimizer(search_space: dict[str, list], initialize: dict[Literal['grid', 'vertices', 'random', 'warm_start'], int | list[dict]] = None, constraints: list[callable] = None, random_state: int = None, rand_rest_p: float = 0, nth_process: int = None, boundary: str = 'clip', epsilon: float = 0.03, distribution: Literal['normal', 'laplace', 'gumbel', 'logistic'] = 'normal', n_neighbours: int = 3, annealing_rate: float = 0.97, start_temp: float = 1)[source]

Probabilistic optimizer inspired by the annealing process in metallurgy.

Simulated Annealing is a classic metaheuristic that mimics the physical process of heating and slowly cooling a material to reduce defects. The algorithm starts with a high “temperature” that allows accepting worse solutions with high probability, enabling broad exploration. As the temperature decreases according to the annealing schedule, the acceptance probability for worse solutions decreases, and the algorithm gradually focuses on exploitation.

The acceptance probability follows the Metropolis criterion: worse solutions are accepted with probability exp(-delta/T), where delta is the score difference and T is the current temperature. This allows the algorithm to escape local optima early in the search while converging to good solutions later.

The algorithm is well-suited for:

Combinatorial optimization problems
Multimodal functions with many local optima
Problems where a good balance of exploration and exploitation is needed
Situations where solution quality matters more than speed

The annealing_rate controls how fast the temperature decreases. Values close to 1.0 cool slowly (more exploration), while smaller values cool faster (quicker convergence but risk of premature convergence).

Parameters:

search_spacedict[str, list]

The search space to explore, defined as a dictionary mapping parameter names to arrays of possible values.

Each key is a parameter name (string), and each value is a numpy array or list of discrete values that the parameter can take. The optimizer will only evaluate positions that are on this discrete grid.

Example: A 2D search space with 100 points per dimension:

search_space = {
    "x": np.linspace(-10, 10, 100),
    "y": np.linspace(-10, 10, 100),
}

The resolution of each dimension (number of points in the array) directly affects optimization quality and speed. More points give finer resolution but increase the search space size exponentially.

initializedict[str, int], default={“vertices”: 4, “random”: 2}

Strategy for generating initial positions before the main optimization loop begins. Initialization samples are evaluated first, and the best one becomes the starting point for the optimizer.

Supported keys:

"grid": int – Number of positions on a regular grid.
"vertices": int – Number of corner/edge positions of the search space.
"random": int – Number of uniformly random positions.
"warm_start": list[dict] – Specific positions to evaluate, each as a dict mapping parameter names to values.

Multiple strategies can be combined:

initialize = {"vertices": 4, "random": 10}
initialize = {"warm_start": [{"x": 0.5, "y": 1.0}], "random": 5}

More initialization samples improve the starting point but consume iterations from n_iter. For expensive objectives, a few targeted warm-start points are often more efficient than many random samples.

constraintslist[callable], default=[]

A list of constraint functions that restrict the search space. Each constraint is a callable that receives a parameter dictionary and returns True if the position is valid, False if it should be rejected.

Rejected positions are discarded and regenerated: the optimizer resamples a new candidate position (up to 100 retries per step). During initialization, positions that violate constraints are filtered out entirely.

Example: Constrain the search to a circular region:

def circular_constraint(para):
    return para["x"]**2 + para["y"]**2 <= 25

constraints = [circular_constraint]

Multiple constraints are combined with AND logic (all must return True).

random_stateint or None, default=None

Seed for the random number generator to ensure reproducible results.

None: Use a new random state each run (non-deterministic).
int: Seed the random number generator for reproducibility.

Setting a fixed seed is recommended for debugging and benchmarking. Different seeds may lead to different optimization trajectories, especially for stochastic optimizers.

rand_rest_pfloat, default=0

Probability of performing a random restart instead of the normal algorithm step. At each iteration, a uniform random number is drawn; if it falls below rand_rest_p, the optimizer jumps to a random position instead of following its strategy.

0.0: No random restarts (pure algorithm behavior).
0.01-0.05: Light diversification, helps escape shallow local optima.
0.1-0.3: Aggressive restarts, useful for highly multi-modal landscapes.
1.0: Equivalent to random search.

This is especially useful for local search optimizers (Hill Climbing, Simulated Annealing) that can get trapped. For population-based optimizers, the effect is less pronounced since they already maintain diversity through multiple agents.

boundary{“clip”, “reflect”, “periodic”, “random”, “intermediate”}, default=”clip”

Strategy for handling positions that exceed the search space bounds. When the optimizer proposes a candidate outside the valid range, this parameter controls how that candidate is mapped back.

"clip": Clamp each coordinate to the nearest bound.
"reflect": Mirror the position back into the search space at the boundary it crossed.
"periodic": Wrap around to the opposite end of the range, treating the search space as periodic.
"random": Replace the out-of-bounds position with a uniformly random position within the valid range.
"intermediate": Move to the midpoint between the current position and the violated bound.

Applies to continuous and discrete numerical dimensions. Categorical dimensions always snap to the nearest valid category index.

epsilonfloat, default=0.03

Step size for generating neighbor positions, expressed as a fraction of each dimension’s range. Controls how far the optimizer looks from the current position when sampling neighbors.

0.01-0.02: Fine-grained local search, slow convergence.
0.03-0.05: Moderate step size, good default range.
0.1-0.3: Large steps, broader exploration.
0.5-1.0: Very large steps, nearly global jumps.

Example: For a dimension with np.linspace(0, 100, 1000) (range = 100):

epsilon=0.03 leads to neighbors within ~3 units
epsilon=0.1 leads to neighbors within ~10 units

Smaller values are better for fine-tuning near a known good solution. Larger values help escape local optima but may overshoot narrow peaks.

distribution{“normal”, “laplace”, “gumbel”, “logistic”}, default=”normal”

Probability distribution used to sample neighbor offsets. Each distribution produces different exploration patterns:

"normal": Gaussian distribution. Most neighbors are close to the current position with rare far jumps. Best general-purpose choice.
"laplace": Sharper peak than normal with heavier tails. Good for landscapes where occasional large jumps help.
"gumbel": Asymmetric, skewed distribution. Can bias exploration in one direction.
"logistic": Similar to normal but with slightly heavier tails. A middle ground between normal and Laplace.

The distribution interacts with epsilon: heavy-tailed distributions (Laplace) effectively increase the chance of large steps beyond what epsilon alone suggests.

n_neighboursint, default=3

Number of neighbor positions to sample and evaluate per iteration. The optimizer moves to the best among these neighbors (if it improves on the current position).

1: Minimal sampling, fast iterations but may miss good directions. Good for very cheap objective functions.
3-5: Moderate sampling, good balance of quality and speed.
10-20: Thorough neighborhood evaluation, better directional choices but more function evaluations per step.

Higher values act like a local beam search, giving the optimizer more information about the local landscape at each iteration.

annealing_ratefloat, default=0.97

Multiplicative cooling factor applied to the temperature at each iteration. The temperature at iteration t is start_temp * annealing_rate^t.

0.8-0.9: Fast cooling, quick convergence but limited exploration. Good for unimodal problems.
0.95-0.99: Slow cooling, thorough exploration before convergence. Better for multimodal landscapes.
0.999: Very slow cooling, extensive exploration. Use with high n_iter.

Example: With start_temp=1 and annealing_rate=0.97, the temperature after 100 iterations is approximately 0.048, and after 200 iterations approximately 0.002.

start_tempfloat, default=1

Initial temperature controlling the acceptance probability of worse solutions at the start of the search. Higher temperatures allow more exploration initially.

0.1-0.5: Low initial temperature, conservative exploration.
1.0: Moderate starting temperature (default).
5.0-20.0: High temperature, strong initial exploration phase.

The temperature interacts with the score scale of your objective function. If scores span a large range, higher start_temp may be needed to enable meaningful exploration.

Attributes:

best_para: Return the best parameters found as a dictionary.
best_score: Best score found during the search.
best_value: Return the best values found (raw parameter values).
diagnostics: Diagnostics accessor for the last search.
search_data: Lazily construct and return the search results DataFrame.

Methods

search(objective_function, n_iter[, ...])

Run the optimization loop.

eval_time
init_stats
iter_time

See also

StochasticHillClimbingOptimizer: Constant-probability acceptance.
ParallelTemperingOptimizer: Multiple annealers at different temperatures.
RandomAnnealingOptimizer: Temperature-based step size control.

Notes

The acceptance decision follows the Metropolis criterion:

\[\begin{split}P(\\text{accept}) = \\begin{cases} 1 & \\text{if } \\Delta f > 0 \\\\ \\exp(\\Delta f / T) & \\text{if } \\Delta f \\leq 0 \\end{cases}\end{split}\]

where \(\\Delta f = f(x_{\\text{new}}) - f(x_{\\text{current}})\) and \(T = T_0 \\cdot r^t\) is the temperature at iteration t, with \(T_0\) = start_temp and \(r\) = annealing_rate.

At high temperatures, almost all moves are accepted (exploration). As temperature decreases, only improving moves are accepted (exploitation). This provides a natural transition from global to local search.

For visual explanations and tuning guides, see the Simulated Annealing user guide.

Examples

>>> import numpy as np
>>> from gradient_free_optimizers import SimulatedAnnealingOptimizer

>>> def rosenbrock(para):
...     x, y = para["x"], para["y"]
...     return -((1 - x) ** 2 + 100 * (y - x ** 2) ** 2)

>>> search_space = {
...     "x": np.linspace(-2, 2, 100),
...     "y": np.linspace(-1, 3, 100),
... }

>>> opt = SimulatedAnnealingOptimizer(
...     search_space, annealing_rate=0.98, start_temp=10
... )
>>> opt.search(rosenbrock, n_iter=1000)

search(objective_function: Callable[[dict[str, Any]], float], n_iter: int, max_time: float | None = None, max_score: float | None = None, early_stopping: dict[str, Any] | None = None, memory: bool | BaseStorage = True, memory_warm_start: pd.DataFrame | None = None, verbosity: list[str] | Literal[False] = ['progress_bar', 'print_results', 'print_times'], optimum: Literal['maximum', 'minimum'] = 'maximum', callbacks: list[Callable[[CallbackInfo], bool | None]] | None = None, catch: dict[type[Exception], int | float] | None = None) → None[source]

Run the optimization loop.

Evaluates objective_function up to n_iter times, searching for the parameters that maximize (or minimize) the returned score. The search proceeds in two phases: an initialization phase that evaluates starting positions (controlled by the initialize constructor parameter), followed by an iteration phase where the optimizer’s strategy generates new candidate positions.

After the search finishes, results are available via best_para, best_score, and search_data.

Parameters:

objective_functioncallable

The function to optimize. Must accept a single dictionary mapping parameter names to values and return either:

A float score, or
A tuple (float, dict) where the second element contains custom metrics (accessible via callbacks and search_data).

Example:

def objective(params):
    return -(params["x"] ** 2 + params["y"] ** 2)

def objective_with_metrics(params):
    loss = params["x"] ** 2
    return -loss, {"loss": loss}

n_iterint

Total number of iterations (including initialization). Each iteration evaluates the objective function once (unless a cached result is found when memory=True).

max_timefloat or None, default=None

Maximum wall-clock time in seconds. The search stops after the current iteration if the elapsed time exceeds this limit. None means no time limit.

max_scorefloat or None, default=None

Target score threshold. The search stops when the best score found so far reaches or exceeds this value. When optimum="minimum", this refers to the original (non-negated) score. None means no score limit.

early_stoppingdict or None, default=None

Configuration for stopping the search when progress stalls. None disables early stopping. When provided, the dictionary supports the following keys:

"n_iter_no_change" (int, required): Stop if no improvement is observed for this many consecutive iterations.
"tol_abs" (float, optional): Minimum absolute improvement required over the window to count as progress.
"tol_rel" (float, optional): Minimum relative improvement (in percent) required over the window to count as progress.

Example:

early_stopping = {"n_iter_no_change": 50}
early_stopping = {"n_iter_no_change": 30, "tol_abs": 0.001}

memorybool or BaseStorage, default=True

Controls evaluation caching. When True, uses an in-memory dictionary (equivalent to MemoryStorage()). When False, disables caching entirely. A BaseStorage instance enables custom storage backends:

from gradient_free_optimizers._storage import SQLiteStorage
opt.search(objective, memory=SQLiteStorage("results.db"))

SQLiteStorage persists results to disk, enabling crash recovery and cache reuse across runs. Works with distributed evaluation (positions are checked against the cache before being dispatched to workers).

In-memory caching is especially useful for discrete search spaces where revisits are common.

memory_warm_startpd.DataFrame or None, default=None

A DataFrame from a previous search (typically obtained via search_data) to pre-populate the evaluation cache. The DataFrame must contain columns matching the search space parameter names plus a "score" column. Requires memory=True.

Example:

opt1 = HillClimbingOptimizer(search_space)
opt1.search(objective, n_iter=50)

opt2 = HillClimbingOptimizer(search_space)
opt2.search(objective, n_iter=50,
            memory_warm_start=opt1.search_data)

verbositylist[str] or False, default=[“progress_bar”, “print_results”, “print_times”]

Controls console output during and after the search. Pass False or an empty list for silent operation.

Supported values:

"progress_bar": Show a live tqdm progress bar during the search.
"print_results": Print best score and best parameters after the search completes.
"print_times": Print timing breakdown (evaluation time, optimization overhead, throughput) after the search completes.
"print_search_stats": Print search statistics including iteration counts, acceptance rate, number of improvements, and longest plateau.
"print_statistics": Print score statistics (min, max, mean, std) after the search completes.
"debug_stop": Print detailed stopping condition debug info when the search terminates early.

optimum{“maximum”, “minimum”}, default=”maximum”

Whether to maximize or minimize the objective function. When set to "minimum", the objective function’s return value is negated internally so that the optimizer always maximizes. The reported best_score is in original (non-negated) units.

callbackslist[callable] or None, default=None

A list of callback functions invoked after each iteration. Each callback receives a single argument info with the following attributes:

info.iteration (int): Current iteration index (0-based).
info.score (float): Score from the current evaluation.
info.params (dict): Parameters evaluated in this iteration.
info.best_score (float): Best score found so far.
info.best_para (dict): Parameters of the best score.
info.n_iter (int): Total iterations planned.
info.phase (str): "init" or "iter".
info.elapsed_time (float): Seconds since search started.
info.metrics (dict): Custom metrics from the objective function (empty if the objective returns only a score).
info.convergence (list[float]): Best score at each iteration so far.

If any callback returns False, the search stops immediately. Any other return value (including None) is ignored and the search continues.

Example:

def log_progress(info):
    if info.iteration % 10 == 0:
        print(f"Iter {info.iteration}: best={info.best_score:.4f}")

def stop_early(info):
    if info.best_score > 0.99:
        return False  # stops the search

opt.search(objective, n_iter=100,
           callbacks=[log_progress, stop_early])

catchdict[type, float] or None, default=None

Error handling for the objective function. Maps exception types to fallback scores. When the objective function raises a caught exception, the optimizer records the fallback score instead of crashing. Exception subclasses are matched via isinstance, so {Exception: ...} catches all.

The fallback score is in the user’s original units (before any negation from optimum="minimum"). Use float('nan') or float('inf') to mark positions as invalid without inventing an artificial score.

Example:

catch = {ValueError: -1000, RuntimeError: float('nan')}

opt.search(objective, n_iter=100, catch=catch)

Examples

Basic usage with default settings:

>>> import numpy as np
>>> from gradient_free_optimizers import HillClimbingOptimizer
>>> def objective(para):
...     return -(para["x"] ** 2)
>>> search_space = {"x": np.linspace(-10, 10, 100)}
>>> opt = HillClimbingOptimizer(search_space)
>>> opt.search(objective, n_iter=30)

Using multiple stopping conditions:

>>> opt.search(
...     objective,
...     n_iter=1000,
...     max_time=60,
...     max_score=-0.01,
...     early_stopping={"n_iter_no_change": 50},
... )

property best_para[source]

Return the best parameters found as a dictionary.

Resolution order: 1. Explicitly set _best_para (used by the ask/tell mixin). 2. SearchTracker-derived value when search() has populated it. 3. Fallback computed from _pos_best.

The tracker path is preferred because _pos_best can lag behind the true best for some optimizers that only update it on accepted moves. In v2 the fallback path is slated for removal; the tracker becomes the single source of truth.

property best_score: float[source]

Best score found during the search.

Reads from the internal SearchTracker (the single source of truth) via the private self._data accessor.

property best_value[source]

Return the best values found (raw parameter values).

Returns:

list or None: List of best values in parameter order, or None if no evaluation has been performed yet.

property diagnostics[source]

Diagnostics accessor for the last search.

Returns an accessor whose methods run the diagnostics in gradient_free_optimizers.diagnostics on this optimizer’s search_data.

For saved runs or cross-run analysis, prefer the free functions in gradient_free_optimizers.diagnostics which accept any list-of-dicts, pandas/polars DataFrame, or dict-of-sequences.

Raises:

AttributeError: If search() has not been called yet.

property search_data: pd.DataFrame[source]

Lazily construct and return the search results DataFrame.

The DataFrame is only built when this property is accessed, avoiding a large memory spike at the end of high-dimensional optimizations. The result is cached so subsequent accesses don’t rebuild it.