CMAESOptimizer
- class CMAESOptimizer(search_space: dict[str, list], initialize: dict[Literal['grid', 'vertices', 'random', 'warm_start'], int | list[dict]] = None, constraints: list[callable] = None, random_state: int = None, rand_rest_p: float = 0, nth_process: int = None, boundary: str = 'clip', population: int = None, mu: int = None, sigma: float = 0.3, ipop_restart: bool = False)
Evolutionary optimizer using covariance matrix adaptation.
CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is a state-of-the-art evolutionary algorithm for difficult continuous optimization problems. It adapts a full covariance matrix to learn the correlation structure of the fitness landscape, enabling efficient search even when parameters are strongly correlated or have different sensitivities.
The algorithm maintains a multivariate normal distribution and iteratively:
Samples population candidate solutions from the distribution
Evaluates and ranks them by fitness
Updates the distribution mean toward the best solutions
Adapts the covariance matrix using evolution paths
Controls the global step size via cumulative step-size adaptation
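The first three steps of the loop above can be illustrated with a deliberately simplified sketch. It is not this library's implementation: the covariance matrix stays at the identity and sigma is held fixed, so the two adaptation steps are omitted. The names `simplified_es` and `sphere` are hypothetical.

```python
import math
import random


def sphere(x):
    """Score to maximize: negated squared distance from the origin."""
    return -sum(v * v for v in x)


def simplified_es(objective, mean, sigma=0.3, population=8, generations=60, seed=0):
    """Sample, rank, and recombine; no covariance or step-size adaptation."""
    rng = random.Random(seed)
    mu = population // 2
    # Log-rank recombination weights, normalised to sum to 1.
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    weights = [w / sum(raw) for w in raw]
    for _ in range(generations):
        # 1. Sample `population` candidates around the current mean.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        # 2. Evaluate and rank them, best first (scores are maximized).
        candidates.sort(key=objective, reverse=True)
        # 3. Move the mean to the weighted average of the mu best candidates.
        mean = [
            sum(w * c[d] for w, c in zip(weights, candidates))
            for d in range(len(mean))
        ]
    return mean


start = [3.0, -2.0]
result = simplified_es(sphere, start)
```

Starting from `[3.0, -2.0]`, the returned mean should score better than the start; a full CMA-ES would additionally shrink sigma and reshape the distribution as it converges.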
CMA-ES is considered the gold standard for continuous black-box optimization. For mixed search spaces (discrete, categorical), this implementation samples in continuous space and rounds to the nearest valid value, which is a pragmatic compromise.
The algorithm is well-suited for:
Continuous optimization with correlated parameters
Problems where parameter sensitivities differ strongly
Moderate dimensionality (up to ~100 dimensions)
Multi-modal landscapes (especially with IPOP restart)
- Parameters:
- search_space : dict[str, list]
The search space to explore, defined as a dictionary mapping parameter names to arrays of possible values.
Each key is a parameter name (string), and each value is a numpy array or list of discrete values that the parameter can take. The optimizer will only evaluate positions that are on this discrete grid.
Example: A 2D search space with 100 points per dimension:
search_space = {
    "x": np.linspace(-10, 10, 100),
    "y": np.linspace(-10, 10, 100),
}
The resolution of each dimension (number of points in the array) directly affects optimization quality and speed. More points give finer resolution, but the total number of grid positions is the product of the per-dimension sizes, so it grows exponentially with the number of dimensions.
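As a quick check of how large a grid is, the number of distinct positions is the product of the per-dimension lengths. The sketch below uses plain lists in place of the `np.linspace` arrays; `grid_size` is a hypothetical helper, not part of the library.

```python
import math

# Plain lists standing in for np.linspace(-10, 10, 100).
search_space = {
    "x": [-10 + 20 * i / 99 for i in range(100)],
    "y": [-10 + 20 * i / 99 for i in range(100)],
}


def grid_size(space):
    """Total number of distinct positions on the discrete grid."""
    return math.prod(len(values) for values in space.values())


size_2d = grid_size(search_space)
# Adding a third dimension of the same resolution multiplies the size by 100.
size_3d = grid_size({**search_space, "z": list(range(100))})
```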
- initialize : dict[str, int | list[dict]], default={"vertices": 4, "random": 2}
Strategy for generating initial positions before the main optimization loop begins. Initialization samples are evaluated first, and the best one becomes the starting point (mean) for the CMA-ES distribution.
Supported keys:
"grid" (int): Number of positions on a regular grid.
"vertices" (int): Number of corner/edge positions of the search space.
"random" (int): Number of uniformly random positions.
"warm_start" (list[dict]): Specific positions to evaluate, each as a dict mapping parameter names to values.
Multiple strategies can be combined:
initialize = {"vertices": 4, "random": 10}
initialize = {"warm_start": [{"x": 0.5, "y": 1.0}], "random": 5}
More initialization samples improve the starting point but consume iterations from n_iter. For expensive objectives, a few targeted warm-start points are often more efficient than many random samples.
- constraints : list[callable], default=[]
A list of constraint functions that restrict the search space. Each constraint is a callable that receives a parameter dictionary and returns True if the position is valid, False if it should be rejected.
Rejected positions are discarded and regenerated: the optimizer resamples a new candidate position (up to 100 retries per step). During initialization, positions that violate constraints are filtered out entirely.
Example: Constrain the search to a circular region:
def circular_constraint(para):
    return para["x"]**2 + para["y"]**2 <= 25

constraints = [circular_constraint]
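The reject-and-resample behaviour described above might look roughly like the following sketch. `is_valid` and `sample_valid` are hypothetical helpers, the uniform sampling over [-10, 10] is an assumption for illustration, and what the library does once the retries are exhausted is not stated in this documentation.

```python
import random


def circular_constraint(para):
    return para["x"] ** 2 + para["y"] ** 2 <= 25


def is_valid(para, constraints):
    """A position is valid only if every constraint returns True."""
    return all(constraint(para) for constraint in constraints)


def sample_valid(constraints, rng, max_retries=100):
    """Draw uniform candidates until one satisfies all constraints."""
    for _ in range(max_retries):
        para = {"x": rng.uniform(-10, 10), "y": rng.uniform(-10, 10)}
        if is_valid(para, constraints):
            return para
    return None  # assumption: give up after max_retries rejected draws


candidate = sample_valid([circular_constraint], random.Random(0))
```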
Multiple constraints are combined with AND logic (all must return True).
- random_state : int or None, default=None
Seed for the random number generator to ensure reproducible results.
None: Use a new random state each run (non-deterministic).
int: Seed the random number generator for reproducibility.
Setting a fixed seed is recommended for debugging and benchmarking. Different seeds may lead to different optimization trajectories, especially for stochastic optimizers.
- rand_rest_p : float, default=0
Probability of performing a random restart instead of the normal algorithm step. At each iteration, a uniform random number is drawn; if it falls below rand_rest_p, the optimizer jumps to a random position instead of following its strategy.
0.0: No random restarts (pure algorithm behavior).
0.01-0.05: Light diversification, helps escape shallow local optima.
0.1-0.3: Aggressive restarts, useful for highly multi-modal landscapes.
1.0: Equivalent to random search.
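The per-iteration decision can be sketched as a single comparison against a uniform draw. `choose_step` is a hypothetical name used only for illustration.

```python
import random


def choose_step(rand_rest_p, rng):
    """Decide which kind of move one iteration makes."""
    # rng.random() is uniform on [0, 1), so p=0 never restarts and p=1 always does.
    return "random_restart" if rng.random() < rand_rest_p else "strategy_step"


rng = random.Random(0)
never = [choose_step(0.0, rng) for _ in range(100)]
always = [choose_step(1.0, rng) for _ in range(100)]
mixed = [choose_step(0.1, rng) for _ in range(1000)]
```

With `rand_rest_p=0.1`, roughly one move in ten would be a random jump.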
- boundary : {"clip", "reflect", "periodic", "random", "intermediate"}, default="clip"
Strategy for handling positions that exceed the search space bounds. When the optimizer proposes a candidate outside the valid range, this parameter controls how that candidate is mapped back.
"clip": Clamp each coordinate to the nearest bound.
"reflect": Mirror the position back into the search space at the boundary it crossed.
"periodic": Wrap around to the opposite end of the range, treating the search space as periodic.
"random": Replace the out-of-bounds position with a uniformly random position within the valid range.
"intermediate": Move to the midpoint between the current position and the violated bound.
Applies to continuous and discrete numerical dimensions. Categorical dimensions always snap to the nearest valid category index.
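For a single numerical coordinate, the five strategies could be written as below. This is one reading of the descriptions above, not the library's code: `apply_boundary` is a hypothetical function, `previous` stands for the in-bounds position the candidate was generated from, and clamping an over-reflected value is an added assumption.

```python
import random


def apply_boundary(x, low, high, mode, previous=None, rng=random):
    """Map an out-of-range coordinate x back into [low, high]."""
    if low <= x <= high:
        return x
    if mode == "clip":
        return min(max(x, low), high)
    if mode == "reflect":
        mirrored = 2 * low - x if x < low else 2 * high - x
        # A very large overshoot can land outside again, so clamp as a fallback.
        return min(max(mirrored, low), high)
    if mode == "periodic":
        return low + (x - low) % (high - low)
    if mode == "random":
        return rng.uniform(low, high)
    if mode == "intermediate":
        bound = low if x < low else high
        return (previous + bound) / 2
    raise ValueError(f"unknown boundary mode: {mode!r}")
```

For a proposal of 12 on the range [0, 10], this sketch gives 10 for "clip", 8 for "reflect", 2 for "periodic", and 8.0 for "intermediate" when the previous position was 6.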
- population : int or None, default=None
Number of candidate solutions sampled per generation (lambda in CMA-ES notation). If None, uses the standard heuristic: 4 + floor(3 * ln(n_dimensions)).
None: Auto-compute based on dimensionality (recommended).
10-20: Small populations for fast convergence on simple problems.
50-100: Large populations for better exploration on multimodal or high-dimensional problems.
Each generation requires population function evaluations, so total cost per generation scales linearly with this parameter.
- mu : int or None, default=None
Number of best solutions selected as parents for the next generation. If None, uses population // 2.
None: Auto-compute as half the population (recommended).
Smaller mu: Stronger selection pressure, faster convergence but higher risk of premature convergence.
Larger mu: Weaker selection pressure, better exploration.
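The two defaults documented for population and mu can be computed directly from the formulas given. The function names are hypothetical.

```python
import math


def default_population(n_dimensions):
    """Standard heuristic: 4 + floor(3 * ln(n_dimensions))."""
    return 4 + math.floor(3 * math.log(n_dimensions))


def default_mu(population):
    """Half the population, rounded down."""
    return population // 2


pop_2d = default_population(2)    # 4 + floor(2.08) = 6
pop_10d = default_population(10)  # 4 + floor(6.91) = 10
```

The population grows only logarithmically with dimensionality, so a 10-dimensional problem needs 10 evaluations per generation under this default.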
Must be less than or equal to population.
- sigma : float, default=0.3
Initial step size as a fraction of the normalized search space range. Controls the initial spread of sampled solutions around the mean.
0.1: Conservative, tight initial sampling.
0.3: Standard starting point (default).
0.5: Broad initial exploration.
CMA-ES adapts sigma automatically during optimization, so the initial value is not critical. Values between 0.1 and 0.5 generally work well.
- ipop_restart : bool, default=False
Enable IPOP (Increasing Population) restart strategy. When stagnation is detected (no improvement for many generations), the algorithm restarts with a doubled population size and a random starting point.
False: No restarts, single run (default).
True: Enable IPOP restarts for better global search on multimodal landscapes.
IPOP-CMA-ES is particularly effective for problems with many local optima, as it combines the precision of CMA-ES with increasingly thorough global search.
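The restart schedule can be sketched as follows. The stagnation test here (no improvement of the per-generation best over a fixed number of generations) is an assumption; the documentation does not state the exact criterion, and both function names are hypothetical.

```python
def has_stagnated(best_scores, patience):
    """True if the last `patience` generations brought no new best score."""
    if len(best_scores) <= patience:
        return False
    return max(best_scores[-patience:]) <= max(best_scores[:-patience])


def ipop_population_schedule(initial_population, n_restarts):
    """Population size for the first run and each restart (doubling each time)."""
    return [initial_population * 2 ** k for k in range(n_restarts + 1)]


schedule = ipop_population_schedule(6, 3)
```

Starting from a population of 6, three restarts would use 12, 24, and 48 candidates per generation, so each restart costs more evaluations per generation than the last.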
- Attributes:
best_para: Return the best parameters found as a dictionary.
best_score: Best score found during the search.
best_value: Return the best values found (raw parameter values).
diagnostics: Diagnostics accessor for the last search.
search_data: Lazily construct and return the search results DataFrame.
Methods
search(objective_function, n_iter[, ...]): Run the optimization loop.
eval_time
init_stats
iter_time
See also
EvolutionStrategyOptimizer: Simpler ES with self-adaptive sigma.
DifferentialEvolutionOptimizer: DE using vector differences.
ParticleSwarmOptimizer: Swarm intelligence approach.
Notes
CMA-ES adapts the search distribution using two evolution paths:
Cumulation path for sigma (p_sigma): Controls global step size via Cumulative Step-size Adaptation (CSA). If steps are correlated (consistent direction), sigma increases; if anti-correlated (oscillating), sigma decreases.
Cumulation path for C (p_c): Provides the rank-one update to the covariance matrix, capturing the dominant search direction.
The covariance matrix is updated via:
Rank-one update: Uses p_c to learn the principal search direction.
Rank-mu update: Uses all mu selected solutions to learn the local landscape shape.
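For reference, the two paths and the two covariance updates are usually written as below in the CMA-ES literature. This is the textbook form, assuming recombination weights w_i that sum to one and omitting the small correction term used when the indicator h_sigma is zero; the constants and exact variant used by this implementation may differ.

```latex
% y_w is the normalised mean shift; y_{i:\lambda} is the i-th best sampled step.
\begin{align}
  y_w &= \frac{m^{(g+1)} - m^{(g)}}{\sigma^{(g)}} = \sum_{i=1}^{\mu} w_i \, y_{i:\lambda} \\
  p_\sigma &\leftarrow (1 - c_\sigma)\, p_\sigma
      + \sqrt{c_\sigma (2 - c_\sigma)\, \mu_{\mathrm{eff}}}\; C^{-1/2} y_w \\
  \sigma &\leftarrow \sigma \exp\!\left( \frac{c_\sigma}{d_\sigma}
      \left( \frac{\lVert p_\sigma \rVert}{\mathbb{E}\lVert \mathcal{N}(0, I) \rVert} - 1 \right) \right) \\
  p_c &\leftarrow (1 - c_c)\, p_c
      + h_\sigma \sqrt{c_c (2 - c_c)\, \mu_{\mathrm{eff}}}\; y_w \\
  C &\leftarrow (1 - c_1 - c_\mu)\, C
      + c_1\, p_c p_c^{\top}
      + c_\mu \sum_{i=1}^{\mu} w_i \, y_{i:\lambda} y_{i:\lambda}^{\top}
\end{align}
```

The c_1 term is the rank-one update and the c_mu term is the rank-mu update; sigma grows when the path p_sigma is longer than expected under random selection and shrinks when it is shorter.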
For mixed search spaces (discrete/categorical dimensions), the algorithm operates in a normalized continuous space and maps back to valid values via rounding. This is a standard approach (MI-CMA-ES) that preserves the covariance adaptation while supporting non-continuous parameters.
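A minimal sketch of the map-back step, assuming each dimension is normalised to [0, 1] and rounded to the nearest index of its value list. `to_grid_value` is a hypothetical helper and the library's actual normalisation may differ.

```python
def to_grid_value(u, values):
    """Map a normalised coordinate u to the nearest entry of `values`."""
    u = min(max(u, 0.0), 1.0)  # keep the coordinate inside the unit interval
    index = round(u * (len(values) - 1))
    return values[index]


learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]
kernels = ["linear", "poly", "rbf"]

picked_rate = to_grid_value(0.4, learning_rates)  # 0.4 * 3 = 1.2 -> index 1
picked_kernel = to_grid_value(0.9, kernels)       # 0.9 * 2 = 1.8 -> index 2
```

The same function serves numerical and categorical dimensions, although for categories the ordering of the list is arbitrary, so neighbouring indices carry no meaning.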
Examples
>>> import numpy as np
>>> from gradient_free_optimizers import CMAESOptimizer
>>> def rosenbrock(para):
...     x, y = para["x"], para["y"]
...     return -(100 * (y - x**2)**2 + (1 - x)**2)
>>> search_space = {
...     "x": np.linspace(-5, 5, 1000),
...     "y": np.linspace(-5, 5, 1000),
... }
>>> opt = CMAESOptimizer(search_space, population=20, sigma=0.3)
>>> opt.search(rosenbrock, n_iter=500)
- search(objective_function: Callable[[dict[str, Any]], float], n_iter: int, max_time: float | None = None, max_score: float | None = None, early_stopping: dict[str, Any] | None = None, memory: bool | BaseStorage = True, memory_warm_start: pd.DataFrame | None = None, verbosity: list[str] | Literal[False] = ['progress_bar', 'print_results', 'print_times'], optimum: Literal['maximum', 'minimum'] = 'maximum', callbacks: list[Callable[[CallbackInfo], bool | None]] | None = None, catch: dict[type[Exception], int | float] | None = None) -> None
Run the optimization loop.
Evaluates objective_function up to n_iter times, searching for the parameters that maximize (or minimize) the returned score. The search proceeds in two phases: an initialization phase that evaluates starting positions (controlled by the initialize constructor parameter), followed by an iteration phase where the optimizer’s strategy generates new candidate positions.
After the search finishes, results are available via best_para, best_score, and search_data.
- Parameters:
- objective_function : callable
The function to optimize. Must accept a single dictionary mapping parameter names to values and return either:
A float score, or
A tuple (float, dict) where the second element contains custom metrics (accessible via callbacks and search_data).
Example:
def objective(params):
    return -(params["x"] ** 2 + params["y"] ** 2)

def objective_with_metrics(params):
    loss = params["x"] ** 2
    return -loss, {"loss": loss}
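Code that calls such objectives has to accept both return forms. One way to normalise them is sketched below; `split_result` is a hypothetical helper, not a library function.

```python
def split_result(result):
    """Return (score, metrics) for either supported return form."""
    if isinstance(result, tuple):
        score, metrics = result
        return float(score), dict(metrics)
    return float(result), {}  # plain score: no custom metrics


def objective(params):
    return -(params["x"] ** 2 + params["y"] ** 2)


def objective_with_metrics(params):
    loss = params["x"] ** 2
    return -loss, {"loss": loss}


plain = split_result(objective({"x": 1.0, "y": 2.0}))
rich = split_result(objective_with_metrics({"x": 3.0}))
```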
- n_iter : int
Total number of iterations (including initialization). Each iteration evaluates the objective function once (unless a cached result is found when memory=True).
- max_time : float or None, default=None
Maximum wall-clock time in seconds. The search stops after the current iteration if the elapsed time exceeds this limit. None means no time limit.
- max_score : float or None, default=None
Target score threshold. The search stops when the best score found so far reaches or exceeds this value. When optimum="minimum", this refers to the original (non-negated) score. None means no score limit.
- early_stopping : dict or None, default=None
Configuration for stopping the search when progress stalls. None disables early stopping. When provided, the dictionary supports the following keys:
"n_iter_no_change" (int, required): Stop if no improvement is observed for this many consecutive iterations.
"tol_abs" (float, optional): Minimum absolute improvement required over the window to count as progress.
"tol_rel" (float, optional): Minimum relative improvement (in percent) required over the window to count as progress.
Example:
early_stopping = {"n_iter_no_change": 30, "tol_abs": 0.001}
early_stopping = {"n_iter_no_change": 50}
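One plausible reading of these keys is sketched below, operating on the list of best-so-far scores. The exact window semantics in the library are not specified here, so treat `should_stop` as a hypothetical illustration rather than the implemented rule.

```python
def should_stop(convergence, n_iter_no_change, tol_abs=None, tol_rel=None):
    """Decide whether progress over the last window is too small to continue.

    `convergence` holds the best score found so far after each iteration.
    """
    if len(convergence) <= n_iter_no_change:
        return False  # not enough history to fill the window
    before = convergence[-n_iter_no_change - 1]
    improvement = convergence[-1] - before
    if improvement <= 0:
        return True
    if tol_abs is not None and improvement < tol_abs:
        return True
    if tol_rel is not None and before != 0:
        # tol_rel is expressed in percent of the earlier best score.
        if 100 * improvement / abs(before) < tol_rel:
            return True
    return False


stalled = should_stop([1, 2, 3, 3, 3, 3], n_iter_no_change=3)
progressing = should_stop([1, 2, 3, 4, 5, 6], n_iter_no_change=3)
```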
- memory : bool or BaseStorage, default=True
Controls evaluation caching. When True, uses an in-memory dictionary (equivalent to MemoryStorage()). When False, disables caching entirely. A BaseStorage instance enables custom storage backends:
from gradient_free_optimizers._storage import SQLiteStorage
opt.search(objective, n_iter=100, memory=SQLiteStorage("results.db"))
SQLiteStorage persists results to disk, enabling crash recovery and cache reuse across runs. Works with distributed evaluation (positions are checked against the cache before being dispatched to workers).
In-memory caching is especially useful for discrete search spaces where revisits are common.
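The idea behind in-memory caching can be shown with a plain dictionary keyed by the parameter values. This is a conceptual sketch only; `make_cached` is hypothetical and does not reflect how MemoryStorage is implemented.

```python
def make_cached(objective):
    """Wrap an objective so that repeated positions are evaluated only once."""
    cache = {}

    def cached(params):
        # Sort the items so the key does not depend on dict insertion order.
        key = tuple(sorted(params.items()))
        if key not in cache:
            cache[key] = objective(params)
        return cache[key]

    return cached, cache


calls = []


def expensive(params):
    calls.append(params)
    return -(params["x"] ** 2)


cached_objective, cache = make_cached(expensive)
first = cached_objective({"x": 2.0})
second = cached_objective({"x": 2.0})  # served from the cache
```

On a discrete grid the optimizer can propose the same position many times, and each revisit then costs a dictionary lookup instead of a full evaluation.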
- memory_warm_start : pd.DataFrame or None, default=None
A DataFrame from a previous search (typically obtained via search_data) to pre-populate the evaluation cache. The DataFrame must contain columns matching the search space parameter names plus a "score" column. Requires memory=True.
Example:
opt1 = HillClimbingOptimizer(search_space)
opt1.search(objective, n_iter=50)

opt2 = HillClimbingOptimizer(search_space)
opt2.search(objective, n_iter=50, memory_warm_start=opt1.search_data)
- verbosity : list[str] or False, default=["progress_bar", "print_results", "print_times"]
Controls console output during and after the search. Pass False or an empty list for silent operation.
Supported values:
"progress_bar": Show a live tqdm progress bar during the search.
"print_results": Print best score and best parameters after the search completes.
"print_times": Print timing breakdown (evaluation time, optimization overhead, throughput) after the search completes.
"print_search_stats": Print search statistics including iteration counts, acceptance rate, number of improvements, and longest plateau.
"print_statistics": Print score statistics (min, max, mean, std) after the search completes.
"debug_stop": Print detailed stopping condition debug info when the search terminates early.
- optimum : {"maximum", "minimum"}, default="maximum"
Whether to maximize or minimize the objective function. When set to "minimum", the objective function’s return value is negated internally so that the optimizer always maximizes. The reported best_score is in original (non-negated) units.
- callbacks : list[callable] or None, default=None
A list of callback functions invoked after each iteration. Each callback receives a single argument info with the following attributes:
info.iteration (int): Current iteration index (0-based).
info.score (float): Score from the current evaluation.
info.params (dict): Parameters evaluated in this iteration.
info.best_score (float): Best score found so far.
info.best_para (dict): Parameters of the best score.
info.n_iter (int): Total iterations planned.
info.phase (str): "init" or "iter".
info.elapsed_time (float): Seconds since search started.
info.metrics (dict): Custom metrics from the objective function (empty if the objective returns only a score).
info.convergence (list[float]): Best score at each iteration so far.
If any callback returns False, the search stops immediately. Any other return value (including None) is ignored and the search continues.
Example:
def log_progress(info):
    if info.iteration % 10 == 0:
        print(f"Iter {info.iteration}: best={info.best_score:.4f}")

def stop_early(info):
    if info.best_score > 0.99:
        return False  # stops the search

opt.search(objective, n_iter=100, callbacks=[log_progress, stop_early])
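The stop rule hinges on the return value being exactly False, not merely falsy. A sketch of how a search loop might interpret callback results follows; `run_callbacks` is hypothetical, and whether the remaining callbacks still run after one requests a stop is an assumption.

```python
def run_callbacks(callbacks, info):
    """Return True to continue the search, False if a callback asked to stop."""
    for callback in callbacks:
        # Identity check: None, 0, and "" do not count as a stop request.
        if callback(info) is False:
            return False
    return True


keep_going = run_callbacks([lambda info: None, lambda info: 0], info={})
stopped = run_callbacks([lambda info: None, lambda info: False], info={})
```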
- catch : dict[type, float] or None, default=None
Error handling for the objective function. Maps exception types to fallback scores. When the objective function raises a caught exception, the optimizer records the fallback score instead of crashing. Exception subclasses are matched via isinstance, so {Exception: ...} catches all.
The fallback score is in the user’s original units (before any negation from optimum="minimum"). Use float('nan') or float('inf') to mark positions as invalid without inventing an artificial score.
Example:
catch = {ValueError: -1000, RuntimeError: float('nan')}
opt.search(objective, n_iter=100, catch=catch)
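The isinstance-based matching can be sketched as a small wrapper around the objective call. `evaluate_with_catch` is a hypothetical name, and the first-match-wins order over the dictionary is an assumption.

```python
def evaluate_with_catch(objective, params, catch):
    """Evaluate the objective, substituting a fallback score for listed errors."""
    try:
        return objective(params)
    except Exception as exc:
        for exc_type, fallback in catch.items():
            if isinstance(exc, exc_type):  # subclasses match their base class
                return fallback
        raise  # not listed in `catch`: propagate as usual


def failing_objective(params):
    raise KeyError("x")


# KeyError is a subclass of LookupError, so the fallback applies.
score = evaluate_with_catch(failing_objective, {}, {LookupError: -1000.0})
```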
Examples
Basic usage with default settings:
>>> import numpy as np
>>> from gradient_free_optimizers import HillClimbingOptimizer
>>> def objective(para):
...     return -(para["x"] ** 2)
>>> search_space = {"x": np.linspace(-10, 10, 100)}
>>> opt = HillClimbingOptimizer(search_space)
>>> opt.search(objective, n_iter=30)
Using multiple stopping conditions:
>>> opt.search(
...     objective,
...     n_iter=1000,
...     max_time=60,
...     max_score=-0.01,
...     early_stopping={"n_iter_no_change": 50},
... )
- property best_para
Return the best parameters found as a dictionary.
Resolution order:
1. Explicitly set _best_para (used by the ask/tell mixin).
2. SearchTracker-derived value when search() has populated it.
3. Fallback computed from _pos_best.
The tracker path is preferred because _pos_best can lag behind the true best for some optimizers that only update it on accepted moves. In v2 the fallback path is slated for removal; the tracker becomes the single source of truth.
- property best_score: float
Best score found during the search.
Reads from the internal SearchTracker (the single source of truth) via the private self._data accessor.
- property best_value
Return the best values found (raw parameter values).
- Returns:
- list or None
List of best values in parameter order, or None if no evaluation has been performed yet.
- property diagnostics
Diagnostics accessor for the last search.
Returns an accessor whose methods run the diagnostics in gradient_free_optimizers.diagnostics on this optimizer’s search_data.
For saved runs or cross-run analysis, prefer the free functions in gradient_free_optimizers.diagnostics which accept any list-of-dicts, pandas/polars DataFrame, or dict-of-sequences.
- Raises:
- AttributeError
If search() has not been called yet.
- property search_data: pd.DataFrame
Lazily construct and return the search results DataFrame.
The DataFrame is only built when this property is accessed, avoiding a large memory spike at the end of high-dimensional optimizations. The result is cached so subsequent accesses don’t rebuild it.
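The build-on-first-access pattern can be illustrated with `functools.cached_property`. The class below is a hypothetical stand-in that returns a list of dicts instead of a DataFrame; the library may implement its caching differently.

```python
from functools import cached_property


class ResultsHolder:
    """Hypothetical stand-in for an optimizer holding raw evaluation records."""

    def __init__(self, records):
        self._records = records
        self.build_count = 0

    @cached_property
    def search_data(self):
        # Runs only on first access; the result is then stored on the instance.
        self.build_count += 1
        return [dict(record) for record in self._records]


holder = ResultsHolder([{"x": 0.5, "score": -0.25}])
first_access = holder.search_data
second_access = holder.search_data
```

Nothing is built until `search_data` is read, and the second read returns the same object without rebuilding it.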