HillClimbingOptimizer

class HillClimbingOptimizer(search_space: dict[str, list], initialize: dict[Literal['grid', 'vertices', 'random', 'warm_start'], int | list[dict]] = None, constraints: list[callable] = None, random_state: int = None, rand_rest_p: float = 0, nth_process: int = None, epsilon: float = 0.03, distribution: Literal['normal', 'laplace', 'gumbel', 'logistic'] = 'normal', n_neighbours: int = 3)[source]

Local search optimizer that iteratively moves towards better neighboring solutions.

Hill Climbing is a simple yet effective local search algorithm that starts from an initial position and iteratively moves to neighboring positions that improve the objective function. At each step, the algorithm samples a set of neighboring positions and moves to the best one found. This greedy approach makes it fast and memory-efficient, but susceptible to getting stuck in local optima.

The algorithm is well-suited for:

  • Unimodal optimization problems with a single global optimum

  • Fine-tuning solutions found by other optimizers

  • Problems where function evaluations are expensive (due to low overhead)

  • Situations requiring a simple, interpretable optimization process

The epsilon parameter controls the step size: smaller values lead to finer local search but slower convergence, while larger values enable broader exploration but may overshoot optima. The n_neighbours parameter determines how many candidate positions are evaluated before making a move.

Parameters:
search_spacedict[str, list]

The search space to explore, defined as a dictionary mapping parameter names to arrays of possible values.

Each key is a parameter name (string), and each value is a numpy array or list of discrete values that the parameter can take. The optimizer will only evaluate positions that are on this discrete grid.

Example: A 2D search space with 100 points per dimension:

search_space = {
    "x": np.linspace(-10, 10, 100),
    "y": np.linspace(-10, 10, 100),
}

The resolution of each dimension (number of points in the array) directly affects optimization quality and speed. More points give finer resolution but increase the search space size exponentially.

initializedict[str, int], default={“vertices”: 4, “random”: 2}

Strategy for generating initial positions before the main optimization loop begins. Initialization samples are evaluated first, and the best one becomes the starting point for the optimizer.

Supported keys:

  • "grid": int – Number of positions on a regular grid.

  • "vertices": int – Number of corner/edge positions of the search space.

  • "random": int – Number of uniformly random positions.

  • "warm_start": list[dict] – Specific positions to evaluate, each as a dict mapping parameter names to values.

Multiple strategies can be combined:

initialize = {"vertices": 4, "random": 10}
initialize = {"warm_start": [{"x": 0.5, "y": 1.0}], "random": 5}

More initialization samples improve the starting point but consume iterations from n_iter. For expensive objectives, a few targeted warm-start points are often more efficient than many random samples.

constraintslist[callable], default=[]

A list of constraint functions that restrict the search space. Each constraint is a callable that receives a parameter dictionary and returns True if the position is valid, False if it should be rejected.

Rejected positions are discarded and regenerated: the optimizer resamples a new candidate position (up to 100 retries per step). During initialization, positions that violate constraints are filtered out entirely.

Example: Constrain the search to a circular region:

def circular_constraint(para):
    return para["x"]**2 + para["y"]**2 <= 25

constraints = [circular_constraint]

Multiple constraints are combined with AND logic (all must return True).

random_stateint or None, default=None

Seed for the random number generator to ensure reproducible results.

  • None: Use a new random state each run (non-deterministic).

  • int: Seed the random number generator for reproducibility.

Setting a fixed seed is recommended for debugging and benchmarking. Different seeds may lead to different optimization trajectories, especially for stochastic optimizers.

rand_rest_pfloat, default=0

Probability of performing a random restart instead of the normal algorithm step. At each iteration, a uniform random number is drawn; if it falls below rand_rest_p, the optimizer jumps to a random position instead of following its strategy.

  • 0.0: No random restarts (pure algorithm behavior).

  • 0.01-0.05: Light diversification, helps escape shallow local optima.

  • 0.1-0.3: Aggressive restarts, useful for highly multi-modal landscapes.

  • 1.0: Equivalent to random search.

This is especially useful for local search optimizers (Hill Climbing, Simulated Annealing) that can get trapped. For population-based optimizers, the effect is less pronounced since they already maintain diversity through multiple agents.

epsilonfloat, default=0.03

Step size for generating neighbor positions, expressed as a fraction of each dimension’s range. Controls how far the optimizer looks from the current position when sampling neighbors.

  • 0.01-0.02: Fine-grained local search, slow convergence.

  • 0.03-0.05: Moderate step size, good default range.

  • 0.1-0.3: Large steps, broader exploration.

  • 0.5-1.0: Very large steps, nearly global jumps.

Example: For a dimension with np.linspace(0, 100, 1000) (range = 100):

  • epsilon=0.03 leads to neighbors within ~3 units

  • epsilon=0.1 leads to neighbors within ~10 units

Smaller values are better for fine-tuning near a known good solution. Larger values help escape local optima but may overshoot narrow peaks.

distribution{“normal”, “laplace”, “gumbel”, “logistic”}, default=”normal”

Probability distribution used to sample neighbor offsets. Each distribution produces different exploration patterns:

  • "normal": Gaussian distribution. Most neighbors are close to the current position with rare far jumps. Best general-purpose choice.

  • "laplace": Sharper peak than normal with heavier tails. Good for landscapes where occasional large jumps help.

  • "gumbel": Asymmetric, skewed distribution. Can bias exploration in one direction.

  • "logistic": Similar to normal but with slightly heavier tails. A middle ground between normal and Laplace.

The distribution interacts with epsilon: heavy-tailed distributions (Laplace) effectively increase the chance of large steps beyond what epsilon alone suggests.

n_neighboursint, default=3

Number of neighbor positions to sample and evaluate per iteration. The optimizer moves to the best among these neighbors (if it improves on the current position).

  • 1: Minimal sampling, fast iterations but may miss good directions. Good for very cheap objective functions.

  • 3-5: Moderate sampling, good balance of quality and speed.

  • 10-20: Thorough neighborhood evaluation, better directional choices but more function evaluations per step.

Higher values act like a local beam search, giving the optimizer more information about the local landscape at each iteration.

Attributes:
best_para

Return the best parameters found as a dictionary.

best_value

Return the best values found (raw parameter values).

search_data

Lazily construct and return the search results DataFrame.

Methods

eval_time

init_stats

iter_time

search

See also

StochasticHillClimbingOptimizer

Probabilistic acceptance to escape local optima.

RepulsingHillClimbingOptimizer

Increases step size when stuck.

SimulatedAnnealingOptimizer

Temperature-based acceptance with cooling.

RandomRestartHillClimbingOptimizer

Periodically restarts from random positions.

Notes

The hill climbing update rule at each step:

\[\begin{split}x_{\\text{neighbor}} = x_{\\text{current}} + \\epsilon \\cdot \\text{range}(d) \\cdot \\text{sample}(\\text{distribution})\end{split}\]

The move is accepted only if it improves the objective:

\[\begin{split}f(x_{\\text{neighbor}}) > f(x_{\\text{current}})\end{split}\]

Time complexity per iteration is O(n_neighbours * d), where d is the number of dimensions.

For visual explanations, algorithm flowcharts, and tuning guides, see the Hill Climbing user guide.

Examples

>>> import numpy as np
>>> from gradient_free_optimizers import HillClimbingOptimizer
>>> def parabola(para):
...     return -(para["x"] ** 2 + para["y"] ** 2)
>>> search_space = {
...     "x": np.linspace(-10, 10, 100),
...     "y": np.linspace(-10, 10, 100),
... }
>>> opt = HillClimbingOptimizer(search_space)
>>> opt.search(parabola, n_iter=100)
property best_para[source]

Return the best parameters found as a dictionary.

Uses the Converter to transform the best position into user-friendly parameter names and values.

Returns:
dict or None

Dictionary mapping parameter names to their best values, or None if no evaluation has been performed yet.

property best_value[source]

Return the best values found (raw parameter values).

Returns:
list or None

List of best values in parameter order, or None if no evaluation has been performed yet.

property search_data: pd.DataFrame[source]

Lazily construct and return the search results DataFrame.

The DataFrame is only built when this property is accessed, avoiding a large memory spike at the end of high-dimensional optimizations. The result is cached so subsequent accesses don’t rebuild it.