Downhill Simplex

Downhill Simplex (Nelder-Mead) operates on a geometric structure rather than sampling from a neighborhood distribution. It maintains a simplex of n+1 vertices in n-dimensional space (a triangle in 2D, a tetrahedron in 3D) and transforms it through four operations: reflection of the worst vertex through the centroid, expansion along the reflection direction, contraction toward the centroid, and shrinking the entire simplex toward the best vertex. The simplex reshapes itself at each iteration based on the objective values at its vertices, adapting its orientation and size to the local curvature of the function.

Downhill Simplex on Sphere function

Convex function: The simplex contracts toward the minimum.

Downhill Simplex on Ackley function

Multi-modal function: May converge to local optimum depending on initial simplex placement.

Downhill Simplex is structurally different from the other local search algorithms in this library, which all generate candidates by sampling from a distribution around the current position. The geometric approach makes it effective on smooth, low-dimensional functions with curved valleys or ridges, where the simplex can align itself along the function’s contours. Its performance degrades as dimensionality increases because the simplex requires n+1 vertices and the geometric operations become less informative in high-dimensional spaces. Choose it for problems with fewer than roughly 10 continuous dimensions where the objective function is smooth and deterministic.

Algorithm

The algorithm maintains n+1 vertices for an n-dimensional problem. At each iteration:

  1. Order vertices by score (best to worst)

  2. Reflect the worst vertex through the centroid of the remaining vertices

  3. Depending on the reflection result:

    • If best so far: Expand further in that direction

    • If improvement: Accept the reflection

    • If still worst: Contract toward the centroid

    • If contraction fails: Shrink entire simplex toward best vertex

This geometric approach doesn’t require gradient information but adapts to the local curvature of the objective function.

Note

The Downhill Simplex is the only algorithm in GFO that uses geometric transformations rather than random sampling. The simplex shape adapts to the local landscape: it elongates along valleys, contracts near optima, and can flip over ridges through reflection. This makes it particularly effective for smooth, low-dimensional functions with curved valleys (like Rosenbrock’s function).

Simplex Operations

Given vertices: [best, ..., worst] and centroid C of [best, ..., second_worst]

Reflection:   R = C + alpha * (C - worst)      # Try opposite side
Expansion:    E = C + gamma * (R - C)          # Go further if R is good
Contraction:  K = C + beta * (worst - C)       # Pull back toward center
Shrink:       x_i = best + 0.5 * (x_i - best)  # Contract entire simplex

Parameters

Parameter

Type

Default

Description

alpha

float

1.0

Reflection coefficient (typically 1.0)

gamma

float

2.0

Expansion coefficient (typically 2.0)

beta

float

0.5

Contraction coefficient (typically 0.5)

Standard Coefficients

The default values (alpha=1.0, gamma=2.0, beta=0.5) are the original Nelder-Mead coefficients and work well for most problems. Adjust only if you have specific needs:

  • Larger alpha/gamma: More aggressive exploration

  • Smaller beta: More conservative contraction

  • 0.5 shrink factor: Standard (not configurable in GFO)

Example

import numpy as np
from gradient_free_optimizers import DownhillSimplexOptimizer

def rosenbrock(para):
    x, y = para["x"], para["y"]
    return -((1 - x)**2 + 100 * (y - x**2)**2)

search_space = {
    "x": np.linspace(-5, 5, 100),
    "y": np.linspace(-5, 5, 100),
}

opt = DownhillSimplexOptimizer(
    search_space,
    alpha=1.0,   # Standard reflection
    gamma=2.0,   # Standard expansion
    beta=0.5,    # Standard contraction
)

opt.search(rosenbrock, n_iter=500)
print(f"Best: {opt.best_para}, Score: {opt.best_score}")

When to Use

Good for:

  • Low-dimensional problems (< 10 dimensions)

  • Smooth objective functions

  • When you want a deterministic, geometry-based approach

  • Problems where function evaluations are cheap

Not ideal for:

  • High-dimensional spaces (simplex has n+1 vertices)

  • Noisy objective functions

  • Functions with many local optima

  • Discrete or categorical parameters

Comparison with Other Local Methods

vs. Hill Climbing

Simplex uses n+1 points vs. neighborhood sampling. Better for smooth functions, worse for discrete spaces.

vs. Powell’s Method

Simplex optimizes all dimensions jointly, Powell optimizes sequentially. Simplex is better for coupled parameters.

Initialization Note

The Downhill Simplex requires n+1 initial points. In GFO, these are generated from the initialization strategy (grid, random, vertices, or warm_start). Ensure you have at least n+1 initial positions:

n_dims = len(search_space)

opt = DownhillSimplexOptimizer(
    search_space,
    initialize={"random": n_dims + 1}  # Ensure enough initial points
)

3D Example with Warm Start

import numpy as np
from gradient_free_optimizers import DownhillSimplexOptimizer

def rosenbrock_3d(para):
    x, y, z = para["x"], para["y"], para["z"]
    return -(
        (1 - x)**2 + 100 * (y - x**2)**2
        + (1 - y)**2 + 100 * (z - y**2)**2
    )

search_space = {
    "x": np.linspace(-5, 5, 200),
    "y": np.linspace(-5, 5, 200),
    "z": np.linspace(-5, 5, 200),
}

opt = DownhillSimplexOptimizer(
    search_space,
    alpha=1.0,
    gamma=2.0,
    beta=0.5,
    initialize={"warm_start": [{"x": 0.5, "y": 0.5, "z": 0.5}],
                 "random": 3},
)

opt.search(rosenbrock_3d, n_iter=1000)
print(f"Best: {opt.best_para}")
print(f"Score: {opt.best_score}")

Trade-offs

  • Exploration vs. exploitation: Downhill Simplex is primarily exploitative. The simplex explores locally through its geometric operations but has no mechanism for large jumps to unexplored regions.

  • Computational overhead: Requires n+1 points per dimension, which means overhead grows linearly with dimensionality.

  • Parameter sensitivity: The standard coefficients (alpha=1, gamma=2, beta=0.5) work well for most problems. Rarely needs tuning.