Differential Evolution

Differential Evolution (DE) generates new candidate solutions by adding the weighted difference between two randomly selected population members to a third member. This produces a mutant vector whose step size and direction are derived from the current population distribution. The mutant is then mixed with the original target vector through a crossover operation, and the result replaces the target only if it achieves a better score. Because the difference vectors are computed from the population itself, the step sizes adapt automatically: they are large when the population is spread out and shrink as it converges.

DE on Sphere function

Convex function: Population quickly converges.

DE on Ackley function

Multi-modal function: Difference vectors help escape local optima.

The self-adaptive step size is the key property that distinguishes DE from the other population-based optimizers in this library. PSO requires explicit velocity weights, ES requires a tuned mutation rate, and GA relies on a fixed crossover operator. DE derives both step direction and magnitude from the population, which makes it less sensitive to parameter settings. The default configuration (F=0.8, CR=0.9) works across a broad range of continuous problems without adjustment. DE is the preferred choice for real-valued optimization in this library when the problem has moderate to high dimensionality. It requires a minimum population size of 4 to form the necessary difference vectors.

Algorithm

For each target vector in the population:

  1. Mutation: Create mutant vector from three random population members

    mutant = x_r1 + F * (x_r2 - x_r3)
    

    where F is the mutation factor (mutation_rate)

  2. Crossover: Mix mutant with target vector

    trial[i] = mutant[i] if random() < CR else target[i]
    

    where CR is the crossover rate

  3. Selection: Keep trial if better than target

The Key Insight

The difference vector (x_r2 - x_r3) automatically adapts:

  • Early in search: population is spread out, large differences

  • Late in search: population converges, small differences

This provides self-adaptive step sizes without explicit parameters.

Parameters

Parameter

Type

Default

Description

population

int

10

Population size

mutation_rate

float

0.9

Mutation factor F (typically 0.5-1.0)

crossover_rate

float

0.9

Crossover probability CR (typically 0.5-1.0)

Tuning Tips

Mutation rate (F):

  • F = 0.5: Conservative mutations

  • F = 1.0: More aggressive exploration

  • F > 1.0: Very aggressive (may overshoot)

Crossover rate (CR):

  • Low CR (0.1-0.3): More of target preserved, slower mixing

  • High CR (0.7-1.0): More of mutant used, faster adaptation

Example

import numpy as np
from gradient_free_optimizers import DifferentialEvolutionOptimizer

def rosenbrock(para):
    x, y = para["x"], para["y"]
    return -((1 - x)**2 + 100 * (y - x**2)**2)

search_space = {
    "x": np.linspace(-5, 5, 100),
    "y": np.linspace(-5, 5, 100),
}

opt = DifferentialEvolutionOptimizer(
    search_space,
    population=20,
    mutation_rate=0.8,
    crossover_rate=0.9,
)

opt.search(rosenbrock, n_iter=300)
print(f"Best: {opt.best_para}, Score: {opt.best_score}")

When to Use

Good for:

  • Continuous, non-linear optimization

  • Functions with complex interactions

  • When you want self-adaptive step sizes

  • Real-parameter optimization

Not ideal for:

  • Pure discrete/categorical problems

  • Very small populations (needs at least 4 for basic DE)

Comparison with Other Methods

Algorithm

Step Size

Mechanism

DE

Self-adaptive (difference vectors)

Mutation + crossover

ES

Mutation variance

Mutation + selection

PSO

Velocity

Attraction to bests

Higher-Dimensional Example

import numpy as np
from gradient_free_optimizers import DifferentialEvolutionOptimizer

def styblinski_tang_4d(para):
    vals = [para[f"x{i}"] for i in range(4)]
    return -sum(v**4 - 16 * v**2 + 5 * v for v in vals) / 2

search_space = {
    f"x{i}": np.linspace(-5, 5, 200)
    for i in range(4)
}

opt = DifferentialEvolutionOptimizer(
    search_space,
    population=25,
    mutation_rate=0.7,
    crossover_rate=0.9,
)

opt.search(styblinski_tang_4d, n_iter=500)
print(f"Best: {opt.best_para}")
print(f"Score: {opt.best_score}")

Trade-offs

  • Exploration vs. exploitation: Self-adaptive through difference vectors. mutation_rate (F) and crossover_rate (CR) provide additional control.

  • Computational overhead: Low per individual. Population must be at least 4 to form the required difference vectors.

  • Parameter sensitivity: The standard settings (F=0.8, CR=0.9) work surprisingly well across a wide range of problems. Population size matters more than the rates.