Differential Evolution

Differential Evolution (DE) generates new candidate solutions by adding the weighted difference between two randomly selected population members to a third member. This produces a mutant vector whose step size and direction are derived from the current population distribution. The mutant is then mixed with the original target vector through a crossover operation, and the result replaces the target only if it achieves a better score. Because the difference vectors are computed from the population itself, the step sizes adapt automatically: they are large when the population is spread out and shrink as it converges.

DE on Sphere function — **Convex function**: Population quickly converges.

DE on Ackley function — **Multi-modal function**: Difference vectors help escape local optima.

The self-adaptive step size is the key property that distinguishes DE from the other population-based optimizers in this library. PSO requires explicit velocity weights, ES requires a tuned mutation rate, and GA relies on a fixed crossover operator. DE derives both step direction and magnitude from the population, which makes it less sensitive to parameter settings. The default configuration (F=0.8, CR=0.9) works across a broad range of continuous problems without adjustment. DE is the preferred choice for real-valued optimization in this library when the problem has moderate to high dimensionality. It requires a minimum population size of 4 to form the necessary difference vectors.

Algorithm

For each target vector in the population:

Mutation: Create mutant vector from three random population members
```
mutant = x_r1 + F * (x_r2 - x_r3)
```
where F is the mutation factor (mutation_rate)
Crossover: Mix mutant with target vector
```
trial[i] = mutant[i] if random() < CR else target[i]
```
where CR is the crossover rate
Selection: Keep trial if better than target

The Key Insight

The difference vector (x_r2 - x_r3) automatically adapts:

Early in search: population is spread out, large differences
Late in search: population converges, small differences

This provides self-adaptive step sizes without explicit parameters.

Parameters

Parameter	Type	Default	Description
`population`	int	10	Population size
`mutation_rate`	float	0.9	Mutation factor F (typically 0.5-1.0)
`crossover_rate`	float	0.9	Crossover probability CR (typically 0.5-1.0)

Tuning Tips

Mutation rate (F):

F = 0.5: Conservative mutations
F = 1.0: More aggressive exploration
F > 1.0: Very aggressive (may overshoot)

Crossover rate (CR):

Low CR (0.1-0.3): More of target preserved, slower mixing
High CR (0.7-1.0): More of mutant used, faster adaptation

Example

import numpy as np
from gradient_free_optimizers import DifferentialEvolutionOptimizer

def rosenbrock(para):
    x, y = para["x"], para["y"]
    return -((1 - x)**2 + 100 * (y - x**2)**2)

search_space = {
    "x": np.linspace(-5, 5, 100),
    "y": np.linspace(-5, 5, 100),
}

opt = DifferentialEvolutionOptimizer(
    search_space,
    population=20,
    mutation_rate=0.8,
    crossover_rate=0.9,
)

opt.search(rosenbrock, n_iter=300)
print(f"Best: {opt.best_para}, Score: {opt.best_score}")

When to Use

Good for:

Continuous, non-linear optimization
Functions with complex interactions
When you want self-adaptive step sizes
Real-parameter optimization

Not ideal for:

Pure discrete/categorical problems
Very small populations (needs at least 4 for basic DE)

Comparison with Other Methods

Algorithm	Step Size	Mechanism
DE	Self-adaptive (difference vectors)	Mutation + crossover
ES	Mutation variance	Mutation + selection
PSO	Velocity	Attraction to bests

Higher-Dimensional Example

import numpy as np
from gradient_free_optimizers import DifferentialEvolutionOptimizer

def styblinski_tang_4d(para):
    vals = [para[f"x{i}"] for i in range(4)]
    return -sum(v**4 - 16 * v**2 + 5 * v for v in vals) / 2

search_space = {
    f"x{i}": np.linspace(-5, 5, 200)
    for i in range(4)
}

opt = DifferentialEvolutionOptimizer(
    search_space,
    population=25,
    mutation_rate=0.7,
    crossover_rate=0.9,
)

opt.search(styblinski_tang_4d, n_iter=500)
print(f"Best: {opt.best_para}")
print(f"Score: {opt.best_score}")

Trade-offs

Exploration vs. exploitation: Self-adaptive through difference vectors. mutation_rate (F) and crossover_rate (CR) provide additional control.
Computational overhead: Low per individual. Population must be at least 4 to form the required difference vectors.
Parameter sensitivity: The standard settings (F=0.8, CR=0.9) work surprisingly well across a wide range of problems. Population size matters more than the rates.