Forest Optimizer

The Forest Optimizer uses a tree-based ensemble (Extra Trees, Random Forest, or Gradient Boosting) as its surrogate model. The ensemble is trained on all observed (position, score) pairs, and each candidate position is scored by computing Expected Improvement from the ensemble’s mean prediction and variance. The variance is derived from disagreement among individual trees: regions where tree predictions diverge indicate high uncertainty, which drives exploration of under-sampled areas.

Forest Optimizer on Ackley function — **Multi-modal function**: Handles complex landscapes.

With O(n log n) surrogate training cost, the Forest Optimizer scales to evaluation budgets that are impractical for Bayesian Optimization (which is O(n^3)). Tree-based splits handle discrete and categorical parameters natively, without encoding. Unlike Tree-structured Parzen Estimators (TPE), which models each parameter dimension independently, the Forest Optimizer captures inter-parameter interactions through its tree structure, making it a strong choice for search spaces where parameter dependencies matter.

Algorithm

Similar to Bayesian Optimization, but using tree ensembles:

Fit surrogate: Train Random Forest/Extra Trees on observations
Predict: Get predictions from all trees in the ensemble
Uncertainty: Variance across tree predictions
Acquisition: Expected Improvement using mean and variance
Select and evaluate: Choose best acquisition, run objective

Note

Tree ensembles provide a practical “free” uncertainty estimate: the variance across individual tree predictions. Each tree sees a different bootstrap sample, so regions with consistent data produce agreement (low variance), while sparse or conflicting regions produce disagreement (high variance). This is computationally cheaper than GP uncertainty and degrades more gracefully in high dimensions.

Why Trees?

Tree-based models offer several advantages:

Scalability: O(n log n) training vs. O(n^3) for GP
Discrete parameters: Handle naturally (no encoding needed)
Non-stationarity: Adapt to different regions of space
Robust: Less sensitive to hyperparameter choices

Parameters

Parameter	Type	Default	Description
`tree_regressor`	str	“extra_tree”	Model type: “extra_tree”, “random_forest”, or “gradient_boost”
`tree_para`	dict	{“n_estimators”: 100}	Parameters for the tree model
`xi`	float	0.03	Exploration-exploitation trade-off

Tree Regressor Options

extra_tree: Extra Trees (randomized splits, faster)
random_forest: Random Forest (classic, robust)
gradient_boost: Gradient Boosting (sequential, powerful)

# Extra Trees (default, fast)
opt = ForestOptimizer(search_space, tree_regressor="extra_tree")

# Random Forest (more robust)
opt = ForestOptimizer(
    search_space,
    tree_regressor="random_forest",
    tree_para={"n_estimators": 200}
)

Example

import numpy as np
from gradient_free_optimizers import ForestOptimizer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)

def objective(para):
    clf = SVC(
        C=para["C"],
        kernel=para["kernel"],
        gamma=para["gamma"],
    )
    return cross_val_score(clf, X, y, cv=3).mean()

search_space = {
    "C": np.logspace(-2, 2, 50),
    "kernel": np.array(["linear", "rbf", "poly"]),
    "gamma": np.logspace(-3, 0, 30),
}

opt = ForestOptimizer(
    search_space,
    tree_regressor="extra_tree",
    tree_para={"n_estimators": 100},
)

opt.search(objective, n_iter=50)
print(f"Best accuracy: {opt.best_score:.4f}")
print(f"Best params: {opt.best_para}")

When to Use

Good for:

Large search spaces with many parameters
Mixed discrete and continuous parameters
When you need many evaluations (100+)
Parallelizable (tree training is fast)

Compared to other SMBO:

vs. Bayesian: Better scaling, worse uncertainty
vs. TPE: Similar performance, different mechanism

Higher-Dimensional Example

import numpy as np
from gradient_free_optimizers import ForestOptimizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

def objective(para):
    clf = RandomForestClassifier(
        n_estimators=para["n_estimators"],
        max_depth=para["max_depth"],
        min_samples_split=para["min_samples_split"],
        min_samples_leaf=para["min_samples_leaf"],
        max_features=para["max_features"],
        random_state=42,
    )
    return cross_val_score(clf, X, y, cv=3).mean()

search_space = {
    "n_estimators": np.arange(50, 300, 10),
    "max_depth": np.arange(3, 20),
    "min_samples_split": np.arange(2, 20),
    "min_samples_leaf": np.arange(1, 10),
    "max_features": np.array(["sqrt", "log2"]),
}

opt = ForestOptimizer(
    search_space,
    tree_regressor="extra_tree",
    tree_para={"n_estimators": 150},
    xi=0.02,
)
opt.search(objective, n_iter=60)

print(f"Best accuracy: {opt.best_score:.4f}")
print(f"Best params: {opt.best_para}")

Trade-offs

Exploration vs. exploitation: xi controls the trade-off as with other SMBO methods. The tree ensemble’s variance estimate tends to be noisier than a GP’s, which can provide natural exploration through prediction disagreement.
Computational overhead: O(n log n) training per iteration, much better than GP’s O(n^3). The tree_para configuration (especially n_estimators) directly affects this overhead.
Parameter sensitivity: The choice between extra_tree, random_forest, and gradient_boost matters more than the tree hyperparameters. Extra Trees is the fastest; Gradient Boosting is the most powerful but sequential.