The Pareto Frontier
The Pareto frontier shows the complete set of optimal trade-offs between predictive performance and model complexity.
What Is Pareto Optimality?
A model configuration is Pareto optimal when you cannot improve one objective (accuracy or simplicity) without degrading the other.
In Avenue's Pareto frontier chart:
- X-axis: Model complexity (estimated number of factor tables after consolidation)
- Y-axis: Predictive performance (cross-validation score—higher is better)
Each point represents a complete GBM configuration with specific hyperparameters. Points on the frontier are all optimal—the "best" choice depends on your requirements.
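The definition above can be made concrete with a small sketch that filters a set of candidate configurations down to the non-dominated ones. It is an illustration only, not Avenue's implementation: the `Candidate` type, the `pareto_frontier` function, and the sample numbers are all hypothetical, and it assumes lower table counts are simpler and higher scores are better, as on the chart.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for one GBM configuration on the chart."""
    tables: float  # x-axis: model complexity (fewer is simpler)
    score: float   # y-axis: cross-validation score (higher is better)


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, ordered by increasing complexity."""
    frontier: List[Candidate] = []
    best_score = float("-inf")
    # Walk from simplest to most complex; among equal complexity, best score first.
    for c in sorted(candidates, key=lambda c: (c.tables, -c.score)):
        # A candidate earns its place only if it beats every simpler candidate.
        if c.score > best_score:
            frontier.append(c)
            best_score = c.score
    return frontier


# Illustrative values only.
candidates = [
    Candidate(3, 0.70),
    Candidate(5, 0.78),
    Candidate(5, 0.74),   # dominated: same complexity as (5, 0.78), lower score
    Candidate(8, 0.77),   # dominated: more tables than (5, 0.78), lower score
    Candidate(12, 0.81),
]
frontier = pareto_frontier(candidates)
```

Every point the function discards is matched or beaten on both axes by some point it keeps, which is why only frontier points need to be considered when choosing a model.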
Interpreting Frontier Patterns
Elbow: A sharp bend beyond which small accuracy gains require large complexity increases. The point at the bend is often a practical choice.
Plateau: Performance flattens beyond a certain complexity, so additional tables bring diminishing returns.
Cliff: Accuracy drops sharply once complexity falls below a certain level. That level marks the minimum complexity your problem requires.
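One common heuristic for locating an elbow is to rescale both axes to the range 0 to 1 and pick the frontier point that sits highest above the straight line joining the simplest and most complex points. The sketch below assumes this heuristic; it is not described in the Avenue documentation, and the `find_elbow` function and the sample frontier are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (number of tables, cross-validation score)


def find_elbow(frontier: List[Point]) -> Optional[Point]:
    """Return the interior frontier point farthest above the end-to-end chord.

    Expects points sorted by increasing complexity. Returns None when there is
    no interior point or when no interior point lies above the chord.
    """
    if len(frontier) < 3:
        return None
    x0, y0 = frontier[0]
    x1, y1 = frontier[-1]
    x_span, y_span = x1 - x0, y1 - y0
    if x_span == 0 or y_span == 0:
        return None

    best_point: Optional[Point] = None
    best_gap = 0.0
    for x, y in frontier[1:-1]:
        # After rescaling, the chord is the diagonal y = x, so the height
        # above it is simply the difference of the normalized coordinates.
        gap = (y - y0) / y_span - (x - x0) / x_span
        if gap > best_gap:
            best_point, best_gap = (x, y), gap
    return best_point


# Illustrative frontier: steep gains up to 6 tables, then a plateau.
example_frontier = [(2, 0.60), (4, 0.74), (6, 0.79), (10, 0.80), (16, 0.81)]
elbow = find_elbow(example_frontier)
```

In this example the heuristic selects the point with 6 tables: the points after it add many tables for roughly 0.01 of score each. Treat the result as a starting point for review rather than an automatic choice, since the right trade-off still depends on your requirements.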
Selecting a Model
Left side (fewer tables):
- Simpler models, easier regulatory review
- Better for understanding key drivers
Right side (more tables):
- Maximum accuracy, more interactions captured
- Still fully transparent and explainable
Middle:
- Balanced accuracy and simplicity
The model selection guide provides specific recommendations by use case.
Complexity and Performance Metrics
Complexity: Median number of factor tables across cross-validation folds (after ANOVA-style consolidation)
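As a small illustration of the complexity metric, the sketch below takes the median of per-fold table counts. The fold counts are made up, and the comparison with the mean only shows a general property of the median (it is less sensitive to a single unusual fold); the documentation does not state why the median is used.

```python
from statistics import mean, median

# Hypothetical table counts from five cross-validation folds,
# each measured after consolidation.
tables_per_fold = [6, 7, 7, 8, 15]

complexity = median(tables_per_fold)   # value plotted on the x-axis
average = mean(tables_per_fold)        # shown only for comparison

print(f"median tables: {complexity}, mean tables: {average}")
```

Here one fold produced 15 tables, which pulls the mean up to 8.6 while the median stays at 7.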
Performance (depends on objective):
- Poisson/Gamma/Tweedie: Negative log-likelihood
- Binary: Log-loss or AUC
- Regression/Huber: RMSE or MAE