Chapter 2
Algorithmic Enhancements and Customization
Beyond its out-of-the-box speed, LightGBM gives expert practitioners the tools to push modeling accuracy and efficiency further. This chapter examines the algorithmic enhancements and tunable mechanisms that set LightGBM apart from conventional boosting frameworks: how to configure the algorithm for extreme datasets, adapt models with custom objectives, and refine the mechanics at the core of the gradient boosting engine.
2.1 Exclusive Feature Bundling (EFB)
High-dimensional sparse datasets pose significant challenges to machine learning systems, particularly in gradient boosting frameworks, where the computational cost and memory overhead often scale poorly with the number of features. Exclusive Feature Bundling (EFB) addresses this issue by identifying sets of mutually exclusive features (features that do not take nonzero values simultaneously) and combining them into single compact feature bundles. This approach effectively transforms the original feature space into a lower-dimensional representation without loss of information, achieving significant reductions in memory consumption and training time.
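To make the idea concrete, the following sketch merges two mutually exclusive sparse features into a single bundled feature. The offset-encoding step mirrors the spirit of EFB's histogram-offset trick, but the specific offset choice here is illustrative, not LightGBM's internal implementation.

```python
import numpy as np

# Two sparse features that are never nonzero in the same row
# (mutually exclusive), so they can share one bundle.
f1 = np.array([3.0, 0.0, 0.0, 5.0, 0.0])
f2 = np.array([0.0, 2.0, 0.0, 0.0, 7.0])

# Exclusivity check: no row has both features active.
assert not np.any((f1 != 0) & (f2 != 0))

# Merge into one feature, shifting f2's values by an offset so the
# two value ranges remain distinguishable after bundling.
# (Offset choice is a simplification for this sketch.)
offset = f1.max()
bundle = np.where(f2 != 0, f2 + offset, f1)
print(bundle)  # [ 3.  7.  0.  5. 12.]
```

Because the features never collide, the original values of both f1 and f2 can be recovered exactly from the bundle, which is why strict exclusive bundling is lossless.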
At the core of EFB is a combinatorial optimization problem: given a collection of sparse features, partition them into bundles such that no two features in the same bundle have overlapping nonzero entries across the dataset instances. Formally, let F = {f1, f2, ..., fn} be the set of features, and consider the incidence matrix X ∈ R^(m×n), where m is the number of samples. Each entry Xij is nonzero if feature fj is active in sample i, and zero otherwise. The goal is to find a partition