GBM (gradient boosting machine)
This is the base algorithm: the weak learner can be a decision tree or another type of predictor.
GBDT (gradient boosting decision tree): the GBM framework with decision trees as the weak learner.
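A minimal GBDT sketch using scikit-learn's GradientBoostingRegressor (any implementation of the generic GBM framework would do; values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
model = GradientBoostingRegressor(
    n_estimators=100,      # number of boosting stages (trees)
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=3,           # depth of each weak learner
)
model.fit(X, y)
```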
XGBoost tackles the inefficiency of exhaustive, pre-sorted split search by looking at the distribution of features across the data points in a leaf and using this information to reduce the search space of candidate splits.
XGBoost supports missing values by default. In tree boosters, the branch direction for missing values is learned during training. Note that the gblinear booster treats missing values as zeros.
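A minimal sketch, assuming the xgboost Python package, showing histogram-based split finding together with native NaN handling (values are illustrative):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(5000, 8)
X[np.random.rand(*X.shape) < 0.1] = np.nan   # inject ~10% missing values
y = (np.nan_to_num(X[:, 0]) > 0.5).astype(int)

model = xgb.XGBClassifier(
    tree_method="hist",   # approximate, histogram-based split finding
    n_estimators=100,
    max_depth=6,
)
model.fit(X, y)           # NaNs are routed to a learned default branch at each split
```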
DART booster: applies dropout to the ensemble of trees, a different way to prevent overfitting.
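A minimal sketch of enabling the DART booster in xgboost (rate_drop/skip_drop values are illustrative, not tuned):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(2000, 5)
y = X[:, 0] * 2 + np.random.randn(2000) * 0.1

model = xgb.XGBRegressor(
    booster="dart",
    rate_drop=0.1,    # fraction of previous trees dropped in each boosting round
    skip_drop=0.5,    # probability of skipping dropout for a round
    n_estimators=100,
)
model.fit(X, y)
```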
Pre-sorted split finding is the traditional approach; LightGBM improves on it with histogram-based split finding.
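A simplified sketch of the histogram idea (not LightGBM's actual code): bucket feature values into a fixed number of bins once, accumulate gradient/hessian sums per bin, then scan bins instead of every sorted value. The gain formula follows the usual second-order boosting objective with regularization lam.

```python
import numpy as np

def best_split_histogram(feature, grad, hess, max_bin=255, lam=1.0):
    # Bucket continuous values into at most max_bin bins.
    edges = np.quantile(feature, np.linspace(0, 1, max_bin + 1)[1:-1])
    bins = np.searchsorted(edges, feature)

    # Per-bin gradient/hessian sums: O(n) to build, O(max_bin) to scan.
    G = np.bincount(bins, weights=grad, minlength=max_bin)
    H = np.bincount(bins, weights=hess, minlength=max_bin)
    G_tot, H_tot = G.sum(), H.sum()

    best_gain, best_bin = 0.0, None
    GL = HL = 0.0
    for b in range(max_bin - 1):
        GL += G[b]; HL += H[b]
        GR, HR = G_tot - GL, H_tot - HL
        gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G_tot**2 / (H_tot + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain
```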
GOSS: Gradient-based One-Side Sampling
The basic assumption is that training instances with small gradients have small training error and are already well learned.
To keep the data distribution roughly unchanged when computing the information gain, GOSS keeps all instances with large gradients, randomly samples from the instances with small gradients, and multiplies the sampled small-gradient instances by a constant (1-a)/b, where a is the large-gradient keep ratio and b the small-gradient sampling ratio. Thus, GOSS achieves a good balance between reducing the number of data instances and keeping the accuracy of the learned decision trees.
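A minimal sketch of enabling GOSS in LightGBM (parameter names follow the LightGBM docs; `boosting: goss` is the classic spelling, newer releases prefer `data_sample_strategy="goss"`):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(20000, 10)
y = (X[:, 0] + 0.1 * np.random.randn(20000) > 0.5).astype(int)

params = {
    "objective": "binary",
    "boosting": "goss",   # Gradient-based One-Side Sampling
    "top_rate": 0.2,      # a: keep the 20% of instances with the largest gradients
    "other_rate": 0.1,    # b: randomly sample 10% of the remaining instances
    "learning_rate": 0.1,
    "num_leaves": 31,
}
train_set = lgb.Dataset(X, label=y)
booster = lgb.train(params, train_set, num_boost_round=100)
```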
LightGBM can handle categorical features natively, without one-hot encoding.
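A minimal sketch of native categorical handling (column names are made up for illustration):

```python
import pandas as pd
import lightgbm as lgb

df = pd.DataFrame({
    "city": pd.Categorical(["nyc", "sf", "la", "nyc"] * 250),  # no one-hot encoding needed
    "age": list(range(1000)),
})
y = (df["age"] % 3 == 0).astype(int)

# Declare the categorical column explicitly (columns with pandas 'category'
# dtype are also picked up automatically).
train_set = lgb.Dataset(df, label=y, categorical_feature=["city"])
booster = lgb.train({"objective": "binary", "num_leaves": 31}, train_set, num_boost_round=50)
```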
LightGBM works best on larger datasets (roughly 10,000+ rows); it tends to overfit small ones.
- learning_rate: default 0.1; use a small learning_rate with a large num_iterations for better accuracy
- max_depth: default -1 (no limit); lower max_depth to avoid overfitting
- num_leaves: default 31, smaller than 2^max_depth
- min_data_in_leaf: default 20; its optimal value depends on the number of training samples and num_leaves; use a larger value to avoid overfitting
- lambda_l1, lambda_l2: default 0, regularization on weights
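An illustrative LightGBM parameter dict reflecting the tuning notes above (values are starting points, not recommendations):

```python
params = {
    "learning_rate": 0.05,      # smaller rate, compensated by more iterations
    "num_iterations": 1000,
    "max_depth": 7,             # cap depth to limit overfitting
    "num_leaves": 63,           # keep below 2**max_depth
    "min_data_in_leaf": 50,     # raise on larger datasets to smooth leaves
    "lambda_l1": 0.1,           # L1 regularization on leaf weights
    "lambda_l2": 0.1,           # L2 regularization on leaf weights
}
```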
Prone to over-fitting: boosted trees can represent a large variety of complex functions, so regularization and depth/leaf limits matter.
Isolation Tree (anomaly detection)
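A minimal sketch of isolation-tree-based anomaly detection, assuming scikit-learn's IsolationForest (an ensemble of isolation trees):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.r_[rng.randn(500, 2), rng.uniform(-6, 6, size=(20, 2))]  # inliers plus a few outliers

detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = detector.fit_predict(X)   # +1 = inlier, -1 = anomaly (shorter isolation paths)
```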