Trees in Machine Learning

GBM (gradient boosting machine)

This is the base algorithm. The base predictor can be a decision tree or another kind of model…
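To make the idea concrete, here is a minimal sketch of gradient boosting for squared loss, where the base predictor is pluggable. The names `fit_stump`, `gradient_boost` and `fit_base` are made up for this illustration and do not come from any library; the one-split stump is just one possible base learner.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on 1-D inputs by minimizing squared error."""
    best = None
    order = sorted(set(xs))
    # Try a threshold halfway between each pair of neighbouring distinct values.
    for lo, hi in zip(order, order[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1, fit_base=fit_stump):
    """Boosting for squared loss: each round fits the base learner to the residuals."""
    base_value = sum(ys) / len(ys)
    learners = []

    def predict(x):
        return base_value + learning_rate * sum(h(x) for h in learners)

    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_base(xs, residuals))
    return predict


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5], n_rounds=100)
```

Swapping `fit_base` for another fitting function changes the predictor without touching the boosting loop, which is the sense in which GBM is the base algorithm.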

GBDT (gradient boosting decision tree)


Evaluating every possible split on every feature is expensive. XGBoost tackles this inefficiency by looking at the distribution of each feature across all data points in a leaf and using this information to reduce the search space of possible feature splits.
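A rough sketch of how the distribution can shrink the search space: instead of testing every distinct value as a threshold, only a few quantiles are proposed as candidates. This is a simplified, unweighted illustration; XGBoost's real approximate algorithm uses a weighted quantile sketch, and the function name `quantile_candidates` is invented here.

```python
def quantile_candidates(values, n_bins=4):
    """Return up to n_bins - 1 candidate thresholds taken at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    candidates = []
    for k in range(1, n_bins):
        threshold = ordered[(k * n) // n_bins]
        # Skip repeated thresholds, which occur when many values are equal.
        if not candidates or threshold != candidates[-1]:
            candidates.append(threshold)
    return candidates


feature = list(range(100))
candidates = quantile_candidates(feature, n_bins=4)
```

With 100 distinct values, an exhaustive search would test 99 thresholds, while this sketch proposes at most `n_bins - 1`.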


XGBoost supports missing values by default. In the tree boosters, the branch direction for missing values is learned during training. Note that the gblinear booster treats missing values as zeros.
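One way to picture a learned branch direction: for a given split, try sending the rows with a missing value to the left and to the right, and keep whichever gives the lower loss. The sketch below uses plain squared error and `None` for missing values. XGBoost itself scores splits with a gradient-based gain, so treat this as an illustration only; both function names are hypothetical.

```python
def sse(targets):
    """Sum of squared errors around the mean; 0.0 for an empty group."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


def learn_default_direction(xs, ys, threshold):
    """Pick the branch ('left' or 'right') for missing values that gives the lower loss."""
    left = [y for x, y in zip(xs, ys) if x is not None and x <= threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x > threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]
    loss_if_left = sse(left + missing) + sse(right)
    loss_if_right = sse(left) + sse(right + missing)
    return "left" if loss_if_left <= loss_if_right else "right"


xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0.0, 0.1, 5.0, 5.1, 4.9, 5.2]
direction = learn_default_direction(xs, ys, threshold=5.0)
```

Here the rows with missing values have targets close to the right-hand group, so the sketch routes them to the right.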

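DART booster: a different way to prevent overfitting, which randomly drops some of the existing trees during training (dropout applied to boosted trees).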

Pre-sorted split finding: the exact method pre-sorts feature values to find splits, which LightGBM improves on with histogram-based split finding.


GOSS: Gradient-based One-Side Sampling

The basic assumption here is that training instances with small gradients have small training error and are already well trained. GOSS therefore keeps all instances with large gradients and randomly samples from the instances with small gradients.
Sampling alone would change the data distribution, so when computing the information gain, GOSS applies a constant multiplier to the sampled instances with small gradients. In this way GOSS strikes a good balance between reducing the number of data instances and keeping the accuracy of the learned decision trees.
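The sampling step can be sketched as follows. The fractions `top_rate` (a) and `other_rate` (b) and the multiplier (1 - a) / b follow the description of GOSS in the LightGBM paper; the function itself and its default values are only an illustration, not LightGBM's implementation.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) chosen by Gradient-based One-Side Sampling."""
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top, rest = order[:n_top], order[n_top:]
    # Keep every large-gradient instance, sample randomly from the rest.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Up-weight the sampled small-gradient instances so that their total
    # contribution approximates that of the whole small-gradient set.
    multiplier = (1 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * len(top) + [multiplier] * len(sampled)
    return indices, weights


gradients = [float(i) for i in range(100)]
indices, weights = goss_sample(gradients)
```

With these illustrative rates, 30 of the 100 instances are used for split finding, and each sampled small-gradient instance counts 8 times when the gain is computed.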

LightGBM can handle categorical features directly, without one-hot encoding.

LightGBM needs a larger data set to work well, at least 10,000 rows.

Important LightGBM parameters:

  • learning_rate: default 0.1. Use a small learning_rate with a large num_iterations for better accuracy.
  • max_depth: default -1 (no limit). Lower max_depth to avoid overfitting.
  • num_leaves: default 31. Keep it smaller than 2^max_depth.
  • min_data_in_leaf: default 20. Its optimal value depends on the number of training samples and on num_leaves. Use a larger value to avoid overfitting.
  • lambda_l1, lambda_l2: default 0. L1 and L2 regularization on the weights.
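As a sketch, these parameters could be collected in a dictionary like the one below. The parameter names are LightGBM's own, but the values are illustrative assumptions rather than recommendations, and `"objective": "regression"` is added only to make the example complete.

```python
params = {
    "objective": "regression",  # assumed task, not from the notes above
    "learning_rate": 0.05,      # smaller than the default ...
    "num_iterations": 500,      # ... paired with more boosting rounds
    "max_depth": 6,             # limit depth to reduce overfitting
    "num_leaves": 31,
    "min_data_in_leaf": 50,     # larger than the default to reduce overfitting
    "lambda_l1": 0.0,
    "lambda_l2": 1.0,
}

# Rule of thumb from the list above: num_leaves stays below 2^max_depth.
leaves_ok = params["num_leaves"] < 2 ** params["max_depth"]
```

A dictionary like this would typically be passed as the first argument to `lightgbm.train` together with a training `Dataset`.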


LightGBM is prone to overfitting, because its trees can reach greater variety and complexity.

Isolation Tree (anomaly detection)

