mars.learn.contrib.lightgbm.LGBMClassifier

class mars.learn.contrib.lightgbm.LGBMClassifier(*args, **kwargs)[source]
__init__(*args, **kwargs)

Construct a gradient boosting model.

Parameters
  • boosting_type (str, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.

  • num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.

  • max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, <=0 means no limit.

  • learning_rate (float, optional (default=0.1)) – Boosting learning rate. You can use callbacks parameter of fit method to shrink/adapt learning rate in training using reset_parameter callback. Note, that this will ignore the learning_rate argument in training.

  • n_estimators (int, optional (default=100)) – Number of boosted trees to fit.

  • subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.

  • objective (str, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.

  • class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. Note, that the usage of all these parameters will result in poor estimates of the individual class probabilities. You may want to consider performing probability calibration (https://scikit-learn.org/stable/modules/calibration.html) of your model. The ‘balanced’ mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note, that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

  • min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.

  • min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).

  • min_child_samples (int, optional (default=20)) – Minimum number of data needed in a child (leaf).

  • subsample (float, optional (default=1.)) – Subsample ratio of the training instance.

  • subsample_freq (int, optional (default=0)) – Frequency of subsample, <=0 means no enable.

  • colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.

  • reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.

  • reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.

  • random_state (int, RandomState object or None, optional (default=None)) – Random number seed. If int, this number is used to seed the C++ code. If RandomState object (numpy), a random integer is picked based on its state to seed the C++ code. If None, default seeds in C++ code are used.

  • n_jobs (int, optional (default=-1)) – Number of parallel threads.

  • silent (bool, optional (default=True)) – Whether to print messages while running boosting.

  • importance_type (str, optional (default='split')) – The type of feature importance to be filled into feature_importances_. If ‘split’, result contains numbers of times the feature is used in a model. If ‘gain’, result contains total gains of splits which use the feature.

  • **kwargs

    Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.

    Warning

    **kwargs is not supported in sklearn, it may cause unexpected issues.

Note

A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:

y_truearray-like of shape = [n_samples]

The target values.

y_predarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)

The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.

grouparray-like

Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.

gradarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)

The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.

hessarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)

The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.

For multi-class task, the y_pred is group by class_id first, then group by row_id. If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i] and you should group grad and hess in this way as well.

Methods

__init__(*args, **kwargs)

Construct a gradient boosting model.

fit(X, y[, sample_weight, init_score, ...])

Build a gradient boosting model from the training set (X, y).

get_params([deep])

Get parameters for this estimator.

load_model(model)

predict(X, **kwargs)

Return the predicted value for each sample.

predict_proba(X, **kwargs)

Return the predicted probability for each class for each sample.

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.

to_local()

Attributes

best_iteration_

The best iteration of fitted model if early_stopping() callback has been specified.

best_score_

The best score of fitted model.

booster_

The underlying Booster of this model.

classes_

The class label array.

evals_result_

The evaluation results if validation sets have been specified.

feature_importances_

The feature importances (the higher, the more important).

feature_name_

The names of features.

n_classes_

The number of classes.

n_features_

The number of features of fitted model.

n_features_in_

The number of features of fitted model.

objective_

The concrete objective used while fitting this model.