mars.learn.contrib.lightgbm.LGBMClassifier#

class mars.learn.contrib.lightgbm.LGBMClassifier(*args, **kwargs)[source]#

__init__(*args, **kwargs)#

Construct a gradient boosting model.

Parameters

boosting_type (str, optional (default='gbdt')) – ‘gbdt’, traditional Gradient Boosting Decision Tree. ‘dart’, Dropouts meet Multiple Additive Regression Trees. ‘goss’, Gradient-based One-Side Sampling. ‘rf’, Random Forest.
num_leaves (int, optional (default=31)) – Maximum tree leaves for base learners.
max_depth (int, optional (default=-1)) – Maximum tree depth for base learners, <=0 means no limit.
learning_rate (float, optional (default=0.1)) – Boosting learning rate. You can use callbacks parameter of fit method to shrink/adapt learning rate in training using reset_parameter callback. Note, that this will ignore the learning_rate argument in training.
n_estimators (int, optional (default=100)) – Number of boosted trees to fit.
subsample_for_bin (int, optional (default=200000)) – Number of samples for constructing bins.
objective (str, callable or None, optional (default=None)) – Specify the learning task and the corresponding learning objective or a custom objective function to be used (see note below). Default: ‘regression’ for LGBMRegressor, ‘binary’ or ‘multiclass’ for LGBMClassifier, ‘lambdarank’ for LGBMRanker.
class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. Note, that the usage of all these parameters will result in poor estimates of the individual class probabilities. You may want to consider performing probability calibration (https://scikit-learn.org/stable/modules/calibration.html) of your model. The ‘balanced’ mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note, that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
min_split_gain (float, optional (default=0.)) – Minimum loss reduction required to make a further partition on a leaf node of the tree.
min_child_weight (float, optional (default=1e-3)) – Minimum sum of instance weight (hessian) needed in a child (leaf).
min_child_samples (int, optional (default=20)) – Minimum number of data needed in a child (leaf).
subsample (float, optional (default=1.)) – Subsample ratio of the training instance.
subsample_freq (int, optional (default=0)) – Frequency of subsample, <=0 means no enable.
colsample_bytree (float, optional (default=1.)) – Subsample ratio of columns when constructing each tree.
reg_alpha (float, optional (default=0.)) – L1 regularization term on weights.
reg_lambda (float, optional (default=0.)) – L2 regularization term on weights.
random_state (int, RandomState object or None, optional (default=None)) – Random number seed. If int, this number is used to seed the C++ code. If RandomState object (numpy), a random integer is picked based on its state to seed the C++ code. If None, default seeds in C++ code are used.
n_jobs (int, optional (default=-1)) – Number of parallel threads.
silent (bool, optional (default=True)) – Whether to print messages while running boosting.
importance_type (str, optional (default='split')) – The type of feature importance to be filled into feature_importances_. If ‘split’, result contains numbers of times the feature is used in a model. If ‘gain’, result contains total gains of splits which use the feature.
**kwargs –
Other parameters for the model. Check http://lightgbm.readthedocs.io/en/latest/Parameters.html for more parameters.

Warning

**kwargs is not supported in sklearn, it may cause unexpected issues.

Note

A custom objective function can be provided for the objective parameter. In this case, it should have the signature objective(y_true, y_pred) -> grad, hess or objective(y_true, y_pred, group) -> grad, hess:

y_truearray-like of shape = [n_samples]
The target values.

y_predarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.

grouparray-like
Group/query data. Only used in the learning-to-rank task. sum(group) = n_samples. For example, if you have a 100-document dataset with group = [10, 20, 40, 10, 10, 10], that means that you have 6 groups, where the first 10 records are in the first group, records 11-30 are in the second group, records 31-70 are in the third group, etc.

gradarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
The value of the first order derivative (gradient) of the loss with respect to the elements of y_pred for each sample point.

hessarray-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multi-class task)
The value of the second order derivative (Hessian) of the loss with respect to the elements of y_pred for each sample point.

For multi-class task, the y_pred is group by class_id first, then group by row_id. If you want to get i-th row y_pred in j-th class, the access way is y_pred[j * num_data + i] and you should group grad and hess in this way as well.

Methods

`__init__`(args, *kwargs)	Construct a gradient boosting model.
`fit`(X, y[, sample_weight, init_score, ...])	Build a gradient boosting model from the training set (X, y).
`get_params`([deep])	Get parameters for this estimator.
`load_model`(model)
`predict`(X, **kwargs)	Return the predicted value for each sample.
`predict_proba`(X, **kwargs)	Return the predicted probability for each class for each sample.
`score`(X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.
`to_local`()

Attributes

`best_iteration_`	The best iteration of fitted model if `early_stopping()` callback has been specified.
`best_score_`	The best score of fitted model.
`booster_`	The underlying Booster of this model.
`classes_`	The class label array.
`evals_result_`	The evaluation results if validation sets have been specified.
`feature_importances_`	The feature importances (the higher, the more important).
`feature_name_`	The names of features.
`n_classes_`	The number of classes.
`n_features_`	The number of features of fitted model.
`n_features_in_`	The number of features of fitted model.
`objective_`	The concrete objective used while fitting this model.