mars.learn.ensemble.BaggingClassifier#

class mars.learn.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, reducers=1.0)[source]#

A Bagging classifier.

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
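For intuition, the mechanism can be written out by hand: fit each base classifier on a bootstrap resample of the data and take a majority vote over their predictions. The following is a minimal sketch using scikit-learn and NumPy directly rather than Mars's distributed implementation; all names here are illustrative.

>>> import numpy as np
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=0)
>>> rng = np.random.default_rng(0)
>>> estimators = []
>>> for _ in range(10):
...     # bootstrap: draw len(X) indices with replacement
...     idx = rng.integers(0, len(X), size=len(X))
...     estimators.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
>>> votes = np.stack([est.predict(X) for est in estimators])
>>> # majority vote for binary 0/1 labels (ties broken toward class 1)
>>> y_pred = (votes.mean(axis=0) >= 0.5).astype(int)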

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
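These four variants correspond to settings of the sampling parameters documented below. As a rough sketch (the fractions 0.8 and 0.5 are illustrative values, not recommendations):

>>> from mars.learn.ensemble import BaggingClassifier
>>> pasting = BaggingClassifier(bootstrap=False, max_samples=0.8)
>>> bagging = BaggingClassifier(bootstrap=True, max_samples=0.8)
>>> subspaces = BaggingClassifier(bootstrap=False, max_samples=1.0, max_features=0.5)
>>> patches = BaggingClassifier(bootstrap=False, max_samples=0.8, max_features=0.5)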

Read more in the User Guide.

Parameters
  • base_estimator (object, default=None) – The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeClassifier.

  • n_estimators (int, default=10) – The number of base estimators in the ensemble.

  • max_samples (int or float, default=1.0) –

    The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details; see also the sketch after this parameter list).

    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

  • max_features (int or float, default=1.0) –

    The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

    • If int, then draw max_features features.

    • If float, then draw max_features * X.shape[1] features.

  • bootstrap (bool, default=True) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.

  • bootstrap_features (bool, default=False) – Whether features are drawn with replacement.

  • warm_start (bool, default=False) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

  • random_state (int, RandomState instance or None, default=None) – Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.
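To make the int-versus-float semantics of max_samples and max_features concrete, here is a small sketch; the shape (100, 4) is taken from the example further below, and the derived sizes assume the usual floor-to-int rounding:

>>> # With X of shape (100, 4):
>>> # max_samples=30   -> each estimator is trained on 30 samples
>>> # max_samples=0.5  -> each estimator is trained on int(0.5 * 100) = 50 samples
>>> # max_features=2   -> each estimator sees 2 features
>>> # max_features=0.5 -> each estimator sees int(0.5 * 4) = 2 features
>>> clf = BaggingClassifier(max_samples=0.5, max_features=0.5)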

base_estimator_#

The base estimator from which the ensemble is grown.

Type: estimator

estimators_#

The collection of fitted base estimators.

Type: list of estimators

estimators_features_#

The subset of drawn features for each base estimator.

Type: list of arrays

classes_#

The class labels.

Type: ndarray of shape (n_classes,)

n_classes_#

The number of classes.

Type: int or list

See also

BaggingRegressor

A Bagging regressor.

References

[1] L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

[2] L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

[3] T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

[4] G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

>>> from sklearn.svm import SVC
>>> from mars.learn.ensemble import BaggingClassifier
>>> from mars.learn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = BaggingClassifier(base_estimator=SVC(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
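
Continuing from the fitted clf above, the probability and scoring interfaces listed under Methods can be exercised the same way; this is a usage sketch, and the returned values are not shown because they depend on the fitted ensemble:

>>> proba = clf.predict_proba([[0, 0, 0, 0]])  # averaged class probabilities
>>> acc = clf.score(X, y)                      # mean accuracy on the training data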
__init__(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, reducers=1.0)#

Methods

__init__([base_estimator, n_estimators, ...])

decision_function(X[, session, run_kwargs])
    Average of the decision functions of the base classifiers.

fit(X[, y, sample_weight, session, run_kwargs])
    Build a Bagging ensemble of estimators from the training set (X, y).

predict(X[, session, run_kwargs])
    Predict class for X.

predict_log_proba(X[, session, run_kwargs])
    Predict class log-probabilities for X.

predict_proba(X[, session, run_kwargs])
    Predict class probabilities for X.

score(X, y[, sample_weight])
    Return the mean accuracy on the given test data and labels.