mars.learn.ensemble.BaggingClassifier#

class mars.learn.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, reducers=1.0)[source]#

A Bagging classifier.

A Bagging classifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.
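For intuition, the mechanism can be written out by hand: fit each base classifier on a bootstrap resample of the data and take a majority vote over their predictions. The following is a minimal sketch using scikit-learn and NumPy directly rather than Mars's distributed implementation; all names here are illustrative.

>>> import numpy as np
>>> from sklearn.tree import DecisionTreeClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=200, random_state=0)
>>> rng = np.random.default_rng(0)
>>> estimators = []
>>> for _ in range(10):
...     # bootstrap: draw len(X) indices with replacement
...     idx = rng.integers(0, len(X), size=len(X))
...     estimators.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
>>> votes = np.stack([est.predict(X) for est in estimators])
>>> # majority vote for binary 0/1 labels (ties broken toward class 1)
>>> y_pred = (votes.mean(axis=0) >= 0.5).astype(int)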

This algorithm encompasses several works from the literature. When random subsets of the dataset are drawn as random subsets of the samples, then this algorithm is known as Pasting [1]. If samples are drawn with replacement, then the method is known as Bagging [2]. When random subsets of the dataset are drawn as random subsets of the features, then the method is known as Random Subspaces [3]. Finally, when base estimators are built on subsets of both samples and features, then the method is known as Random Patches [4].
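These four variants correspond to settings of the sampling parameters documented below. As a rough sketch (the fractions 0.8 and 0.5 are illustrative values, not recommendations):

>>> from mars.learn.ensemble import BaggingClassifier
>>> pasting = BaggingClassifier(bootstrap=False, max_samples=0.8)
>>> bagging = BaggingClassifier(bootstrap=True, max_samples=0.8)
>>> subspaces = BaggingClassifier(bootstrap=False, max_samples=1.0, max_features=0.5)
>>> patches = BaggingClassifier(bootstrap=False, max_samples=0.8, max_features=0.5)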

Read more in the User Guide.

Parameters
  • base_estimator (object, default=None) – The base estimator to fit on random subsets of the dataset. If None, then the base estimator is a DecisionTreeClassifier.

  • n_estimators (int, default=10) – The number of base estimators in the ensemble.

  • max_samples (int or float, default=1.0) –

    The number of samples to draw from X to train each base estimator (with replacement by default, see bootstrap for more details; see also the sketch after this parameter list).

    • If int, then draw max_samples samples.

    • If float, then draw max_samples * X.shape[0] samples.

  • max_features (int or float, default=1.0) –

    The number of features to draw from X to train each base estimator (without replacement by default, see bootstrap_features for more details).

    • If int, then draw max_features features.

    • If float, then draw max_features * X.shape[1] features.

  • bootstrap (bool, default=True) – Whether samples are drawn with replacement. If False, sampling without replacement is performed.

  • bootstrap_features (bool, default=False) – Whether features are drawn with replacement.

  • warm_start (bool, default=False) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble. See the Glossary.

  • random_state (int, RandomState instance or None, default=None) – Controls the random resampling of the original dataset (sample wise and feature wise). If the base estimator accepts a random_state attribute, a different seed is generated for each instance in the ensemble. Pass an int for reproducible output across multiple function calls. See Glossary.
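To make the int-versus-float semantics of max_samples and max_features concrete, here is a small sketch; the shape (100, 4) is taken from the example further below, and the derived sizes assume the usual floor-to-int rounding:

>>> # With X of shape (100, 4):
>>> # max_samples=30   -> each estimator is trained on 30 samples
>>> # max_samples=0.5  -> each estimator is trained on int(0.5 * 100) = 50 samples
>>> # max_features=2   -> each estimator sees 2 features
>>> # max_features=0.5 -> each estimator sees int(0.5 * 4) = 2 features
>>> clf = BaggingClassifier(max_samples=0.5, max_features=0.5)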

base_estimator_#

The base estimator from which the ensemble is grown.

Type: estimator

estimators_#

The collection of fitted base estimators.

Type: list of estimators

estimators_features_#

The subset of drawn features for each base estimator.

Type: list of arrays

classes_#

The class labels.

Type: ndarray of shape (n_classes,)

n_classes_#

The number of classes.

Type: int or list

See also

BaggingRegressor

A Bagging regressor.

References

[1] L. Breiman, “Pasting small votes for classification in large databases and on-line”, Machine Learning, 36(1), 85-103, 1999.

[2] L. Breiman, “Bagging predictors”, Machine Learning, 24(2), 123-140, 1996.

[3] T. Ho, “The random subspace method for constructing decision forests”, Pattern Analysis and Machine Intelligence, 20(8), 832-844, 1998.

[4] G. Louppe and P. Geurts, “Ensembles on Random Patches”, Machine Learning and Knowledge Discovery in Databases, 346-361, 2012.

Examples

>>> from sklearn.svm import SVC
>>> from mars.learn.ensemble import BaggingClassifier
>>> from mars.learn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = BaggingClassifier(base_estimator=SVC(),
...                         n_estimators=10, random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])
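
Continuing from the fitted clf above, the probability and scoring interfaces listed under Methods can be exercised the same way; this is a usage sketch, and the returned values are not shown because they depend on the fitted ensemble:

>>> proba = clf.predict_proba([[0, 0, 0, 0]])  # averaged class probabilities
>>> acc = clf.score(X, y)                      # mean accuracy on the training data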
__init__(base_estimator=None, n_estimators=10, *, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, reducers=1.0)#

Methods

__init__([base_estimator, n_estimators, ...])

decision_function(X[, session, run_kwargs])
    Average of the decision functions of the base classifiers.

fit(X[, y, sample_weight, session, run_kwargs])
    Build a Bagging ensemble of estimators from the training set (X, y).

predict(X[, session, run_kwargs])
    Predict class for X.

predict_log_proba(X[, session, run_kwargs])
    Predict class log-probabilities for X.

predict_proba(X[, session, run_kwargs])
    Predict class probabilities for X.

score(X, y[, sample_weight])
    Return the mean accuracy on the given test data and labels.