mars.learn.ensemble.BlockwiseVotingClassifier

class mars.learn.ensemble.BlockwiseVotingClassifier(estimator: BaseEstimator, voting: str = 'hard', classes: Optional[Union[ndarray, list, Tensor]] = None)

Blockwise training and ensemble voting classifier.

This classifier trains on the blocks / partitions of tensors or DataFrames. A clone of estimator is fit independently on each block or partition of the data. This is useful when the sub-estimator only works on small in-memory data structures such as a NumPy array or pandas DataFrame.

Prediction is done by the ensemble of learned models.
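The blockwise scheme described above can be sketched in plain NumPy. This is a minimal illustration, not the Mars implementation: the NearestCentroid class is a hypothetical stand-in for any small in-memory sub-estimator, copy.deepcopy stands in for cloning, and the synthetic two-blob dataset and the block count are assumptions chosen for the example.

```python
import copy

import numpy as np


class NearestCentroid:
    """Hypothetical stand-in for a small in-memory sub-estimator."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class, computed from this block only.
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Distance from every sample to every centroid: (n_samples, n_classes).
        dists = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[dists.argmin(axis=1)]


rng = np.random.default_rng(0)

# Assumed toy data: two well-separated Gaussian blobs, shuffled.
X = np.concatenate([
    rng.normal(-3.0, 1.0, size=(500, 2)),
    rng.normal(3.0, 1.0, size=(500, 2)),
])
y = np.repeat([0, 1], 500)
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Fit an independent copy of the template estimator on each block.
n_blocks = 4
template = NearestCentroid()
estimators = [
    copy.deepcopy(template).fit(X_block, y_block)
    for X_block, y_block in zip(
        np.array_split(X, n_blocks), np.array_split(y, n_blocks)
    )
]

# Predict with the ensemble: majority vote over the per-block models.
votes = np.stack([est.predict(X) for est in estimators])  # (n_blocks, n_samples)
pred = np.array([np.bincount(col).argmax() for col in votes.T])
accuracy = float((pred == y).mean())
```

Each sub-estimator only ever sees one block, so its memory footprint is bounded by the block size rather than by the full dataset.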

Warning

Ensure that your data are sufficiently shuffled prior to training! If the values of the various blocks / partitions of your dataset are not distributed similarly, the classifier will give poor results.
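To see why shuffling matters, the following sketch compares the per-block class balance of a label vector sorted by class with the same vector after shuffling. The label vector, block count, and seed are assumptions made for illustration; plain NumPy arrays are used here in place of Mars tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy labels: 500 samples of class 0 followed by 500 of class 1.
y_sorted = np.repeat([0, 1], 500)

# Fraction of class-1 samples in each of four contiguous blocks.
sorted_fractions = [float(b.mean()) for b in np.array_split(y_sorted, 4)]

# After shuffling, every block sees a similar mix of both classes.
y_shuffled = rng.permutation(y_sorted)
shuffled_fractions = [float(b.mean()) for b in np.array_split(y_shuffled, 4)]
```

In the sorted case each block contains a single class, so a sub-estimator fit on that block never sees the other class and cannot learn to distinguish them.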

Parameters
  • estimator (Estimator) – The underlying estimator that is cloned and fit on each block or partition of the data.

  • voting (str, {'hard', 'soft'}, default='hard') – If 'hard', uses the predicted class labels for majority-rule voting. If 'soft', predicts the class label as the argmax of the summed predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.

  • classes (list-like, optional) – The set of classes that y can take. This can also be provided as a fit param if the underlying estimator requires classes at fit time.
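The difference between the two voting modes can be illustrated with a small NumPy sketch. The probability values below are made up for the example, and the code shows the general voting rules described above rather than the library's internal implementation.

```python
import numpy as np

classes = np.array([0, 1])

# Hypothetical predicted probabilities from three sub-estimators for two
# samples and two classes: shape (n_estimators, n_samples, n_classes).
probas = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.3, 0.7]],
    [[0.4, 0.6], [0.6, 0.4]],
])

# Hard voting: each estimator casts one vote (its most likely class),
# and the class with the most votes wins.
votes = probas.argmax(axis=2)  # (n_estimators, n_samples)
hard_idx = np.array(
    [np.bincount(col, minlength=len(classes)).argmax() for col in votes.T]
)
hard_pred = classes[hard_idx]

# Soft voting: sum the probabilities over estimators, then take the argmax.
soft_pred = classes[probas.sum(axis=0).argmax(axis=1)]
```

For the first sample, two of the three estimators narrowly prefer class 1, so hard voting picks class 1, while the single confident vote for class 0 dominates the summed probabilities and soft voting picks class 0. The two modes agree on the second sample.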

estimators_

The collection of fitted sub-estimators: one clone of estimator fitted on each partition / block of the inputs.

Type

list of classifiers

classes_

The class labels.

Type

array-like, shape (n_classes,)

Examples

>>> import mars.tensor as mt
>>> from mars.learn.ensemble import BlockwiseVotingClassifier
>>> from sklearn.linear_model import RidgeClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100_000)
>>> X, y = mt.tensor(X, chunk_size=10_000), mt.tensor(y, chunk_size=10_000)
>>> subestimator = RidgeClassifier(random_state=0)
>>> clf = BlockwiseVotingClassifier(subestimator)
>>> clf.fit(X, y)

__init__(estimator: BaseEstimator, voting: str = 'hard', classes: Optional[Union[ndarray, list, Tensor]] = None)

Methods

__init__(estimator[, voting, classes])

fit(X, y[, classes, session, run_kwargs])

get_params([deep])

Get parameters for this estimator.

predict(X[, session, run_kwargs])

predict_proba(X[, session, run_kwargs])

score(X, y[, sample_weight, session, run_kwargs])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.