Mars Learn#

This is the class and function reference of Mars learn.

Clustering#

Classes#

cluster.KMeans([n_clusters, init, n_init, ...])

K-Means clustering.

Functions#

cluster.k_means(X, n_clusters[, ...])

K-means clustering algorithm.
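
For orientation, the classic K-Means procedure (Lloyd's algorithm) alternates between assigning points to the nearest center and recomputing each center as the mean of its points. The sketch below is plain Python and only illustrates that idea. It is not Mars code: the name `lloyd_kmeans`, the random initialization, and the fixed iteration count are assumptions made for this example.

```python
import random


def lloyd_kmeans(points, n_clusters, n_iter=10, seed=0):
    """Illustrative Lloyd iteration on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Simple random initialization: pick distinct data points as centers.
    centers = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = lloyd_kmeans(points, n_clusters=2)
print(sorted(centers))
```

On this toy input the two nearby pairs end up in separate clusters, with centers at the midpoint of each pair.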

Datasets#

Samples generator#

datasets.make_blobs([n_samples, n_features, ...])

Generate isotropic Gaussian blobs for clustering.

datasets.make_classification([n_samples, ...])

Generate a random n-class classification problem.

datasets.make_low_rank_matrix([n_samples, ...])

Generate a mostly low-rank matrix with bell-shaped singular values.

datasets.make_regression([n_samples, ...])

Generate a random regression problem.

Matrix Decomposition#

decomposition.PCA([n_components, copy, ...])

Principal component analysis (PCA)

decomposition.TruncatedSVD([n_components, ...])

Dimensionality reduction using truncated SVD (aka LSA).

Ensemble Methods#

ensemble.BaggingClassifier([base_estimator, ...])

A Bagging classifier.

ensemble.BaggingRegressor([base_estimator, ...])

A Bagging regressor.

ensemble.BlockwiseVotingClassifier(estimator)

Blockwise training and ensemble voting classifier.

ensemble.BlockwiseVotingRegressor(estimator)

Blockwise training and ensemble voting regressor.

ensemble.IsolationForest(*[, n_estimators, ...])

Isolation Forest Algorithm.

Linear Models#

Classical linear regressors#

linear_model.LinearRegression(*[, ...])

Ordinary least squares Linear Regression.

Metrics#

Classification metrics#

metrics.accuracy_score(y_true, y_pred[, ...])

Accuracy classification score.

metrics.auc(x, y[, session, run_kwargs])

Compute Area Under the Curve (AUC) using the trapezoidal rule
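
The trapezoidal rule mentioned above sums, for each pair of consecutive points, the segment width times the average of the two heights. A minimal plain-Python sketch follows. The helper `trapezoid_auc` is hypothetical and operates on ordinary lists, not Mars tensors.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    area = 0.0
    for i in range(1, len(x)):
        # Each segment contributes width * average height.
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
    return area


# Toy ROC-style curve: false positive rate on x, true positive rate on y.
fpr = [0.0, 0.5, 1.0]
tpr = [0.0, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))  # 0.75
```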

metrics.f1_score(y_true, y_pred, *[, ...])

Compute the F1 score, also known as balanced F-score or F-measure

metrics.fbeta_score(y_true, y_pred, *, beta)

Compute the F-beta score
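
As a reminder of the quantity behind the F1 and F-beta scores, the F-beta score is the weighted harmonic mean of precision P and recall R: (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 1 giving F1. The sketch below computes it from raw counts for the binary case. The function name `fbeta_from_counts` and the choice to return 0.0 when a denominator is zero are assumptions of this example, not documented Mars behavior.

```python
def fbeta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from true positives, false positives and false negatives."""
    # Assumption of this sketch: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# Precision 0.5 and recall 1.0: beta > 1 weights recall more heavily,
# so the F2 score is higher than the F1 score here.
f1 = fbeta_from_counts(tp=2, fp=2, fn=0, beta=1.0)
f2 = fbeta_from_counts(tp=2, fp=2, fn=0, beta=2.0)
print(f1, f2)
```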

metrics.log_loss(y_true, y_pred, *[, eps, ...])

Log loss, aka logistic loss or cross-entropy loss.

metrics.multilabel_confusion_matrix(y_true, ...)

Compute a confusion matrix for each class or sample.

metrics.precision_score(y_true, y_pred, *[, ...])

Compute the precision

metrics.precision_recall_fscore_support(...)

Compute precision, recall, F-measure and support for each class

metrics.recall_score(y_true, y_pred, *[, ...])

Compute the recall

metrics.roc_auc_score(y_true, y_score, *[, ...])

Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

metrics.roc_curve(y_true, y_score[, ...])

Compute the receiver operating characteristic (ROC) curve.

Regression metrics#

metrics.r2_score(y_true, y_pred, *[, ...])

\(R^2\) (coefficient of determination) regression score function.
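
For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The plain-Python sketch below shows the single-output, unweighted case. `r2_from_lists` is a hypothetical helper, and it assumes the true values are not all identical (otherwise the denominator is zero).

```python
def r2_from_lists(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot for one unweighted output."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r2_from_lists(y_true, [1.0, 2.0, 3.0]))  # 1.0, perfect predictions
print(r2_from_lists(y_true, [2.0, 2.0, 2.0]))  # 0.0, no better than the mean
```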

Pairwise metrics#

metrics.pairwise.cosine_similarity(X[, Y, ...])

Compute cosine similarity between samples in X and Y.

metrics.pairwise.cosine_distances(X[, Y])

Compute cosine distance between samples in X and Y.

metrics.pairwise.euclidean_distances(X[, Y, ...])

Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors.

metrics.pairwise.haversine_distances(X[, Y])

Compute the Haversine distance between samples in X and Y
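
The Haversine distance is the great-circle angle between two points on a sphere. The sketch below evaluates the formula for a single pair, assuming each point is a (latitude, longitude) pair in radians, as in the scikit-learn function of the same name. Whether Mars uses exactly the same input convention is not stated here, and `haversine` is a hypothetical helper.

```python
import math


def haversine(p, q):
    """Great-circle distance on the unit sphere; p, q are (lat, lon) in radians."""
    lat1, lon1 = p
    lat2, lon2 = q
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * math.asin(math.sqrt(a))


# Two points on the equator, a quarter of the way around the sphere.
angle = haversine((0.0, 0.0), (0.0, math.pi / 2))
# Multiply by a sphere radius to get a length; 6371 km is a common
# approximation of the Earth's mean radius.
print(angle, angle * 6371.0)
```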

metrics.pairwise.manhattan_distances(X[, Y, ...])

Compute the L1 distances between the vectors in X and Y.

metrics.pairwise.rbf_kernel(X[, Y, gamma])

Compute the rbf (gaussian) kernel between X and Y.

metrics.pairwise_distances(X[, Y, metric])

Model Selection#

Splitter Classes#

model_selection.KFold([n_splits, shuffle, ...])

K-Folds cross-validator
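
K-fold cross-validation partitions the sample indices into consecutive folds and uses each fold once as the test set while the rest form the training set. The generator below sketches the unshuffled case in plain Python. `kfold_indices` is a hypothetical helper, and the rule that leftover samples go to the first folds follows the scikit-learn convention, which is an assumption with respect to Mars.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in kfold_indices(n_samples=5, n_splits=2):
    print(train, test)
```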

Splitter Functions#

model_selection.train_test_split(*arrays, ...)

Split arrays or matrices into random train and test subsets

Nearest Neighbors#

neighbors.NearestNeighbors([n_neighbors, ...])

Preprocessing and Normalization#

preprocessing.LabelBinarizer(*[, neg_label, ...])

Binarize labels in a one-vs-all fashion.

preprocessing.LabelEncoder()

Encode target labels with value between 0 and n_classes-1.

preprocessing.MinMaxScaler([feature_range, ...])

Transform features by scaling each feature to a given range.

preprocessing.minmax_scale(X[, ...])

Transform features by scaling each feature to a given range.
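
Scaling a feature to a given range maps its minimum to the lower bound and its maximum to the upper bound, linearly in between. The sketch below does this for one feature column in plain Python. `minmax_scale_column` is a hypothetical helper, and mapping a constant column to the lower bound is a choice made for this example only.

```python
def minmax_scale_column(values, feature_range=(0.0, 1.0)):
    """Linearly rescale one feature column into feature_range."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: map everything to the lower bound
        # (an assumption of this sketch).
        return [lo for _ in values]
    return [lo + (v - v_min) / span * (hi - lo) for v in values]


print(minmax_scale_column([1.0, 2.0, 3.0]))               # [0.0, 0.5, 1.0]
print(minmax_scale_column([1.0, 2.0, 3.0], (-1.0, 1.0)))  # [-1.0, 0.0, 1.0]
```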

preprocessing.label_binarize(y, *, classes)

Binarize labels in a one-vs-all fashion.

preprocessing.normalize(X[, norm, axis, ...])

Scale input vectors individually to unit norm (vector length).
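
Scaling a vector to unit norm means dividing each component by the vector's length. The sketch below shows the L2 case for a single vector. `l2_normalize` is a hypothetical helper, the L2 norm is assumed as the default (as in scikit-learn), and an all-zero vector is returned unchanged as a choice of this example.

```python
import math


def l2_normalize(vector):
    """Return the vector divided by its Euclidean (L2) length."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        # An all-zero vector has no direction; leave it unchanged.
        return list(vector)
    return [v / norm for v in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # a vector of length 1 pointing the same way as (3, 4)
```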

Semi-Supervised Learning#

semi_supervised.LabelPropagation([kernel, ...])

Label Propagation classifier

Utilities#

utils.assert_all_finite(X[, allow_nan, ...])

utils.check_X_y(X, y[, accept_sparse, ...])

Input validation for standard estimators.

utils.check_array(array[, accept_sparse, ...])

Input validation on a tensor, list, sparse matrix or similar.

utils.check_consistent_length(*arrays[, ...])

Check that all arrays have consistent first dimensions.

utils.multiclass.type_of_target(y)

Determine the type of data indicated by the target.

utils.multiclass.is_multilabel(y)

Check if y is in a multilabel format.

utils.shuffle(*arrays, **options)

utils.validation.check_is_fitted(estimator)

Perform is_fitted validation for estimator.

utils.validation.column_or_1d(y[, warn])

Ravel a column or 1-D NumPy array; otherwise raise an error.

Misc#

wrappers.ParallelPostFit([estimator, scoring])

Meta-estimator for parallel predict and transform.

LightGBM Integration#

contrib.lightgbm.LGBMClassifier(*args, **kwargs)

contrib.lightgbm.LGBMRegressor(*args, **kwargs)

contrib.lightgbm.LGBMRanker(*args, **kwargs)

PyTorch Integration#

contrib.pytorch.run_pytorch_script(script, ...)

Run a PyTorch script in a Mars cluster.

contrib.pytorch.MarsDataset

contrib.pytorch.SequentialSampler

contrib.pytorch.RandomSampler

contrib.pytorch.SubsetRandomSampler

contrib.pytorch.DistributedSampler

StatsModels Integration#

contrib.statsmodels.MarsDistributedModel([...])

contrib.statsmodels.MarsResults(model)

TensorFlow Integration#

contrib.tensorflow.run_tensorflow_script(...)

Run a TensorFlow script in a Mars cluster.

contrib.tensorflow.gen_tensorflow_dataset(tensors)

Convert Mars data types to tf.data.Dataset.

XGBoost Integration#

contrib.xgboost.MarsDMatrix(data[, label, ...])

contrib.xgboost.train(params, dtrain[, evals])

Train an XGBoost model in the Mars manner.

contrib.xgboost.predict(model, data[, ...])

contrib.xgboost.XGBClassifier([max_depth, ...])

Implementation of the scikit-learn API for XGBoost classification.

contrib.xgboost.XGBRegressor([max_depth, ...])

Implementation of the scikit-learn API for XGBoost regression.