mars.dataframe.Series.map_chunk
- Series.map_chunk(func, args=(), kwargs=None, skip_infer=False, **kw)
Apply function to each chunk.
- Parameters
func (function) – Function to apply to each chunk.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
kwargs (Dict) – Additional keyword arguments to pass to func.
skip_infer (bool, default False) – Whether to skip inferring dtypes when dtypes or output_type is not specified.
- Returns
Result of applying func to each chunk of the DataFrame or Series.
- Return type
See also
DataFrame.apply
Perform any type of operations.
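The calling convention described under Parameters can be illustrated without Mars. The following is a minimal pandas-only sketch, not Mars's implementation: `emulate_map_chunk`, `scale_and_shift`, and `chunk_size` are hypothetical names introduced here, and the fixed-size row slicing is an assumption about how a series might be chunked. It shows func receiving the chunk first, followed by args and kwargs, and why chunk-local statistics depend on the chunking.

```python
import pandas as pd


def emulate_map_chunk(series, func, args=(), kwargs=None, chunk_size=2):
    """Hypothetical helper: apply func to consecutive slices of a pandas Series."""
    kwargs = kwargs or {}
    # Split the series into row-wise chunks of at most chunk_size elements.
    chunks = [series.iloc[i:i + chunk_size] for i in range(0, len(series), chunk_size)]
    # func receives the chunk first, then the extra positional and keyword arguments.
    results = [func(chunk, *args, **kwargs) for chunk in chunks]
    return pd.concat(results)


def scale_and_shift(chunk, factor, offset=0):
    # Element-wise work gives the same answer however the data is chunked.
    return chunk * factor + offset


s = pd.Series([1, 2, 3, 4, 5])
elementwise = emulate_map_chunk(s, scale_and_shift, args=(10,), kwargs={"offset": 1})

# A chunk-local statistic (here the mean) is computed per chunk, not globally,
# so the result differs from s - s.mean().
centered = emulate_map_chunk(s, lambda c: c - c.mean())

print(elementwise.tolist())
print(centered.tolist())
```

Functions that are purely element-wise or row-wise are the safest fit for chunk-wise application; anything that aggregates across rows sees only the rows in its own chunk.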
Examples
>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9
The output type (Series or DataFrame) is inferred automatically.
>>> df.map_chunk(lambda c: c['A'] + c['B']).execute()
0    13
1    13
2    13
dtype: int64
You can specify output_type yourself if automatic inference fails.
>>> import pandas as pd
>>> import numpy as np
>>> df['c'] = ['s1', 's2', 's3']
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)).execute()
Traceback (most recent call last):
TypeError: Cannot determine `output_type`, you have to specify it as `dataframe` or `series`...
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1),
...              output_type='dataframe', dtypes=pd.Series([np.dtype(object), np.dtype(int)])).execute()
   A  c
0  4  1
1  4  2
2  4  3
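When inference fails, one way to find out what dtypes the function produces is to run it on a small pandas sample and inspect the result. The sketch below uses only pandas; `sample` and `extract` are hypothetical names, and whether the resulting Series (indexed by column name) can be passed unchanged as dtypes to map_chunk is an assumption to verify against your Mars version.

```python
import pandas as pd

# A small in-memory sample with the same columns as the example above.
sample = pd.DataFrame({"A": [4, 4, 4], "c": ["s1", "s2", "s3"]})


def extract(chunk):
    # Same transformation as in the example: keep A, parse the digit out of c.
    return pd.concat([chunk["A"], chunk["c"].str.slice(1).astype(int)], axis=1)


expected = extract(sample)

# DataFrame.dtypes is a Series mapping each column name to its dtype.
expected_dtypes = expected.dtypes
print(expected_dtypes)
```

Running the function locally on a few rows also confirms whether it returns a DataFrame or a Series, which is the information output_type asks for.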