mars.dataframe.DataFrame.map_chunk
- DataFrame.map_chunk(func, args=(), **kwargs)
Apply function to each chunk.
- Parameters
func (function) – Function to apply to each chunk.
args (tuple) – Positional arguments to pass to func in addition to the array/series.
**kwargs – Additional keyword arguments to pass to func.
- Returns
Result of applying func to each chunk of the DataFrame or Series.
- Return type
Series or DataFrame
See also
DataFrame.apply
Perform any type of operations.
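The chunk-wise semantics described above can be illustrated with plain pandas. This is only a conceptual sketch, not how Mars implements `map_chunk`: the helper `map_chunk_sketch` and its `chunk_rows` parameter are hypothetical, and Mars chooses chunk boundaries itself. It shows how `args` and `**kwargs` would be forwarded to `func` after the chunk.

```python
import pandas as pd


def map_chunk_sketch(df, func, args=(), chunk_rows=2, **kwargs):
    """Apply func to consecutive row blocks of df and concatenate the results.

    Hypothetical stand-in for illustration: chunk_rows is not a Mars parameter.
    """
    pieces = [
        # Each call sees only one block of rows, plus the extra arguments.
        func(df.iloc[start:start + chunk_rows], *args, **kwargs)
        for start in range(0, len(df), chunk_rows)
    ]
    return pd.concat(pieces)


def add_columns(chunk, offset, scale=1):
    # Row-wise work gives the same answer however the rows are chunked.
    return (chunk['A'] + chunk['B'] + offset) * scale


df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
result = map_chunk_sketch(df, add_columns, args=(1,), scale=2)
print(result.tolist())  # [28, 28, 28]
```

Because `func` only ever sees one chunk, a function that aggregates (for example `chunk.sum()`) would yield one result per chunk rather than one result for the whole DataFrame.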
Examples
>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9
The output type, either Series or DataFrame, is inferred automatically.
>>> df.map_chunk(lambda c: c['A'] + c['B']).execute()
0    13
1    13
2    13
dtype: int64
You can specify output_type yourself if automatic inference fails.
>>> import pandas as pd
>>> import numpy as np
>>> df['c'] = ['s1', 's2', 's3']
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)).execute()
Traceback (most recent call last):
TypeError: Cannot determine `output_type`, you have to specify it as `dataframe` or `series`...
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1),
...              output_type='dataframe', dtypes=pd.Series([np.dtype(object), np.dtype(int)])).execute()
   A  c
0  4  1
1  4  2
2  4  3
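When the output type has to be given by hand, one possible way to work out what to pass is to run the same function on a tiny pandas sample and inspect the result. The sketch below uses only pandas. The names `combine` and `sample` are hypothetical, and it assumes the function produces the same columns and dtypes on every chunk.

```python
import pandas as pd


def combine(chunk):
    # Same logic as the lambda above: keep 'A' and parse the digit in 'c'.
    return pd.concat([chunk['A'], chunk['c'].str.slice(1).astype(int)], axis=1)


# A one-row pandas sample with the same columns as the Mars DataFrame.
sample = pd.DataFrame({'A': [4], 'B': [9], 'c': ['s1']})

out = combine(sample)
output_type = 'dataframe' if isinstance(out, pd.DataFrame) else 'series'
dtypes = out.dtypes  # pandas Series of dtypes, indexed by output column name

print(output_type)
print(list(dtypes.index))
```

The resulting `output_type` and `dtypes` values could then be supplied to `map_chunk` through the keyword arguments of the same names shown in the example above.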