mars.dataframe.DataFrame.map_chunk

DataFrame.map_chunk(func, args=(), **kwargs)

Apply function to each chunk.

Parameters
  • func (function) – Function to apply to each chunk.

  • args (tuple) – Positional arguments to pass to func in addition to the chunk.

  • **kwargs – Additional keyword arguments to pass to func.
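
As a rough illustration of how func, args, and **kwargs fit together, the pandas-only sketch below mimics the calling convention on manually split chunks. The names map_chunk_sketch, n_chunks, and scale_column are hypothetical and exist only for this example; this is not how Mars implements map_chunk, and it ignores distributed scheduling entirely.

```python
import pandas as pd


def map_chunk_sketch(df, func, args=(), n_chunks=2, **kwargs):
    """Hypothetical stand-in: split df into row-wise chunks, call
    func(chunk, *args, **kwargs) on each, and concatenate the results."""
    size = max(1, -(-len(df) // n_chunks))  # ceiling division
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]
    return pd.concat([func(chunk, *args, **kwargs) for chunk in chunks])


def scale_column(chunk, column, factor=1):
    # Receives one chunk first, then the extra positional and keyword arguments.
    return chunk[column] * factor


df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
result = map_chunk_sketch(df, scale_column, args=('A',), factor=10)
print(result.tolist())
```

Because func only ever sees one chunk at a time, it should not rely on values from rows outside that chunk.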

Returns

Result of applying func to each chunk of the DataFrame or Series.

Return type

Series or DataFrame

See also

DataFrame.apply

Perform any type of operation.

Examples

>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9

The output type (Series or DataFrame) is inferred automatically.

>>> df.map_chunk(lambda c: c['A'] + c['B']).execute()
0    13
1    13
2    13
dtype: int64

If automatic inference fails, specify output_type explicitly.

>>> import pandas as pd
>>> import numpy as np
>>> df['c'] = ['s1', 's2', 's3']
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)).execute()
Traceback (most recent call last):
TypeError: Cannot determine `output_type`, you have to specify it as `dataframe` or `series`...
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1),
...              output_type='dataframe', dtypes=pd.Series([np.dtype(object), np.dtype(int)])).execute()
   A  c
0  4  1
1  4  2
2  4  3
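
One way to work out which dtypes to declare, rather than typing them by hand, is to run the same function on a small local pandas sample and inspect the result. The sketch below assumes the sample is representative of the real data; the names extract and sample are illustrative and not part of the Mars API.

```python
import pandas as pd


def extract(c):
    # Same logic as the lambda above: keep 'A' and parse the digit out of 'c'.
    return pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)


# Small local pandas sample with the same columns as the Mars DataFrame.
sample = pd.DataFrame({'A': [4], 'B': [9], 'c': ['s1']})
expected = extract(sample)
dtypes = expected.dtypes  # pandas Series indexed by column name
print(list(dtypes.index))
```

The resulting Series could presumably be supplied as dtypes together with output_type='dataframe'; whether Mars expects it to be indexed by column name is an assumption to verify against your Mars version.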