mars.dataframe.DataFrame.map_chunk

DataFrame.map_chunk(func, args=(), kwargs=None, skip_infer=False, **kw)

Apply function to each chunk.
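To make the chunk-wise semantics concrete, here is a minimal, single-process sketch in plain pandas. The helper map_chunk_sketch and its chunk_size parameter are hypothetical and not part of Mars; the sketch only splits rows, whereas Mars decides the chunking itself. It illustrates how func receives each chunk followed by args and kwargs, and how the per-chunk results are concatenated.

```python
import pandas as pd


def map_chunk_sketch(df, func, chunk_size=2, args=(), kwargs=None):
    """Hypothetical, single-process model of map_chunk (not Mars's implementation).

    Splits ``df`` into row chunks of at most ``chunk_size`` rows, calls
    ``func(chunk, *args, **kwargs)`` on each chunk, and concatenates the results.
    """
    kwargs = kwargs or {}
    results = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]  # one row-wise chunk
        results.append(func(chunk, *args, **kwargs))
    return pd.concat(results)


def add_cols(chunk, offset, scale=1):
    # offset arrives through args, scale through kwargs
    return (chunk['A'] + chunk['B'] + offset) * scale


df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
out = map_chunk_sketch(df, add_cols, args=(1,), kwargs={'scale': 2})
print(out.tolist())  # [28, 28, 28]
```

Because func only sees one chunk at a time, a reduction such as chunk['A'].sum() yields one partial result per chunk rather than a global total.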

Parameters
  • func (function) – Function to apply to each chunk.

  • args (tuple) – Positional arguments to pass to func in addition to the chunk itself.

  • kwargs (Dict) – Additional keyword arguments to pass to func.

  • skip_infer (bool, default False) – Whether to skip inferring dtypes when dtypes or output_type is not specified.

Returns

Result of applying func to each chunk of the DataFrame or Series.

Return type

Series or DataFrame

See also

DataFrame.apply

Perform any type of operation.

Examples

>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9

The output type (Series or DataFrame) is inferred automatically.

>>> df.map_chunk(lambda c: c['A'] + c['B']).execute()
0    13
1    13
2    13
dtype: int64

If automatic inference fails, specify output_type (and, for a DataFrame result, dtypes) explicitly.

>>> import pandas as pd
>>> import numpy as np
>>> df['c'] = ['s1', 's2', 's3']
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)).execute()
Traceback (most recent call last):
TypeError: Cannot determine `output_type`, you have to specify it as `dataframe` or `series`...
>>> df.map_chunk(lambda c: pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1),
...              output_type='dataframe',
...              dtypes=pd.Series([np.dtype(int), np.dtype(int)], index=['A', 'c'])).execute()
   A  c
0  4  1
1  4  2
2  4  3
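When inference fails, one practical approach is to run func on a small local pandas sample and read the output type and dtypes off the result. The helper infer_map_chunk_meta below is hypothetical (not a Mars API), and it assumes the sample has the same columns and dtypes as the real data.

```python
import pandas as pd


def infer_map_chunk_meta(func, sample):
    """Hypothetical helper: run ``func`` on a small local pandas sample and
    return (output_type, dtype information) describing its result."""
    result = func(sample)
    if isinstance(result, pd.DataFrame):
        # A Series of dtypes indexed by column name
        return 'dataframe', result.dtypes
    if isinstance(result, pd.Series):
        # A single dtype rather than a per-column Series
        return 'series', result.dtype
    raise TypeError(f'unsupported result type: {type(result).__name__}')


def func(c):
    # Same transformation as in the example above
    return pd.concat([c['A'], c['c'].str.slice(1).astype(int)], axis=1)


sample = pd.DataFrame({'A': [4, 4], 'B': [9, 9], 'c': ['s1', 's2']})
output_type, dtypes = infer_map_chunk_meta(func, sample)
print(output_type)  # dataframe
print(dtypes)
```

For a DataFrame result, the returned dtypes is a Series indexed by column name, which matches what the example above passes as df.map_chunk(func, output_type=output_type, dtypes=dtypes).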