mars.dataframe.DataFrame.apply#

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), dtypes=None, dtype=None, name=None, output_type=None, index=None, elementwise=None, skip_infer=False, **kwds)#

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters
  • func (function) – Function to apply to each column or row.

  • axis ({0 or 'index', 1 or 'columns'}, default 0) –

    Axis along which the function is applied:

    • 0 or ‘index’: apply function to each column.

    • 1 or ‘columns’: apply function to each row.

  • raw (bool, default False) –

    Determines if row or column is passed as a Series or ndarray object:

    • False : passes each row or column as a Series to the function.

    • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

  • result_type ({'expand', 'reduce', 'broadcast', None}, default None) –

    These only act when axis=1 (columns):

    • ’expand’ : list-like results will be turned into columns.

    • ’reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.

    • ’broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

    The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

  • output_type ({'dataframe', 'series'}, default None) – Specify type of returned object. See Notes for more details.

  • dtypes (Series, default None) – Specify dtypes of returned DataFrames. See Notes for more details.

  • dtype (numpy.dtype, default None) – Specify dtype of returned Series. See Notes for more details.

  • name (str, default None) – Specify name of returned Series. See Notes for more details.

  • index (Index, default None) – Specify index of returned object. See Notes for more details.

  • elementwise (bool, default False) –

    Specify whether func is an elementwise function:

    • False : The function is not elementwise. Mars will try concatenating chunks in rows (when axis=0) or in columns (when axis=1) and then apply func onto the concatenated chunk. The concatenation step can cause extra latency.

    • True : The function is elementwise. Mars will apply func to original chunks. This will not introduce extra concatenation step and reduce overhead.

  • skip_infer (bool, default False) – Whether infer dtypes when dtypes or output_type is not specified.

  • args (tuple) – Positional arguments to pass to func in addition to the array/series.

  • **kwds – Additional keyword arguments to pass as keywords arguments to func.

Returns

Result of applying func along the given axis of the DataFrame.

Return type

Series or DataFrame

See also

DataFrame.applymap

For elementwise operations.

DataFrame.aggregate

Only perform aggregating type operations.

DataFrame.transform

Only perform transforming type operations.

Notes

When deciding output dtypes and shape of the return value, Mars will try applying func onto a mock DataFrame, and the apply call may fail. When this happens, you need to specify the type of apply call (DataFrame or Series) in output_type.

  • For DataFrame output, you need to specify a list or a pandas Series as dtypes of output DataFrame. index of output can also be specified.

  • For Series output, you need to specify dtype and name of output Series.

Examples

>>> import numpy as np
>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> df = md.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df.execute()
   A  B
0  4  9
1  4  9
2  4  9

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0).execute()
A    12
B    27
dtype: int64
>>> df.apply(np.sum, axis=1).execute()
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1).execute()
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a Dataframe

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand').execute()
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: md.Series([1, 2], index=['foo', 'bar']), axis=1).execute()
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast').execute()
   A  B
0  1  2
1  1  2
2  1  2