GroupBy.transform
Call a function producing a like-indexed DataFrame on each group and return a DataFrame with the same index as the original object, filled with the transformed values.
f (function) – Function to apply to each group.
dtypes (Series, default None) – Specify dtypes of returned DataFrames. See Notes for more details.
dtype (numpy.dtype, default None) – Specify dtype of returned Series. See Notes for more details.
name (str, default None) – Specify name of returned Series. See Notes for more details.
*args – Positional arguments to pass to f.
**kwargs – Keyword arguments to pass to f.
DataFrame
See also
DataFrame.groupby.apply, DataFrame.groupby.aggregate, DataFrame.transform
Notes
Each group is endowed with the attribute ‘name’ in case you need to know which group you are working on.
The current implementation imposes three requirements on f:
f must return a value that either has the same shape as the input subframe or can be broadcast to the shape of the input subframe. For example, if f returns a scalar it will be broadcast to have the same shape as the input subframe.
If the grouped object is a DataFrame, f must support application column-by-column in the subframe. If f also supports application to the entire subframe, then a fast path is used starting from the second chunk.
f must not mutate groups. Mutation is not supported and may produce unexpected results.
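The first requirement (same shape, or broadcastable to it) can be illustrated with a minimal pure-Python sketch. This is not how Mars or pandas implements transform; the helper name group_transform and its list-based inputs are assumptions made only for illustration.

```python
from collections import defaultdict


def group_transform(keys, values, f):
    """Apply f to each group and return results aligned with the input rows.

    Hypothetical helper: a sketch of the semantics, not the Mars implementation.
    """
    # Record which row positions belong to each group key.
    positions = defaultdict(list)
    for pos, key in enumerate(keys):
        positions[key].append(pos)

    out = [None] * len(values)
    for key, idx in positions.items():
        # Hand f a fresh list so the source rows cannot be mutated through it.
        result = f([values[i] for i in idx])
        if not isinstance(result, (list, tuple)):
            # Scalar result: broadcast it to every row of the group.
            result = [result] * len(idx)
        if len(result) != len(idx):
            raise ValueError(f"group {key!r}: result length does not match group")
        # Write each value back to the row it came from.
        for i, r in zip(idx, result):
            out[i] = r
    return out


keys = ['foo', 'bar', 'foo', 'bar', 'foo', 'bar']
c = [1, 5, 5, 2, 5, 5]

# Scalar per group, broadcast back to every row of that group.
spread = group_transform(keys, c, lambda g: max(g) - min(g))

# Same-length result per group, written back row by row.
centered = group_transform(keys, c, lambda g: [v - sum(g) / len(g) for v in g])
```

Here spread reproduces the broadcast behaviour for column C of the data used in the Examples section, while centered shows the same-shape case.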
When deciding the output dtypes and shape of the return value, Mars will try applying f to a mock grouped object, and the transform call may fail.
For DataFrame output, you need to pass a list or a pandas Series as dtypes to describe the output DataFrame. The index of the output can also be specified with index.
For Series output, you need to specify the dtype and name of the output Series.
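The idea of inferring output metadata from a trial run, with a fallback to user-declared metadata, can be sketched in plain Python. This is only an illustration of the general pattern under stated assumptions: infer_output_kind, its declared argument, and the lookup table are hypothetical and do not correspond to Mars internals.

```python
def infer_output_kind(func, mock_group, declared=None):
    """Guess the output element type by calling func on a small mock group.

    `declared` is a hypothetical stand-in for user-supplied metadata
    (dtypes for DataFrame output, dtype and name for Series output).
    """
    try:
        sample = func(mock_group)
    except Exception:
        # The trial run on mock data failed: rely on what the caller declared.
        if declared is None:
            raise TypeError("cannot infer output; declare it explicitly")
        return declared
    first = sample[0] if isinstance(sample, list) else sample
    return type(first).__name__


# Works on arbitrary mock data, so the kind can be inferred.
inferred = infer_output_kind(lambda g: [v / 2 for v in g], [1, 2])

# Depends on real values being present, so the mock run raises KeyError
# and the declared kind is used instead.
lookup = {10: 'a', 20: 'b'}
fallback = infer_output_kind(lambda g: [lookup[v] for v in g], [1, 2],
                             declared='str')
```

A function that only works on the real data, like the lookup example, is the kind of case where declaring the output up front avoids a failed trial run.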
Examples
>>> import mars.dataframe as md
>>> df = md.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                           'foo', 'bar'],
...                    'B' : ['one', 'one', 'two', 'three',
...                           'two', 'two'],
...                    'C' : [1, 5, 5, 2, 5, 5],
...                    'D' : [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.transform(lambda x: (x - x.mean()) / x.std()).execute()
          C         D
0 -1.154701 -0.577350
1  0.577350  0.000000
2  0.577350  1.154701
3 -1.154701 -1.000000
4  0.577350 -0.577350
5  0.577350  1.000000
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min()).execute()
   C    D
0  4  6.0
1  3  8.0
2  4  6.0
3  3  8.0
4  4  6.0
5  3  8.0