mars.dataframe.groupby.GroupBy.transform#
- GroupBy.transform(f, *args, dtypes=None, dtype=None, name=None, index=None, output_types=None, skip_infer=False, **kwargs)#
Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values
- 参数
f (function) – Function to apply to each group.
dtypes (Series, default None) – Specify dtypes of returned DataFrames. See Notes for more details.
dtype (numpy.dtype, default None) – Specify dtype of returned Series. See Notes for more details.
name (str, default None) – Specify name of returned Series. See Notes for more details.
skip_infer (bool, default False) – Whether infer dtypes when dtypes or output_type is not specified.
*args – Positional arguments to pass to func
**kwargs – Keyword arguments to be passed into func.
- 返回类型
参见
DataFrame.groupby.apply,DataFrame.groupby.aggregate,DataFrame.transform备注
Each group is endowed the attribute ‘name’ in case you need to know which group you are working on.
The current implementation imposes three requirements on f:
f must return a value that either has the same shape as the input subframe or can be broadcast to the shape of the input subframe. For example, if f returns a scalar it will be broadcast to have the same shape as the input subframe.
if this is a DataFrame, f must support application column-by-column in the subframe. If f also supports application to the entire subframe, then a fast path is used starting from the second chunk.
f must not mutate groups. Mutation is not supported and may produce unexpected results.
备注
When deciding output dtypes and shape of the return value, Mars will try applying
funconto a mock grouped object, and the transform call may fail.For DataFrame output, you need to specify a list or a pandas Series as
dtypesof output DataFrame.indexof output can also be specified.For Series output, you need to specify
dtypeandnameof output Series.
示例
>>> import mars.dataframe as md >>> df = md.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', ... 'foo', 'bar'], ... 'B' : ['one', 'one', 'two', 'three', ... 'two', 'two'], ... 'C' : [1, 5, 5, 2, 5, 5], ... 'D' : [2.0, 5., 8., 1., 2., 9.]}) >>> grouped = df.groupby('A') >>> grouped.transform(lambda x: (x - x.mean()) / x.std()).execute() C D 0 -1.154701 -0.577350 1 0.577350 0.000000 2 0.577350 1.154701 3 -1.154701 -1.000000 4 0.577350 -0.577350 5 0.577350 1.000000
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min()).execute() C D 0 4 6.0 1 3 8.0 2 4 6.0 3 3 8.0 4 4 6.0 5 3 8.0