mars.dataframe.Series.drop_duplicates
- Series.drop_duplicates(keep='first', inplace=False, method='auto')
Return Series with duplicate values removed.
- Parameters
keep ({‘first’, ‘last’, False}, default ‘first’) – Method to handle dropping duplicates:
‘first’ : Drop duplicates except for the first occurrence.
‘last’ : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
inplace (bool, default False) – If True, performs the operation in place and returns None.
- Returns
Series with duplicates dropped.
- Return type
See also
Index.drop_duplicates: Equivalent method on Index.
DataFrame.drop_duplicates: Equivalent method on DataFrame.
Series.duplicated: Related method on Series, indicating duplicate Series values.
Examples
Generate a Series with duplicated entries.
>>> import mars.dataframe as md
>>> s = md.Series(['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo'],
...               name='animal')
>>> s.execute()
0      lame
1       cow
2      lame
3    beetle
4      lame
5     hippo
Name: animal, dtype: object
With the ‘keep’ parameter, the selection behaviour of duplicated values can be changed. The value ‘first’ keeps the first occurrence for each set of duplicated entries. The default value of keep is ‘first’.
>>> s.drop_duplicates().execute()
0      lame
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
The value ‘last’ for parameter ‘keep’ keeps the last occurrence for each set of duplicated entries.
>>> s.drop_duplicates(keep='last').execute()
1       cow
3    beetle
4      lame
5     hippo
Name: animal, dtype: object
The value False for parameter ‘keep’ discards all sets of duplicated entries. Setting the value of ‘inplace’ to True performs the operation in place and returns None.
>>> s.drop_duplicates(keep=False, inplace=True)
>>> s.execute()
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
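The three ‘keep’ options shown above can be summarised with a small plain-Python sketch. It is only an illustration of the documented selection rules and not how Mars implements the operation; the helper name drop_duplicates_sketch is hypothetical, and it works on an ordinary list rather than a Mars Series.

```python
from collections import Counter


def drop_duplicates_sketch(values, keep="first"):
    """Return the (position, value) pairs that survive for a given `keep`.

    Hypothetical helper illustrating the documented rules only;
    this is not part of Mars.
    """
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")

    pairs = list(enumerate(values))

    if keep is False:
        # Discard every value that occurs more than once.
        counts = Counter(values)
        return [(i, v) for i, v in pairs if counts[v] == 1]

    # For 'last', scan from the end so the final occurrence is seen first.
    ordered = pairs if keep == "first" else reversed(pairs)
    seen = set()
    kept = []
    for i, v in ordered:
        if v not in seen:
            seen.add(v)
            kept.append((i, v))

    # Restore the original ordering by position.
    return sorted(kept)


animals = ['lame', 'cow', 'lame', 'beetle', 'lame', 'hippo']

first = drop_duplicates_sketch(animals)
last = drop_duplicates_sketch(animals, keep="last")
unique_only = drop_duplicates_sketch(animals, keep=False)

print(first)        # [(0, 'lame'), (1, 'cow'), (3, 'beetle'), (5, 'hippo')]
print(last)         # [(1, 'cow'), (3, 'beetle'), (4, 'lame'), (5, 'hippo')]
print(unique_only)  # [(1, 'cow'), (3, 'beetle'), (5, 'hippo')]
```

The surviving positions match the index labels in the Mars outputs above: ‘first’ keeps label 0 for ‘lame’, ‘last’ keeps label 4, and False drops every ‘lame’ row.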