mars.dataframe.Series.str.rsplit¶
- Series.str.rsplit(pat=None, n=- 1, expand=False)¶
Split strings around given separator/delimiter.
Splits the string in the Series/Index from the end, at the specified delimiter string. Equivalent to
str.rsplit()
.- 参数
pat (str, optional) – String or regular expression to split on. If not specified, split on whitespace.
n (int, default -1 (all)) – Limit number of splits in output.
None
, 0 and -1 will be interpreted as return all splits.expand (bool, default False) –
Expand the split strings into separate columns.
If
True
, return DataFrame/MultiIndex expanding dimensionality.If
False
, return Series/Index, containing lists of strings.
- 返回
Type matches caller unless
expand=True
(see Notes).- 返回类型
参见
Series.str.split
Split strings around given separator/delimiter.
Series.str.rsplit
Splits string around given separator/delimiter, starting from the right.
Series.str.join
Join lists contained as elements in the Series/Index with passed delimiter.
str.split
Standard library version for split.
str.rsplit
Standard library version for rsplit.
提示
The handling of the n keyword depends on the number of found splits:
If found splits > n, make first n splits only
If found splits <= n, make all splits
If for a certain row the number of found splits < n, append None for padding up to n if
expand=True
If using
expand=True
, Series and Index callers return DataFrame and MultiIndex objects, respectively.实际案例
>>> import mars.tensor as mt >>> import mars.dataframe as md >>> s = md.Series( ... [ ... "this is a regular sentence", ... "https://docs.python.org/3/tutorial/index.html", ... mt.nan ... ] ... ) >>> s.execute() 0 this is a regular sentence 1 https://docs.python.org/3/tutorial/index.html 2 NaN dtype: object
In the default setting, the string is split by whitespace.
>>> s.str.split().execute() 0 [this, is, a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object
Without the n parameter, the outputs of rsplit and split are identical.
>>> s.str.rsplit().execute() 0 [this, is, a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object
The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.
>>> s.str.split(n=2).execute() 0 [this, is, a regular sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object
>>> s.str.rsplit(n=2).execute() 0 [this is a, regular, sentence] 1 [https://docs.python.org/3/tutorial/index.html] 2 NaN dtype: object
The pat parameter can be used to split by other characters.
>>> s.str.split(pat="/").execute() 0 [this is a regular sentence] 1 [https:, , docs.python.org, 3, tutorial, index... 2 NaN dtype: object
When using
expand=True
, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.>>> s.str.split(expand=True).execute() 0 1 2 3 4 0 this is a regular sentence 1 https://docs.python.org/3/tutorial/index.html None None None None 2 NaN NaN NaN NaN NaN
For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.
>>> s.str.rsplit("/", n=1, expand=True).execute() 0 1 0 this is a regular sentence None 1 https://docs.python.org/3/tutorial index.html 2 NaN NaN
Remember to escape special characters when explicitly using regular expressions.
>>> s = md.Series(["1+1=2"]) >>> s.execute() 0 1+1=2 dtype: object >>> s.str.split(r"\+|=", expand=True).execute() 0 1 2 0 1 1 2