mars.dataframe.Series.str.rsplit¶

Series.str.rsplit(pat=None, n=- 1, expand=False)¶

Split strings around given separator/delimiter.

Splits the string in the Series/Index from the end, at the specified delimiter string. Equivalent to str.rsplit().

参数

pat (str, optional) – String or regular expression to split on. If not specified, split on whitespace.
n (int, default -1 (all)) – Limit number of splits in output. None, 0 and -1 will be interpreted as return all splits.
expand (bool, default False) –
Expand the split strings into separate columns.
- If True, return DataFrame/MultiIndex expanding dimensionality.
- If False, return Series/Index, containing lists of strings.

返回

Type matches caller unless expand=True (see Notes).

返回类型

Series, Index, DataFrame or MultiIndex

参见

Series.str.split: Split strings around given separator/delimiter.
Series.str.rsplit: Splits string around given separator/delimiter, starting from the right.
Series.str.join: Join lists contained as elements in the Series/Index with passed delimiter.
str.split: Standard library version for split.
str.rsplit: Standard library version for rsplit.

提示

The handling of the n keyword depends on the number of found splits:

If found splits > n, make first n splits only
If found splits <= n, make all splits
If for a certain row the number of found splits < n, append None for padding up to n if expand=True

If using expand=True, Series and Index callers return DataFrame and MultiIndex objects, respectively.

实际案例

>>> import mars.tensor as mt
>>> import mars.dataframe as md
>>> s = md.Series(
...     [
...         "this is a regular sentence",
...         "https://docs.python.org/3/tutorial/index.html",
...         mt.nan
...     ]
... )
>>> s.execute()
0                       this is a regular sentence
1    https://docs.python.org/3/tutorial/index.html
2                                              NaN
dtype: object

In the default setting, the string is split by whitespace.

>>> s.str.split().execute()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

Without the n parameter, the outputs of rsplit and split are identical.

>>> s.str.rsplit().execute()
0                   [this, is, a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.

>>> s.str.split(n=2).execute()
0                     [this, is, a regular sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

>>> s.str.rsplit(n=2).execute()
0                     [this is a, regular, sentence]
1    [https://docs.python.org/3/tutorial/index.html]
2                                                NaN
dtype: object

The pat parameter can be used to split by other characters.

>>> s.str.split(pat="/").execute()
0                         [this is a regular sentence]
1    [https:, , docs.python.org, 3, tutorial, index...
2                                                  NaN
dtype: object

When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.

>>> s.str.split(expand=True).execute()
                                               0     1     2        3         4
0                                           this    is     a  regular  sentence
1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
2                                            NaN   NaN   NaN      NaN       NaN

For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.

>>> s.str.rsplit("/", n=1, expand=True).execute()
                                    0           1
0          this is a regular sentence        None
1  https://docs.python.org/3/tutorial  index.html
2                                 NaN         NaN

Remember to escape special characters when explicitly using regular expressions.

>>> s = md.Series(["1+1=2"])
>>> s.execute()
0    1+1=2
dtype: object
>>> s.str.split(r"\+|=", expand=True).execute()
     0    1    2
0    1    1    2

mars.dataframe.Series.str.split mars.dataframe.Series.str.startswith