mars.dataframe.Series.str.extract
- Series.str.extract(pat: str, flags: int = 0, expand: bool = True) → FrameOrSeriesUnion | Index
Extract capture groups in the regex pat as columns in a DataFrame.
For each subject string in the Series, extract groups from the first match of regular expression pat.
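The per-string behaviour described above can be sketched in plain Python with the standard `re` module. This only illustrates the semantics and is not how Mars implements the method; the helper name `extract_first` is hypothetical, and `None` stands in for the NaN that appears in the real result.

```python
import re

def extract_first(strings, pat, flags=0):
    """Hypothetical helper: one tuple of captured groups per input string."""
    regex = re.compile(pat, flags)
    rows = []
    for s in strings:
        m = regex.search(s)  # first match only; later matches are ignored
        if m is None:
            # No match at all: every group is missing for this row.
            rows.append((None,) * regex.groups)
        else:
            # Groups that did not participate in the match come back as None.
            rows.append(m.groups())
    return rows

rows = extract_first(['a1', 'b2', 'c3'], r'([ab])(\d)')
print(rows)  # [('a', '1'), ('b', '2'), (None, None)]

# Flags are passed straight through to the regex, e.g. case-insensitive matching.
upper = extract_first(['A1'], r'([ab])(\d)', flags=re.IGNORECASE)
print(upper)  # [('A', '1')]
```

Each tuple corresponds to one row of the result, and each position in the tuple to one capture group.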
- Parameters
pat (str) – Regular expression pattern with capturing groups.
flags (int, default 0 (no flags)) – Flags from the re module, e.g. re.IGNORECASE, that modify regular expression matching for things like case, spaces, etc. For more details, see re.
expand (bool, default True) – If True, return a DataFrame with one column per capture group. If False, return a Series/Index if there is one capture group, or a DataFrame if there are multiple capture groups.
- Returns
A DataFrame with one row for each subject string and one column for each group. Any capture group names in the regular expression pat will be used for column names; otherwise capture group numbers will be used. The dtype of each result column is always object, even when no match is found. If expand=False and pat has only one capture group, a Series (if the subject is a Series) or an Index (if the subject is an Index) is returned instead.
- Return type
DataFrame or Series or Index
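The column-naming rule can be illustrated with the standard `re` module alone. This is a minimal sketch of the rule as stated above, not Mars's own code; the function name `group_labels` is hypothetical.

```python
import re

def group_labels(pat):
    """Hypothetical helper: group names where given, else 0-based group numbers."""
    regex = re.compile(pat)
    # groupindex maps name -> 1-based group number; invert it for lookup by number.
    names = {number: name for name, number in regex.groupindex.items()}
    return [names.get(i + 1, i) for i in range(regex.groups)]

unnamed = group_labels(r'([ab])(\d)')
named = group_labels(r'(?P<letter>[ab])(?P<digit>\d)')
print(unnamed)  # [0, 1]
print(named)    # ['letter', 'digit']
```

The two printed lists match the column headers of the unnamed and named patterns in the examples for this method.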
See also
extractall
Returns all matches (not just the first match).
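The difference between keeping the first match and keeping all matches can be seen on a single subject string with the standard `re` module. This is an analogy under the assumption that extract behaves like `search` and extractall like `finditer` for each string; it does not call Mars.

```python
import re

pat = re.compile(r'([ab])(\d)')
subject = 'a1b2'  # contains two matches of the pattern

# First match only: analogous to what extract keeps for this string.
first = pat.search(subject).groups()

# Every non-overlapping match: analogous to what extractall keeps.
every = [m.groups() for m in pat.finditer(subject)]

print(first)  # ('a', '1')
print(every)  # [('a', '1'), ('b', '2')]
```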
Examples
A pattern with two groups will return a DataFrame with two columns. Non-matches will be NaN.
>>> import mars.dataframe as md
>>> s = md.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'([ab])(\d)').execute()
     0    1
0    a    1
1    b    2
2  NaN  NaN
A pattern may contain optional groups.
>>> s.str.extract(r'([ab])?(\d)').execute()
     0  1
0    a  1
1    b  2
2  NaN  3
Named groups will become column names in the result.
>>> s.str.extract(r'(?P<letter>[ab])(?P<digit>\d)').execute()
  letter digit
0      a     1
1      b     2
2    NaN   NaN
A pattern with one group will return a DataFrame with one column if expand=True.
>>> s.str.extract(r'[ab](\d)', expand=True).execute()
     0
0    1
1    2
2  NaN
A pattern with one group will return a Series if expand=False.
>>> s.str.extract(r'[ab](\d)', expand=False).execute()
0      1
1      2
2    NaN
dtype: object