mars.dataframe.Series.str.findall#

Series.str.findall(pat, flags=0)#

Find all occurrences of pattern or regular expression in the Series/Index.

Equivalent to applying re.findall() to all the elements in the Series/Index.

Parameters
  • pat (str) – Pattern or regular expression.

  • flags (int, default 0) – Flags from re module, e.g. re.IGNORECASE (default is 0, which means no flags).

Returns

All non-overlapping matches of pattern or regular expression in each string of this Series/Index.

Return type

Series/Index of lists of strings

See also

count

Count occurrences of pattern or regular expression in each string of the Series/Index.

extractall

For each string in the Series, extract groups from all matches of regular expression and return a DataFrame with one row for each match and one column for each group.

re.findall

The equivalent re function to all non-overlapping matches of pattern or regular expression in string, as a list of strings.

Examples

>>> import mars.dataframe as md
>>> s = md.Series(['Lion', 'Monkey', 'Rabbit'])

The search for the pattern ‘Monkey’ returns one match:

>>> s.str.findall('Monkey').execute()
0          []
1    [Monkey]
2          []
dtype: object

On the other hand, the search for the pattern ‘MONKEY’ doesn’t return any match:

>>> s.str.findall('MONKEY').execute()
0    []
1    []
2    []
dtype: object

Flags can be added to the pattern or regular expression. For instance, to find the pattern ‘MONKEY’ ignoring the case:

>>> import re
>>> s.str.findall('MONKEY', flags=re.IGNORECASE).execute()
0          []
1    [Monkey]
2          []
dtype: object

When the pattern matches more than one string in the Series, all matches are returned:

>>> s.str.findall('on').execute()
0    [on]
1    [on]
2      []
dtype: object

Regular expressions are supported too. For instance, the search for all the strings ending with the word ‘on’ is shown next:

>>> s.str.findall('on$').execute()
0    [on]
1      []
2      []
dtype: object

If the pattern is found more than once in the same string, then a list of multiple strings is returned:

>>> s.str.findall('b').execute()
0        []
1        []
2    [b, b]
dtype: object