Problem
In [1]: import re
...: import numpy as np
...: import pandas as pd
...: data = {
...: 'Dave':
...: 'Steve':
...: 'Rob':
...: 'Wes': np.nan
...: }
...: data = pd.Series(data)
...: pattern = '([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\\.([A-Z]{2,4})'
...: matches = data.str.match(pattern, flags=re.IGNORECASE)
In [2]: matches
Out[2]:
Dave True
Steve True
Rob True
Wes NaN
dtype: object
In [3]: matches.str.get(1)
Out[3]:
Dave NaN
Steve NaN
Rob NaN
Wes NaN
dtype: float64
In [4]: matches.str[0]
Out[4]:
Dave NaN
Steve NaN
Rob NaN
Wes NaN
dtype: float64
Why do Out[3] and Out[4] show NaN for every row, even though matches reports a successful match for Dave, Steve, and Rob?
My approach
- First, I checked what kind of object I was dealing with: matches is a pandas.core.series.Series object.
In [5]: type(matches)
Out[5]: pandas.core.series.Series
In [6]: s = pd.Series([
...: "String",
...: (1, 2, 3),
...: ["a", "b", "c"],
...: 123,
...: -456,
...: { 1: "Hello", "2": "World" },
...: True,
...: False
...: ])
In [7]: type(s) # Compare with In [5]: matches and s have the same type
Out[7]: pandas.core.series.Series
In [8]: s.str.get(1)
Out[8]:
0 t # "String"
1 2 # (1, 2, 3)
2 b # ["a", "b", "c"]
3 NaN # 123, not indexable
4 NaN # -456, not indexable
5 Hello # { 1: "Hello", "2": "World" }
6 NaN # True, not indexable
7 NaN # False, not indexable
dtype: object
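So pandas.Series.str.get cannot retrieve a value from every kind of object. Each element must be a container that can be indexed, such as a string, tuple, list, or dict. Scalars such as integers and booleans cannot be indexed, so they come back as NaN.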
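The behaviour seen in Out[8] can be approximated in plain Python. The following is a minimal sketch, not pandas' actual implementation: `lookup` is a hypothetical helper introduced only for illustration, and mapping failures to NaN is an assumption based on the output above.

```python
import math


def lookup(element, key):
    """Hypothetical stand-in for the per-element lookup seen in Out[8]."""
    try:
        if isinstance(element, dict):
            # Dicts are queried by key rather than by position.
            return element.get(key)
        # Strings, tuples, and lists support positional indexing.
        return element[key]
    except (TypeError, IndexError):
        # Ints, bools, and sets cannot be indexed; short sequences run out.
        return math.nan


values = ["String", (1, 2, 3), ["a", "b", "c"], 123,
          {1: "Hello", "2": "World"}, True, {7, 8, 9}]
results = [lookup(value, 1) for value in values]
print(results)
```

The set in the last position is iterable yet still produces NaN in this sketch, which suggests that indexability, rather than iterability alone, is what matters.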
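Conclusion
To retrieve data from a pandas.core.series.Series with pandas.Series.str.get, the elements of the Series must be indexable containers (strings, tuples, lists, or dicts). In this case matches holds booleans, which are not indexable, so every lookup in Out[3] and Out[4] returns NaN.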
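If the goal is to recover the captured groups rather than a True/False flag, one possible approach is to use `str.findall` or `str.extract` instead of `str.match`. The sketch below assumes placeholder addresses, because the original values are not shown above; the `example.*` addresses and the variable names `groups`, `domains`, and `parts` are illustrative only.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical addresses standing in for the values missing from the post.
data = pd.Series({
    "Dave": "dave@example.com",
    "Steve": "steve@example.org",
    "Rob": "rob@example.net",
    "Wes": np.nan,
})

pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})"

# findall yields, per row, a list of tuples (one tuple per match).
# Taking the first match gives a tuple, which str.get can index.
groups = data.str.findall(pattern, flags=re.IGNORECASE).str[0]
domains = groups.str.get(1)

# extract returns a DataFrame with one column per capture group.
parts = data.str.extract(pattern, flags=re.IGNORECASE)

print(domains)
print(parts)
```

Here `domains` holds the second capture group for each row, and the missing entry for Wes stays NaN in both results.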