A callable: the function takes exactly one argument (the Series or DataFrame being indexed) and must return one of the four indexer types above.
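As a minimal sketch of the callable form: the sample values and lambdas below are illustrative assumptions, not taken from the examples in this post. The callable receives the object being indexed and returns something loc already accepts, such as a boolean Series or a list of labels.

```python
import numpy as np
import pandas as pd

# Fixed values (instead of random ones) so the result is reproducible.
s = pd.Series([-1.0, 0.5, -0.8, -1.5, 1.7, -1.9], index=list("abcdef"))

# The callable gets the Series itself and returns a boolean Series.
positive = s.loc[lambda x: x > 0]

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("ABCD"))

# Row callable returns a boolean Series; column callable returns a label list.
subset = df.loc[lambda d: d["A"] > 0, lambda d: ["B", "D"]]

print(positive)
print(subset)
```

The callable form is mostly useful in method chains, where the intermediate object has no name to refer to.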
Series
In [1]: s = pd.Series(np.random.randn(6), index=list('abcdef'))

In [1]: print(s)
Out[1]:
a   -1.008989
b    0.480573
c   -0.806321
d   -1.471417
e    1.691925
f   -1.873039
dtype: float64
In [1]: s.loc['b']
Out[1]: 0.4805730548396703
In [1]: s.loc[['d', 'a']]
Out[1]:
d   -1.471417
a   -1.008989
dtype: float64
In [1]: s.loc['c':]
Out[1]:
c   -0.806321
d   -1.471417
e    1.691925
f   -1.873039
dtype: float64
In [1]: s > 0
Out[1]:
a    False
b     True
c    False
d    False
e     True
f    False
dtype: bool

In [1]: s[s > 0]
Out[1]:
b    0.480573
e    1.691925
dtype: float64
DataFrame
df.loc[row_indexer, column_indexer]
df.loc[indexer] indexes rows.
df.loc[:, indexer] indexes columns.
df.loc[row_indexer, column_indexer] indexes rows and columns at the same time.
In [1]: df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))

In [1]: print(df)
Out[1]:
          A         B         C         D
a  0.841807  0.286257  0.290228  0.070770
b  0.118998 -1.435618  0.553285 -0.554076
c -0.510557  1.125717  0.594134  0.564652
d -0.129981  0.022513 -1.743066 -0.735291
e -0.237622  0.336630 -1.006766 -0.844039
f  0.750835  0.437800  0.532366  0.145945
In [1]: df.loc['a']
Out[1]:
A    0.841807
B    0.286257
C    0.290228
D    0.070770
Name: a, dtype: float64
In [1]: df.loc[['a', 'c', 'f']]
Out[1]:
          A         B         C         D
a  0.841807  0.286257  0.290228  0.070770
c -0.510557  1.125717  0.594134  0.564652
f  0.750835  0.437800  0.532366  0.145945
In [1]: df.loc[:, 'B':'C']
Out[1]:
          B         C
a  0.286257  0.290228
b -1.435618  0.553285
c  1.125717  0.594134
d  0.022513 -1.743066
e  0.336630 -1.006766
f  0.437800  0.532366
In [1]: df.loc['d':, 'A':'C']
Out[1]:
          A         B         C
d -0.129981  0.022513 -1.743066
e -0.237622  0.336630 -1.006766
f  0.750835  0.437800  0.532366
In [1]: df.loc['b'] > 0
Out[1]:
A     True
B    False
C     True
D    False
Name: b, dtype: bool

In [1]: df.loc['b':'c', df.loc['b'] > 0]
Out[1]:
          A         C
b  0.118998  0.553285
c -0.510557  0.594134
In [1]: df.loc['a', 'A']
Out[1]: 0.8418070411036316
TODO: describe how out-of-range slices are handled here.
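Until that note is written, here is a tentative sketch of how loc treats slice bounds that are not in the index. It assumes a sorted (monotonic) index and uses made-up values. With a sorted index, a missing bound is not an error: the slice is taken as if the bound were inserted in sort order.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60], index=list("abcdef"))

# 'z' is not a label, but the index is sorted, so the slice simply
# runs to the last label that sorts before 'z'.
tail = s.loc["c":"z"]

# Both bounds lie past the end of the index: the result is empty.
empty = s.loc["x":"z"]

# A single missing label (not a slice) behaves differently: KeyError.
try:
    s.loc["z"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(tail)
print(len(empty), missing_raises)
```

If the index is not sorted, pandas cannot place a missing bound, and slicing with a label that is absent raises KeyError instead.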
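The iloc method
iloc indexes purely by integer position, much like indexing in Python itself. The indexer can be any of the following:
A single integer giving a position along the axis, e.g. 5
A list of integers, e.g. [4, 3, 0]
A slice object, e.g. 1:7 (same semantics as a Python slice); a bare : selects everything
A boolean array whose length matches the length of the axis being indexed
A callable: the function takes exactly one argument (the Series or DataFrame being indexed) and must return one of the four indexer types above.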
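The input types listed above can be sketched on one small Series. The values are illustrative assumptions; the index 0, 2, 4, 6, 8 is chosen so that labels and positions differ, which makes the contrast with loc visible.

```python
import pandas as pd

# Labels 0, 2, 4, 6, 8 deliberately differ from positions 0..4.
s = pd.Series([10, 20, 30, 40, 50], index=list(range(0, 10, 2)))

single = s.iloc[2]            # third element by position
by_label = s.loc[2]           # label 2, which sits at position 1
picked = s.iloc[[4, 3, 0]]    # positions, returned in the given order
sliced = s.iloc[1:3]          # end-exclusive, like a Python list slice
masked = s.iloc[[True, False, True, False, True]]  # same length as s
called = s.iloc[lambda x: [0, len(x) - 1]]         # callable -> position list

# Out-of-range slices are clipped, as in Python; a single out-of-range
# integer such as s.iloc[100] raises IndexError instead.
clipped = s.iloc[3:100]

print(single, by_label)
print(picked.tolist(), sliced.tolist(), masked.tolist())
print(called.tolist(), clipped.tolist())
```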
Series
In [1]: s = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))
In [1]: df['two']
Out[1]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32

In [1]: df[['three', 'one']]
Out[1]:
          three  one
Ohio          2    0
Colorado      6    4
Utah         10    8
New York     14   12
Row slicing
In [1]: df[:2]
Out[1]:
          one  two  three  four
Ohio        0    1      2     3
Colorado    4    5      6     7
Boolean array
In [1]: df[df['three'] > 5]
Out[1]:
          one  two  three  four
Colorado    4    5      6     7
Utah        8    9     10    11
New York   12   13     14    15
Boolean DataFrame
In [1]: df < 5
Out[1]:
            one    two  three   four
Ohio       True   True   True   True
Colorado   True  False  False  False
Utah      False  False  False  False
New York  False  False  False  False

In [1]: df[df < 5]
Out[1]:
          one  two  three  four
Ohio      0.0  1.0    2.0   3.0
Colorado  4.0  NaN    NaN   NaN
Utah      NaN  NaN    NaN   NaN
New York  NaN  NaN    NaN   NaN
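For readers who want to rerun the [] examples, here is a self-contained sketch. The construction of df is an assumption: the frame's definition is not shown in this part of the post, but np.arange(16) reshaped to 4 x 4 reproduces the printed values. The last line checks that masking with a boolean DataFrame gives the same result as DataFrame.where.

```python
import numpy as np
import pandas as pd

# Assumed construction; it reproduces the values printed above.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

col = df["two"]                  # single label -> Series
cols = df[["three", "one"]]      # list of labels -> DataFrame, in that order
rows = df[:2]                    # slice -> rows by position
filtered = df[df["three"] > 5]   # boolean Series -> keeps matching rows
masked = df[df < 5]              # boolean DataFrame -> same shape, NaN elsewhere

# Masking with a boolean DataFrame matches DataFrame.where.
same = masked.equals(df.where(df < 5))
print(same)
```

Note that masked holds floats: introducing NaN forces the integer columns to be upcast.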
Attribute access (.)
pandas lets you use attribute access (.) to reach a Series index label or a DataFrame column directly.
You can use attribute access only if the index label is a valid Python identifier; s.1, for example, is not allowed. The attribute is also unavailable if it conflicts with an existing method name, e.g. s.min, or with any of the following names: index, major_axis, minor_axis, items. In all of these cases standard indexing still works: s['1'], s['min'], and s['index'] access the corresponding element or column.
In [1]: s.b
Out[1]: 1.0

In [1]: df.two
Out[1]:
Ohio         1
Colorado     5
Utah         9
New York    13
Name: two, dtype: int32
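The restrictions on attribute access can be seen in a small sketch. The Series below is an illustrative assumption, built so that one label is a plain identifier, one collides with a method name, and one is not a valid identifier.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["b", "min", "1"])

by_attr = s.b                 # 'b' is a valid identifier and not a method name
is_method = callable(s.min)   # s.min is the Series.min method, not the element
by_key = s["min"]             # bracket indexing still reaches the 'min' label
digit = s["1"]                # '1' is not an identifier; s.1 would be a SyntaxError

print(by_attr, is_method, by_key, digit)
```

Because of these collisions, bracket indexing is the safer choice in code that has to work for arbitrary labels; attribute access is mainly a convenience for interactive use.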