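Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.
Several operations can be accomplished through reindexing:
- Reorder the existing data to match a new set of labels.
- Insert missing value (NA) markers at label locations where no data for the label exists.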
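The two operations listed above can be sketched on a small hand-made frame with known values. The names `small`, `reordered`, and `extended` are illustrative only and are not used elsewhere in this tutorial.

```python
import pandas as pd

# A tiny frame with known values so the result is easy to check by eye.
small = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])

# 1) Reorder the existing rows to match a new label order.
reordered = small.reindex(["c", "a", "b"])

# 2) Request a label that has no data ("z"): that row is filled with NaN.
extended = small.reindex(["a", "b", "z"])

print(reordered)
print(extended)
```

Because NaN is a floating-point value, the integer column `x` becomes a float column in `extended`.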
Example
import pandas as pd
import numpy as np
N=20
df = pd.DataFrame({
'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
'x': np.linspace(0,stop=N-1,num=N),
'y': np.random.rand(N),
'C': np.random.choice(['Low','Medium','High'],N).tolist(),
'D': np.random.normal(100, 10, size=(N)).tolist()
})
Reindex the DataFrame. Column 'B' does not exist in df, so it is filled with NaN.
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
df_reindexed
| | A | C | B |
---|---|---|---|
0 | 2016-01-01 | High | NaN |
2 | 2016-01-03 | Low | NaN |
5 | 2016-01-06 | Low | NaN |
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
df1 = df1.reindex_like(df2)
df1
| | col1 | col2 | col3 |
---|---|---|---|
0 | 1.034875 | -1.394084 | 0.629550 |
1 | 0.057654 | 0.669781 | 0.207427 |
2 | -0.884713 | -0.677729 | -1.437965 |
3 | -0.656413 | -0.959881 | 0.561939 |
4 | -0.165767 | -1.438727 | 0.207890 |
5 | -0.202938 | 0.714653 | 0.182302 |
6 | 0.421432 | -0.011576 | 1.307452 |
Note: here, the df1 DataFrame is altered and reindexed like df2. The column names should match; otherwise, NaN is added for the entire column.
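A minimal sketch of what happens when the column names do not match. The frames `left` and `right` and the column name `other` are made up for illustration.

```python
import numpy as np
import pandas as pd

# left has columns col1 and col2; right has col1 and a column left lacks.
left = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["col1", "col2"])
right = pd.DataFrame(np.zeros((2, 2)), columns=["col1", "other"])

# left takes on right's row labels (0, 1) and column labels (col1, other).
# col2 is dropped, and "other" has no data in left, so it is all NaN.
aligned = left.reindex_like(right)
print(aligned)
```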
Example
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
Rows that df2 lacks are padded with missing values (NaN)
print (df2.reindex_like(df1))
       col1      col2      col3
0  1.503888 -0.940165  1.908222
1  1.065565  0.021555  0.999087
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Fill the missing values using forward fill
print ("Data Frame with Forward Fill:")
df2.reindex_like(df1,method='ffill')
Data Frame with Forward Fill:
| | col1 | col2 | col3 |
---|---|---|---|
0 | 1.503888 | -0.940165 | 1.908222 |
1 | 1.065565 | 0.021555 | 0.999087 |
2 | 1.065565 | 0.021555 | 0.999087 |
3 | 1.065565 | 0.021555 | 0.999087 |
4 | 1.065565 | 0.021555 | 0.999087 |
5 | 1.065565 | 0.021555 | 0.999087 |
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
Rows that df2 lacks are padded with missing values (NaN)
print (df2.reindex_like(df1))
       col1      col2      col3
0  0.003987 -0.167001  2.161461
1 -0.245857 -0.559355 -0.103263
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Fill the missing values from the preceding valid row, limited to one row
print ("Data Frame with Forward Fill limiting to 1:")
df2.reindex_like(df1,method='ffill',limit=1)
Data Frame with Forward Fill limiting to 1:
| | col1 | col2 | col3 |
---|---|---|---|
0 | 0.003987 | -0.167001 | 2.161461 |
1 | -0.245857 | -0.559355 | -0.103263 |
2 | -0.245857 | -0.559355 | -0.103263 |
3 | NaN | NaN | NaN |
4 | NaN | NaN | NaN |
5 | NaN | NaN | NaN |
Note: only the row labeled 2 is filled from the preceding row (label 1). The remaining rows are left as NaN.
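The effect of `limit` can be checked on a small frame with known values. This sketch uses `DataFrame.reindex` with an explicit target index instead of `reindex_like`; both accept `method` and `limit`. The names `short`, `target_index`, `filled`, and `limited` are illustrative.

```python
import pandas as pd

short = pd.DataFrame({"col1": [1.0, 2.0]})  # rows labeled 0 and 1
target_index = pd.RangeIndex(5)             # row labels 0..4

# Unlimited forward fill: every new row copies the last valid row (label 1).
filled = short.reindex(target_index, method="ffill")

# limit=1: only the first new row (label 2) is filled; labels 3 and 4 stay NaN.
limited = short.reindex(target_index, method="ffill", limit=1)

print(filled)
print(limited)
```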
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print (df1)
       col1      col2      col3
0 -0.553132  1.346574  0.729148
1 -0.007324  0.408146 -0.674935
2 -1.182618  1.130736 -0.108578
3  0.422136  0.228542  0.074224
4 -1.090094 -0.210106  0.598660
5 -0.754725  0.949153 -1.405165
print ("After renaming the rows and columns:")
After renaming the rows and columns:
print (df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},
index = {0 : 'apple', 1 : 'banana', 2 : 'durian'}))
              c1        c2      col3
apple  -0.553132  1.346574  0.729148
banana -0.007324  0.408146 -0.674935
durian -1.182618  1.130736 -0.108578
3       0.422136  0.228542  0.074224
4      -1.090094 -0.210106  0.598660
5      -0.754725  0.949153 -1.405165
The rename() method provides an inplace named parameter, which defaults to False and copies the underlying data. Pass inplace=True to rename the data in place.
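A short sketch of the difference between the two modes, using a hypothetical frame named `frame`:

```python
import pandas as pd

frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Default (inplace=False): a renamed copy is returned; frame is unchanged.
renamed = frame.rename(columns={"col1": "c1"})
print(list(frame.columns))    # still the original labels
print(list(renamed.columns))  # the copy carries the new label

# inplace=True: frame itself is modified, so no assignment is needed.
frame.rename(columns={"col2": "c2"}, inplace=True)
print(list(frame.columns))
```

Assigning the result of the default call, as in `renamed = frame.rename(...)`, is usually easier to follow than modifying the frame in place.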