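Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to match a given set of labels along a particular axis.
Reindexing can accomplish several operations:
- Reorder the existing data to match a new set of labels.
- Insert missing value (NA) markers at label locations where no data exists for that label.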
import pandas as pd
import numpy as np
N=20
df = pd.DataFrame({
'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
'x': np.linspace(0,stop=N-1,num=N),
'y': np.random.rand(N),
'C': np.random.choice(['Low','Medium','High'],N).tolist(),
'D': np.random.normal(100, 10, size=(N)).tolist()
})
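# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A and C.
# Column 'B' does not exist in df, so it is filled with NaN.
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])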
df_reindexed
  | A | C | B |
---|---|---|---|
0 | 2016-01-01 | Low | NaN |
2 | 2016-01-03 | Low | NaN |
5 | 2016-01-06 | High | NaN |
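In the output above, column B is entirely NaN because the original frame has no column named B. The same behavior can be reproduced on a tiny deterministic frame. This is a minimal sketch; the frame `small` and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration.
small = pd.DataFrame({"A": [10, 20, 30], "C": ["x", "y", "z"]}, index=[0, 1, 2])

# Request the rows in a new order, plus a row label (5) and a column
# label ("B") that do not exist in the original frame.
result = small.reindex(index=[2, 0, 5], columns=["A", "C", "B"])

# Existing labels keep their data; missing labels are filled with NaN.
print(result)
```

Rows 2 and 0 come back in the requested order with their original values, while row 5 and column B contain only missing values.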
df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
df1 = df1.reindex_like(df2)
df1
  | col1 | col2 | col3 |
---|---|---|---|
0 | 0.172100 | -0.771364 | 0.207013 |
1 | 0.354693 | 0.075110 | 1.121217 |
2 | -1.264112 | -1.173026 | 0.063709 |
3 | -0.413122 | -0.101456 | 0.593423 |
4 | 1.081490 | 1.444647 | -1.352990 |
5 | -1.851898 | -2.028988 | -0.674064 |
6 | 1.198448 | 0.494240 | 0.214083 |
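Note: Here, the df1 DataFrame is altered and reindexed like df2, so it now has the same 7 row labels as df2.
The column names should match; otherwise, any column label that exists only in the other object is filled entirely with NaN.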
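The effect of mismatched column names can be sketched as follows. The frames `left` and `template` and the column name `other` are hypothetical, chosen only to make the result easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: both have "col1", but only `template` has "other".
left = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["col1", "col2"])
template = pd.DataFrame(np.zeros((2, 2)), columns=["col1", "other"])

# Conform `left` to the row and column labels of `template`.
aligned = left.reindex_like(template)

# "col1" keeps its data for rows 0 and 1, "col2" is dropped,
# and "other" is all NaN because `left` never had that column.
print(aligned)
```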
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Rows that df2 lacks are padded with NaN
print(df2.reindex_like(df1))
# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
df2.reindex_like(df1,method='ffill')
       col1      col2      col3
0  1.451996  1.356293  1.317706
1 -0.714818  1.038419  0.437779
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill:
  | col1 | col2 | col3 |
---|---|---|---|
0 | 1.451996 | 1.356293 | 1.317706 |
1 | -0.714818 | 1.038419 | 0.437779 |
2 | -0.714818 | 1.038419 | 0.437779 |
3 | -0.714818 | 1.038419 | 0.437779 |
4 | -0.714818 | 1.038419 | 0.437779 |
5 | -0.714818 | 1.038419 | 0.437779 |
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
# Rows that df2 lacks are padded with NaN
print(df2.reindex_like(df1))
# Now fill the NaNs with the preceding values, at most 1 consecutive row
print("Data Frame with Forward Fill limiting to 1:")
df2.reindex_like(df1,method='ffill',limit=1)
       col1      col2      col3
0 -1.728506 -0.899581  0.485258
1  0.164086 -1.090574  0.236180
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
Data Frame with Forward Fill limiting to 1:
  | col1 | col2 | col3 |
---|---|---|---|
0 | -1.728506 | -0.899581 | 0.485258 |
1 | 0.164086 | -1.090574 | 0.236180 |
2 | 0.164086 | -1.090574 | 0.236180 |
3 | NaN | NaN | NaN |
4 | NaN | NaN | NaN |
5 | NaN | NaN | NaN |
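Note: Only the row at index 2 is filled from the preceding row at index 1. The remaining rows are left as NaN because limit=1 allows at most one consecutive fill.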
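The difference between plain padding, forward fill, and a limited forward fill can be checked on fixed values. This is a minimal sketch under stated assumptions: it uses `reindex` with an explicit index rather than `reindex_like`, and the frame `short` is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with row labels 0 and 1.
short = pd.DataFrame({"col1": [1.0, 2.0]})
target_index = list(range(6))

# No fill method: rows 2..5 have no source data and become NaN.
padded = short.reindex(target_index)

# Forward fill: every missing row takes the last available value.
filled = short.reindex(target_index, method="ffill")

# Forward fill with limit=1: only one consecutive missing row is filled.
limited = short.reindex(target_index, method="ffill", limit=1)

print(padded, filled, limited, sep="\n\n")
```

Note that `method="ffill"` requires the index being reindexed to be monotonically increasing or decreasing, which holds for the default integer index used here.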
# rename() relabels rows and columns using mappings from old labels to new labels
df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)
print("After renaming the rows and columns:")
print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                 index={0: 'apple', 1: 'banana', 2: 'durian'}))
       col1      col2      col3
0 -0.018863 -2.099289  0.469808
1 -1.190211  0.888139  0.491174
2  1.352684 -1.418455  0.044621
3  0.599615 -1.351258  0.982301
4 -1.204414 -0.134911  0.477062
5  0.108307 -0.130830  1.354150
After renaming the rows and columns:
              c1        c2      col3
apple  -0.018863 -2.099289  0.469808
banana -1.190211  0.888139  0.491174
durian  1.352684 -1.418455  0.044621
3       0.599615 -1.351258  0.982301
4      -1.204414 -0.134911  0.477062
5       0.108307 -0.130830  1.354150
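The rename() method provides an inplace named parameter, which defaults to False, so the method returns a renamed copy and leaves the original object unchanged.
Pass inplace=True to rename the labels of the original object directly.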
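The two modes of rename() can be compared side by side. This is a minimal sketch; the frame `frame` and the new labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
frame = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Default (inplace=False): a new object is returned, `frame` is untouched.
renamed = frame.rename(columns={"col1": "c1"})
print(list(frame.columns))    # still the original column labels
print(list(renamed.columns))  # the renamed labels

# inplace=True: `frame` itself is modified and the call returns None.
returned = frame.rename(columns={"col2": "c2"}, index={0: "apple"}, inplace=True)
print(returned)
print(frame)
```

Because the in-place call returns None, assigning its result back to the same variable would discard the frame.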