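Caveats are warnings about known behavior, and gotchas are hidden pitfalls; both deserve particular attention in Pandas.
Using If/Truth Statements with Pandas
Pandas follows the NumPy convention of raising an error when you try to convert an object to a boolean. This happens in an if statement or when using the boolean operations and, or, and not.
It is not clear what the result should be. Should it be True because the object is not zero-length? Should it be False because it contains False values? Because the answer is ambiguous, Pandas raises a ValueError: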
import pandas as pd
# Using a Series directly as a condition raises ValueError
if pd.Series([False, True, False]):
    print('I am True')
Running the example above produces the following result:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
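Inside the if condition, Pandas cannot tell how the Series should be reduced to a single boolean. The error message points to the options: state the intent explicitly with .any(), .all(), or .empty, or compare the object against None if that is what you meant.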
import pandas as pd
# .any() reduces the Series to a single boolean, so the condition is valid
if pd.Series([False, True, False]).any():
    print("I am any")
I am any
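Each reduction named in the error message answers a different question. The following is a minimal sketch that contrasts them, assuming a small hypothetical Series named flags (not part of the original example):

```python
import pandas as pd

flags = pd.Series([False, True, False])  # hypothetical example data

any_true = bool(flags.any())  # is at least one element True?
all_true = bool(flags.all())  # is every element True?
is_empty = flags.empty        # does the Series have no elements at all?

if any_true:
    print("at least one value is True")
if not all_true:
    print("not every value is True")
if not is_empty:
    print("the Series is not empty")
```

Choosing one of these explicitly is what removes the ambiguity: the same Series is truthy under .any() and falsy under .all().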
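To evaluate a single-element Pandas object in a boolean context, use the .bool() method. Note that recent Pandas versions deprecate Series.bool and emit a FutureWarning when it is called.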
import pandas as pd
print (pd.Series([True]).bool())
True
/tmp/ipykernel_4713/3671257280.py:1: FutureWarning: Series.bool is now deprecated and will be removed in future version of pandas
  print (pd.Series([True]).bool())
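Because the warning above announces the removal of Series.bool, one possible replacement is Series.item(), which returns the only element of a one-element Series as a Python scalar. This is a sketch under that assumption; the variable names are illustrative only:

```python
import pandas as pd

single = pd.Series([True])  # hypothetical one-element Series

# .item() returns the single element as a plain Python scalar
value = single.item()
print(value)

# .item() raises ValueError when the Series does not hold exactly one element
try:
    pd.Series([True, False]).item()
    raised = False
except ValueError:
    raised = True
print(raised)
```

Unlike .bool(), .item() does not check that the element is boolean, so it is worth wrapping the result in bool() or validating the dtype if that guarantee matters.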
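Element-wise Boolean Comparison
Comparison operators such as == and != return a boolean Series, which is almost always what is wanted.
import pandas as pd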
s = pd.Series(range(5))
print (s==4)
0    False
1    False
2    False
3    False
4     True
dtype: bool
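The isin Operation
isin returns a boolean Series showing whether each element of the Series is contained in the passed sequence of values.
import pandas as pd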
s = pd.Series(list('abc'))
s = s.isin(['a', 'c', 'e'])
print (s)
0     True
1    False
2     True
dtype: bool
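Boolean Series such as the ones produced by == and isin are typically used as masks. A minimal sketch, assuming small hypothetical Series, shows how to combine them element-wise with &, |, and ~, since the Python keywords and, or, and not would trigger the ValueError described earlier:

```python
import pandas as pd

s = pd.Series(range(5))
letters = pd.Series(list('abc'))

# Combine element-wise conditions with & (and), | (or), ~ (not).
# Parentheses are needed because & binds tighter than comparison operators.
mask = (s > 1) & ~(s == 4)
print(s[mask].tolist())  # [2, 3]

# A boolean Series from isin can be used directly as a filter
in_list = letters[letters.isin(['a', 'c', 'e'])]
print(in_list.tolist())  # ['a', 'c']

# The keyword form forces a truth-value conversion and therefore fails
try:
    (s > 1) and (s < 4)
    keyword_failed = False
except ValueError:
    keyword_failed = True
print(keyword_failed)  # True
```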
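Reindexing vs loc Gotcha
Many users select rows by passing a list of labels to loc as a concise way of pulling data out of a Pandas object:
import pandas as pd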
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4),
columns=['one', 'two', 'three','four'],
index=list('abcdef'))
print (df)
        one       two     three      four
a -0.768783 -1.322914  0.237845 -1.288651
b  0.231577  1.936367 -2.229683  0.763451
c -0.682158  0.183183  0.159201  0.040851
d -1.260157 -1.926052 -1.238070  1.212028
e  0.484787 -0.679275 -1.211464 -0.229575
f  1.286353 -1.313022  0.075444  0.653480
print ("=============================================")
=============================================
print (df.loc[['b', 'c', 'e']])
        one       two     three      four
b  0.231577  1.936367 -2.229683  0.763451
c -0.682158  0.183183  0.159201  0.040851
e  0.484787 -0.679275 -1.211464 -0.229575
In this case, that is of course completely equivalent to using the reindex method.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4),
columns=['one', 'two', 'three','four'],
index=list('abcdef'))
print (df)
        one       two     three      four
a -0.685815 -0.201484 -0.800849  1.532625
b -0.980875  0.948496 -1.073448 -1.378764
c -1.362332 -0.081194  0.893477  0.061041
d -0.624075 -0.978102  1.053991 -0.160999
e  1.128797 -1.887243 -0.525617  0.799780
f -1.533330 -0.465172 -2.517939 -0.085697
print("=============================================")
=============================================
print (df.reindex(['b', 'c', 'e']))
        one       two     three      four
b -0.980875  0.948496 -1.073448 -1.378764
c -1.362332 -0.081194  0.893477  0.061041
e  1.128797 -1.887243 -0.525617  0.799780
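One might conclude from this that label-based selection (loc here; the older ix indexer has since been removed from Pandas) and reindex are 100% equivalent.
That is true except in the case of integer indexing. For example, the operation above can alternatively be expressed as: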
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4),
columns=['one', 'two', 'three','four'],
index=list('abcdef'))
print (df)
        one       two     three      four
a  0.403643 -0.282671 -0.773929 -0.150883
b -0.960815  1.278617  0.970369  0.229878
c -0.108592  0.439917  1.098717  0.416205
d  0.570149 -1.064904  0.129353  0.361530
e  0.979274  0.107153  0.528962  1.159139
f -0.567982 -1.314972 -0.622541 -0.673303
print("=====================================")
=====================================
print (df.iloc[[1, 2, 3]])
        one       two     three      four
b -0.960815  1.278617  0.970369  0.229878
c -0.108592  0.439917  1.098717  0.416205
d  0.570149 -1.064904  0.129353  0.361530
print("=====================================")
=====================================
print (df.reindex([1, 2, 3]))
   one  two  three  four
1  NaN  NaN    NaN   NaN
2  NaN  NaN    NaN   NaN
3  NaN  NaN    NaN   NaN
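The important thing to remember is that reindex is strictly label-based indexing.
This can lead to some potentially surprising results in pathological cases, such as an index that contains both integers and strings.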
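The pathological case is easy to reproduce. The following sketch assumes a small hypothetical Series whose index mixes integers and strings, and compares label-based reindex with position-based iloc:

```python
import pandas as pd

# Hypothetical Series whose index mixes integers and strings
s = pd.Series([10, 20, 30], index=[1, 'a', 0])

# reindex treats 0 and 1 as labels, not positions
by_label = s.reindex([0, 1])
print(by_label.tolist())     # [30, 10]

# iloc treats 0 and 1 as positions
by_position = s.iloc[[0, 1]]
print(by_position.tolist())  # [10, 20]

# A label that does not exist yields NaN instead of raising an error
missing = s.reindex(['a', 'z'])
print(missing)
```

The same integers select different rows depending on whether they are read as labels or as positions, which is why mixed indexes are best avoided.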