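Pandas provides a range of tools for combining Series and DataFrame objects. (Older releases also supported Panel objects, which have since been removed from pandas.)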
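pd.concat(objs, axis=0, join='outer', ignore_index=False)
Where:
objs - a sequence or mapping of Series or DataFrame objects.
axis - {0, 1, ...}, default 0. The axis to concatenate along.
join - {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: outer takes their union, inner takes their intersection.
ignore_index - boolean, default False. If True, the index values along the concatenation axis are not used; the resulting axis is labeled 0, ..., n-1.
join_axes - a list of Index objects giving specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter exists only in older pandas releases; it was removed in pandas 1.0, so it is omitted from the signature above.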
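The effect of join and ignore_index is easiest to see when the inputs do not share all of their columns. The following is a minimal sketch; the frames left and right and their columns A, B, C are made up for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Two small illustrative frames that share only column "B".
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# join='outer' (the default) keeps the union of the columns;
# cells that have no source value become NaN.
outer = pd.concat([left, right], join="outer")

# join='inner' keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner")

# ignore_index=True discards the original row labels and numbers
# the rows 0..n-1 instead.
renumbered = pd.concat([left, right], join="inner", ignore_index=True)

print(outer)
print(inner)
print(renumbered)
```

With the outer join, column A is NaN for the rows that came from right, and column C is NaN for the rows that came from left.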
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
Suppose you want to associate a specific key with each of the DataFrame pieces being combined. This can be done with the keys argument:
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],keys=['x','y'])
print(rs)
       Name subject_id  Marks_scored
x 1    Alex       sub1            98
  2     Amy       sub2            90
  3   Allen       sub4            87
  4   Alice       sub6            69
  5  Ayoung       sub5            78
y 1   Billy       sub2            89
  2   Brian       sub4            80
  3    Bran       sub3            79
  4   Bryce       sub6            97
  5   Betty       sub5            88
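Because keys adds an outer index level, each original piece can be pulled back out of the result by its key. The sketch below uses shortened versions of the two frames (two rows each, which is an assumption made to keep the example small) and shows selection by key and by (key, label) pair.

```python
import pandas as pd

# Shortened versions of the tutorial's frames, for illustration only.
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]},
                   index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]},
                   index=[1, 2])

rs = pd.concat([one, two], keys=["x", "y"])

# The outer index level holds the keys, the inner level the original labels.
levels = rs.index.nlevels

# Select every row that came from the second frame.
piece_y = rs.loc["y"]
print(piece_y)

# Select a single cell by (key, original label) and column name.
name = rs.loc[("x", 2), "Name"]
print(name)
```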
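The index labels in the result are duplicated: each label appears more than once.
If the resulting object should carry its own fresh index instead, set ignore_index to True.
See the following example: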
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],keys=['x','y'],ignore_index=True)
print(rs)
     Name subject_id  Marks_scored
0    Alex       sub1            98
1     Amy       sub2            90
2   Allen       sub4            87
3   Alice       sub6            69
4  Ayoung       sub5            78
5   Billy       sub2            89
6   Brian       sub4            80
7    Bran       sub3            79
8   Bryce       sub6            97
9   Betty       sub5            88
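The index is replaced entirely, and the keys are discarded as well.
If the two objects are concatenated along axis=1, new columns are added instead of new rows.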
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],axis=1)
print(rs)
     Name subject_id  Marks_scored   Name subject_id  Marks_scored
1    Alex       sub1            98  Billy       sub2            89
2     Amy       sub2            90  Brian       sub4            80
3   Allen       sub4            87   Bran       sub3            79
4   Alice       sub6            69  Bryce       sub6            97
5  Ayoung       sub5            78  Betty       sub5            88
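In the example above both frames have the same row labels, so every row lines up. When the labels differ, concatenation along axis=1 aligns rows by label. The following sketch uses two small made-up frames (left and right are illustrative names) to show the outer and inner behavior.

```python
import pandas as pd

# Illustrative frames whose row labels only partly overlap.
left = pd.DataFrame({"Name": ["Alex", "Amy", "Allen"]}, index=[1, 2, 3])
right = pd.DataFrame({"Marks_scored": [90, 87, 69]}, index=[2, 3, 4])

# Default outer join on the row labels: labels 1 and 4 exist in only
# one input, so the missing cells become NaN.
wide_outer = pd.concat([left, right], axis=1)

# Inner join keeps only the row labels common to both inputs.
wide_inner = pd.concat([left, right], axis=1, join="inner")

print(wide_outer)
print(wide_inner)
```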
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
# DataFrame.append() was removed in pandas 2.0; pd.concat() is the replacement.
# rs = one.append(two)
rs = pd.concat([one, two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
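The append() function could also take multiple objects. In current pandas, where append() is no longer available, pass all of the objects to pd.concat() in a single list: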
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
# Older pandas: rs = one.append([two, one, two])
rs = pd.concat([one, two, one, two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
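Code that once called append() repeatedly inside a loop can usually be rewritten to collect the pieces in a plain Python list and call pd.concat() once at the end. The sketch below assumes a hypothetical loop that produces one small frame per batch; the column names batch and value are invented for the example.

```python
import pandas as pd

# Collect the pieces in an ordinary list (this is list.append, not
# the removed DataFrame.append).
pieces = []
for batch in range(3):
    piece = pd.DataFrame({
        "batch": [batch, batch],
        "value": [batch * 10, batch * 10 + 1],
    })
    pieces.append(piece)

# One concat call at the end; ignore_index=True gives a clean 0..n-1 index.
rs = pd.concat(pieces, ignore_index=True)
print(rs)
```

Concatenating once avoids building a new intermediate DataFrame on every iteration.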
import pandas as pd
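# pd.datetime is no longer available in current pandas;
# pd.Timestamp.now() returns the current local time.
print(pd.Timestamp.now())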
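Timestamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means working with Timestamp values. For example: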
import pandas as pd
time = pd.Timestamp('2018-11-01')
print(time)
2018-11-01 00:00:00
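Integer or float epoch times can also be converted. The default unit for these is nanoseconds, because that is how Timestamps are stored. Epoch values are often stored in a different unit, however, which can be specified with the unit argument. Another example: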
import pandas as pd
time = pd.Timestamp(1588686880,unit='s')
print(time)
2020-05-05 13:54:40
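The unit matters because the same integer means very different instants depending on how it is read. The following sketch reuses the epoch value from the example above and compares it in seconds, milliseconds, and the default nanoseconds.

```python
import pandas as pd

seconds = 1588686880

# The same instant expressed in three different units.
from_s = pd.Timestamp(seconds, unit="s")
from_ms = pd.Timestamp(seconds * 1000, unit="ms")
from_ns = pd.Timestamp(seconds * 10**9)  # no unit given: nanoseconds

print(from_s, from_ms, from_ns)

# Forgetting unit='s' reads the integer as nanoseconds since the epoch,
# which lands about 1.6 seconds after 1970-01-01.
misread = pd.Timestamp(seconds)
print(misread)
```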
import pandas as pd
time = pd.date_range("12:00", "23:59", freq="30min").time
print(time)
[datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30) datetime.time(14, 0) datetime.time(14, 30) datetime.time(15, 0) datetime.time(15, 30) datetime.time(16, 0) datetime.time(16, 30) datetime.time(17, 0) datetime.time(17, 30) datetime.time(18, 0) datetime.time(18, 30) datetime.time(19, 0) datetime.time(19, 30) datetime.time(20, 0) datetime.time(20, 30) datetime.time(21, 0) datetime.time(21, 30) datetime.time(22, 0) datetime.time(22, 30) datetime.time(23, 0) datetime.time(23, 30)]
import pandas as pd
# The frequency alias 'H' is deprecated in recent pandas; use lowercase 'h'.
time = pd.date_range("12:00", "23:59", freq="h").time
print(time)
[datetime.time(12, 0) datetime.time(13, 0) datetime.time(14, 0) datetime.time(15, 0) datetime.time(16, 0) datetime.time(17, 0) datetime.time(18, 0) datetime.time(19, 0) datetime.time(20, 0) datetime.time(21, 0) datetime.time(22, 0) datetime.time(23, 0)]
import pandas as pd
time = pd.to_datetime(pd.Series(['Jul 31, 2009','2019-10-10', None]),
format='mixed')
print(time)
0   2009-07-31
1   2019-10-10
2          NaT
dtype: datetime64[ns]
NaT means "not a time"; it is the datetime equivalent of NaN.
Another example:
time = pd.to_datetime(['2009/11/23', '2019.12.31', None],
format='mixed')
time
DatetimeIndex(['2009-11-23', '2019-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
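NaT also appears when an input cannot be parsed, provided to_datetime is told to coerce errors instead of raising. The sketch below uses a made-up Series (raw) containing one valid date, one unparseable string, and one missing value, and then detects and drops the NaT entries.

```python
import pandas as pd

# Illustrative input: one valid date, one bad string, one missing value.
raw = pd.Series(["2009-07-31", "not a date", None])

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)

# NaT is detected by the same missing-value helpers used for NaN.
missing = parsed.isna().tolist()
print(missing)

# Drop the missing entries to keep only valid timestamps.
valid = parsed.dropna()
print(valid)
```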