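Pandas provides various facilities for easily combining Series and DataFrame objects (older versions also supported Panel objects, which have since been removed from pandas).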
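pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)
where:
objs - a sequence or mapping of Series or DataFrame objects. If a mapping is passed, its keys are used as the keys argument unless keys is given explicitly.
axis - {0, 1, ...}, default 0. The axis to concatenate along.
join - {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis (or axes): 'outer' takes their union, 'inner' their intersection.
ignore_index - bool, default False. If True, do not use the index values along the concatenation axis; the resulting axis is labeled 0, ..., n-1.
keys - sequence, default None. If given, builds a hierarchical index with the keys as the outermost level.
join_axes - a list of Index objects giving the specific indexes to use for the other (n-1) axes instead of applying inner/outer set logic. This parameter was deprecated in pandas 0.25 and removed in 1.0; call reindex on the result instead.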
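The difference between the two join modes is easiest to see when the inputs do not share all their columns. A minimal sketch, assuming two small hypothetical frames a and b that have only the Name column in common:

```python
import pandas as pd

# Hypothetical frames that share only the 'Name' column
a = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]})
b = pd.DataFrame({'Name': ['Billy', 'Brian'], 'subject_id': ['sub2', 'sub4']})

# Default join='outer': union of the columns; cells with no data become NaN
outer = pd.concat([a, b], ignore_index=True)
# join='inner': only the columns present in every input are kept
inner = pd.concat([a, b], join='inner', ignore_index=True)

print(sorted(outer.columns))  # ['Marks_scored', 'Name', 'subject_id']
print(list(inner.columns))    # ['Name']
```

Because 'outer' is the default, concatenating mismatched frames silently introduces NaN values; pass join='inner' when only the common columns are wanted.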
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
Suppose you want to associate a specific key with each piece of the concatenated DataFrame. You can do this with the keys argument:
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],keys=['x','y'])
print(rs)
       Name subject_id  Marks_scored
x 1    Alex       sub1            98
  2     Amy       sub2            90
  3   Allen       sub4            87
  4   Alice       sub6            69
  5  Ayoung       sub5            78
y 1   Billy       sub2            89
  2   Brian       sub4            80
  3    Bran       sub3            79
  4   Bryce       sub6            97
  5   Betty       sub5            88
The inner level of the resulting index is still duplicated: each label 1 to 5 appears once under x and once under y.
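The keys become the outer level of a two-level MultiIndex, so each original piece can be pulled back out by its key. A short sketch with trimmed-down versions of the one and two frames (the reduced data is an assumption made to keep the example small):

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

rs = pd.concat([one, two], keys=['x', 'y'])

# The keys form the outer level of a two-level index
print(rs.index.nlevels)                # 2
# Selecting one key returns that piece with its original index
print(rs.loc['y']['Name'].tolist())    # ['Billy', 'Brian']
# A (key, label) tuple identifies exactly one row
print(rs.loc[('x', 2), 'Name'])        # Amy
```

Even though the inner labels repeat, the combined (key, label) pairs are unique, which is why rs.index.is_unique is True here.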
If the result should have its own fresh index instead, set ignore_index to True. See the following example:
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],keys=['x','y'],ignore_index=True)
print(rs)
     Name subject_id  Marks_scored
0    Alex       sub1            98
1     Amy       sub2            90
2   Allen       sub4            87
3   Alice       sub6            69
4  Ayoung       sub5            78
5   Billy       sub2            89
6   Brian       sub4            80
7    Bran       sub3            79
8   Bryce       sub6            97
9   Betty       sub5            88
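The index is completely renumbered, and the keys are ignored as well. When two objects are concatenated along axis=1, the columns of the second are added alongside those of the first: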
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
rs = pd.concat([one,two],axis=1)
print(rs)
     Name subject_id  Marks_scored   Name subject_id  Marks_scored
1    Alex       sub1            98  Billy       sub2            89
2     Amy       sub2            90  Brian       sub4            80
3   Allen       sub4            87   Bran       sub3            79
4   Alice       sub6            69  Bryce       sub6            97
5  Ayoung       sub5            78  Betty       sub5            88
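With axis=1 the rows are aligned on their index labels, so the join argument now decides which row labels survive. A minimal sketch, assuming two hypothetical frames whose indexes only partly overlap:

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap
left = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen']}, index=[1, 2, 3])
right = pd.DataFrame({'Marks_scored': [89, 80]}, index=[2, 3])

# Default join='outer': every row label is kept; label 1 gets NaN for Marks_scored
outer = pd.concat([left, right], axis=1)
# join='inner': only labels present in both frames are kept
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```

In the example above both frames use the labels 1 to 5, so nothing is lost or padded; the outer/inner choice only matters when the indexes differ.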
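Series and DataFrame used to provide an append() method as a shortcut for concatenating along axis=0. DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so the example below uses pd.concat instead:
import pandas as pd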
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
# rs = one.append(two)  # older shortcut, works only in pandas < 2.0
rs = pd.concat([one, two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
The old append() method could also take several objects at once; the pd.concat equivalent simply lists all of them:
import pandas as pd
one = pd.DataFrame({
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5'],
'Marks_scored':[98,90,87,69,78]},
index=[1,2,3,4,5])
two = pd.DataFrame({
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5'],
'Marks_scored':[89,80,79,97,88]},
index=[1,2,3,4,5])
# rs = one.append([two, one, two])  # older shortcut, works only in pandas < 2.0
rs = pd.concat([one, two, one, two])
print(rs)
     Name subject_id  Marks_scored
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
1    Alex       sub1            98
2     Amy       sub2            90
3   Allen       sub4            87
4   Alice       sub6            69
5  Ayoung       sub5            78
1   Billy       sub2            89
2   Brian       sub4            80
3    Bran       sub3            79
4   Bryce       sub6            97
5   Betty       sub5            88
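Repeating the same pieces produces many duplicate index labels. Two ways to keep the result addressable are sketched below, using reduced versions of the frames (an assumption for brevity): passing objs as a dict, whose keys then act as the keys argument, or using ignore_index=True for a fresh 0..n-1 index.

```python
import pandas as pd

one = pd.DataFrame({'Name': ['Alex', 'Amy']}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian']}, index=[1, 2])

# A dict of objects: its keys become the outer index level, like keys=['x', 'y']
rs = pd.concat({'x': one, 'y': two})
# ignore_index=True discards the repeated labels and renumbers from 0
stacked = pd.concat([one, two, one, two], ignore_index=True)

print(rs.index.tolist())       # [('x', 1), ('x', 2), ('y', 1), ('y', 2)]
print(stacked.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
```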
import pandas as pd
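# pd.datetime.now() no longer works (pd.datetime was deprecated and later removed);
# pd.Timestamp.now() returns the current date and time
print(pd.Timestamp.now())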
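Timestamped data is the most basic kind of time-series data: it associates values with points in time. For pandas objects, this means working with Timestamp objects. For example: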
import pandas as pd
time = pd.Timestamp('2018-11-01')
print(time)
2018-11-01 00:00:00
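A Timestamp is more than a printed date: its calendar fields are available as attributes and methods. A small sketch reusing the same date:

```python
import pandas as pd

time = pd.Timestamp('2018-11-01')
# Calendar components are exposed as attributes
print(time.year, time.month, time.day)  # 2018 11 1
# day_name() returns the weekday as a string
print(time.day_name())                  # Thursday
```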
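Integer or float epoch values can also be converted. Their default unit is nanoseconds, since that is how pandas stores Timestamps internally. Epochs are often stored in some other unit, however, which you can specify with the unit argument. Another example: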
import pandas as pd
time = pd.Timestamp(1588686880,unit='s')
print(time)
2020-05-05 13:54:40
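To see why unit matters, the sketch below expresses the same instant in seconds, milliseconds and nanoseconds, and shows what happens if the seconds value is passed without unit (it is then read as nanoseconds):

```python
import pandas as pd

epoch_s = 1588686880  # seconds since 1970-01-01, as in the example above

# The same instant in three units; unit tells pandas how to read the number
t_s = pd.Timestamp(epoch_s, unit='s')
t_ms = pd.Timestamp(epoch_s * 1000, unit='ms')
t_ns = pd.Timestamp(epoch_s * 10**9)  # no unit: interpreted as nanoseconds

print(t_s == t_ms == t_ns)   # True
# Omitting unit='s' treats the seconds value as nanoseconds
print(pd.Timestamp(epoch_s))  # 1970-01-01 00:00:01.588686880
```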
import pandas as pd
time = pd.date_range("12:00", "23:59", freq="30min").time
print(time)
[datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30) datetime.time(14, 0) datetime.time(14, 30) datetime.time(15, 0) datetime.time(15, 30) datetime.time(16, 0) datetime.time(16, 30) datetime.time(17, 0) datetime.time(17, 30) datetime.time(18, 0) datetime.time(18, 30) datetime.time(19, 0) datetime.time(19, 30) datetime.time(20, 0) datetime.time(20, 30) datetime.time(21, 0) datetime.time(21, 30) datetime.time(22, 0) datetime.time(22, 30) datetime.time(23, 0) datetime.time(23, 30)]
import pandas as pd
# Hourly steps; pandas 2.2+ uses the alias 'h' (the uppercase 'H' is deprecated)
time = pd.date_range("12:00", "23:59", freq="h").time
print(time)
[datetime.time(12, 0) datetime.time(13, 0) datetime.time(14, 0) datetime.time(15, 0) datetime.time(16, 0) datetime.time(17, 0) datetime.time(18, 0) datetime.time(19, 0) datetime.time(20, 0) datetime.time(21, 0) datetime.time(22, 0) datetime.time(23, 0)]
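Instead of an end point, date_range also accepts a number of periods. A short sketch (the start date and period counts are chosen for illustration):

```python
import pandas as pd

# Start plus a number of periods instead of an end point
days = pd.date_range('2018-11-01', periods=3, freq='D')
print(days.strftime('%Y-%m-%d').tolist())  # ['2018-11-01', '2018-11-02', '2018-11-03']

# .time keeps only the time-of-day part, as in the examples above
slots = pd.date_range('12:00', periods=3, freq='30min').time
print([t.strftime('%H:%M') for t in slots])  # ['12:00', '12:30', '13:00']
```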
import pandas as pd
# Since pandas 2.0, to_datetime infers a single format from the first element, so
# mixed formats raise a ValueError; format='mixed' parses each element individually
time = pd.to_datetime(pd.Series(['Jul 31, 2009', '2019-10-10', None]), format='mixed')
print(time)
0   2009-07-31
1   2019-10-10
2          NaT
dtype: datetime64[ns]
NaT stands for "Not a Time", the datetime counterpart of NaN.
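Like NaN, NaT is treated as missing and never compares equal to anything, not even to itself, so it should be tested with isna() rather than ==. A small sketch:

```python
import pandas as pd

# NaT counts as missing and is not equal even to itself
print(pd.isna(pd.NaT))   # True
print(pd.NaT == pd.NaT)  # False

# None in the input becomes NaT and is picked up by isna()
s = pd.Series(pd.to_datetime(['2009-07-31', None]))
print(s.isna().tolist())  # [False, True]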
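Another example:
import pandas as pd
# format='mixed' (pandas 2.0+) is needed because the separators differ between elements
time = pd.to_datetime(['2009/11/23', '2019.12.31', None], format='mixed')
print(time)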
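Inputs that cannot be parsed at all raise an error by default. A minimal sketch, assuming a hypothetical list containing one invalid string, shows how errors='coerce' turns such values into NaT instead:

```python
import pandas as pd

raw = ['2009-11-23', 'not a date', None]  # hypothetical input with one bad value

# errors='coerce' replaces unparseable values with NaT instead of raising
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed)
```

This is convenient for cleaning real-world data, but it hides bad input, so it is worth counting the resulting NaT values afterwards.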