到目前为止,我们了解了三种Pandas数据结构以及如何创建它们。接下来将主要关注数据帧(DataFrame)对象,因为它在实时数据处理中非常重要,并且还讨论其他数据结构。
编号 | 属性或方法 | 描述 |
---|---|---|
1 | axes |
返回行轴标签列表。 |
2 | dtype |
返回对象的数据类型(dtype )。 |
3 | empty |
如果系列为空,则返回True 。 |
4 | ndim |
返回底层数据的维数,默认定义:1 。 |
5 | size |
返回基础数据中的元素数。 |
6 | values |
将系列作为ndarray 返回。 |
7 | head() |
返回前n 行。 |
8 | tail() |
返回最后n 行。 |
现在创建一个系列并演示如何使用上面所有列出的属性操作。
import pandas as pd
import numpy as np
#Create a series with 100 random numbers
s = pd.Series(np.random.randn(4))
print(s)
0 -0.174390 1 -0.506319 2 -1.082130 3 0.101745 dtype: float64
返回系列的标签列表。参考以下示例代码 -
import pandas as pd
import numpy as np
#Create a series with 100 random numbers
s = pd.Series(np.random.randn(4))
print ("The axes are:")
print(s.axes)
The axes are: [RangeIndex(start=0, stop=4, step=1)]
上述结果是从0到5的值列表的紧凑格式,即:[0,1,2,3,4]。
s = pd.Series(np.random.randn(4))
print("Is the Object empty?", s.empty)
Is the Object empty? False
s = pd.Series(np.random.randn(4))
print(s)
print("The dimensions of the object:", s.ndim)
0 1.408732 1 1.070586 2 0.065844 3 0.471072 dtype: float64 The dimensions of the object: 1
s = pd.Series(np.random.randn(2))
print(s)
print("The size of the object:", s.size)
0 -1.851026 1 0.698335 dtype: float64 The size of the object: 2
s = pd.Series(np.random.randn(4))
print(s)
print("The actual data series is:", s.values)
0 0.507342 1 0.040763 2 0.159354 3 -1.610987 dtype: float64 The actual data series is: [ 0.50734157 0.04076283 0.1593538 -1.61098664]
要查看Series或DataFrame对象的小样本,请使用 head()
和 tail()
方法。
head()
返回前n行(观察索引值)。要显示的元素的默认数量为5,但可以传递自定义这个数字值。
Create a series with 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The first two rows of the data series:")
print(s.head(2))
The original series is: 0 -0.313015 1 -0.274052 2 0.739818 3 -0.723503 dtype: float64 The first two rows of the data series: 0 -0.313015 1 -0.274052 dtype: float64
tail()
返回最后n行(观察索引值)。 要显示的元素的默认数量为 5
,但可以传递自定义数字值。参考以下示例代码 -
Create a series with 4 random numbers。
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The last two rows of the data series:")
print(s.tail(2))
The original series is: 0 0.945928 1 0.413040 2 1.891930 3 0.399307 dtype: float64 The last two rows of the data series: 2 1.891930 3 0.399307 dtype: float64
执行上面示例代码,得到以下结果 -
编号 | 属性或方法 | 描述 |
---|---|---|
1 | T |
转置行和列。 |
2 | axes |
返回一个列,行轴标签和列轴标签作为唯一的成员。 |
3 | dtypes |
返回此对象中的数据类型(dtypes )。 |
4 | empty |
如果NDFrame 完全为空[无项目],则返回为True ; 如果任何轴的长度为0 。 |
5 | ndim |
轴/数组维度大小。 |
6 | shape |
返回表示DataFrame 的维度的元组。 |
7 | size |
NDFrame 中的元素数。 |
8 | values |
NDFrame的Numpy表示。 |
9 | head() |
返回开头前n 行。 |
10 | tail() |
返回最后n 行。 |
下面来看看如何创建一个DataFrame并使用上述属性和方法。
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
Create a DataFrame:
df = pd.DataFrame(d)
print("Our data series is:")
print(df)
Our data series is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80
import pandas as pd
import numpy as np
# Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print ("The transpose of the data series is:")
print (df.T)
The transpose of the data series is: 0 1 2 3 4 5 6 Name Tom James Ricky Vin Steve Minsu Jack Age 25 26 25 23 30 29 23 Rating 4.23 3.24 3.98 2.56 3.2 4.6 3.8
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Row axis labels and column axis labels are:")
print( df.axes)
Row axis labels and column axis labels are: [RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print("The data types of each column are:")
print(df.dtypes)
The data types of each column are: Name object Age int64 Rating float64 dtype: object
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Is the object empty?")
print( df.empty)
Is the object empty? False
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The dimension of the object is:")
print (df.ndim)
Our object is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The dimension of the object is: 2
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The shape of the object is:")
print (df.shape)
Our object is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The shape of the object is: (7, 3)
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The total number of elements in our object is:")
print (df.size)
Our object is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The total number of elements in our object is: 21
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our object is:")
print (df)
print ("The actual data in our data frame is:")
print (df.values)
Our object is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The actual data in our data frame is: [['Tom' 25 4.23] ['James' 26 3.24] ['Ricky' 25 3.98] ['Vin' 23 2.56] ['Steve' 30 3.2] ['Minsu' 29 4.6] ['Jack' 23 3.8]]
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame is:")
print (df.head(2))
Our data frame is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The first two rows of the data frame is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24
tail()返回最后n行(观察索引值)。显示元素的默认数量为5,但可以传递自定义数字值。
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The last two rows of the data frame is:")
print (df.tail(2))
Our data frame is: Name Age Rating 0 Tom 25 4.23 1 James 26 3.24 2 Ricky 25 3.98 3 Vin 23 2.56 4 Steve 30 3.20 5 Minsu 29 4.60 6 Jack 23 3.80 The last two rows of the data frame is: Name Age Rating 5 Minsu 29 4.6 6 Jack 23 3.8