几何操作
geopandas在shapely库中提供了所有几何操作的工具。
请注意,所有用于创建不同空间数据集(如创建交叉点或差异)新形状的集合理论工具的文档可以在设置操作页面上查找。
创建方法
GeoSeries.buffer(distance, resolution=16)
返回每个几何对象的给定距离内的所有点的GeoSeries。
GeoSeries.boundary
返回每个几何的集合理论边界的低维对象GeoSeries。
GeoSeries.centroid
返回每个几何质心的点GeoSeries。
GeoSeries.convex_hull
返回包含每个对象所有点的最小凸多边形的几何体的GeoSeries,除非对象中的点数小于三。对于两个点,凸多边形返货的是一个LineString;若为1,则表示一个点。
GeoSeries.envelope
返回包含每个对象点的最小矩形(平行于坐标轴的边)的几何体的GeoSeries。
GeoSeries.simplify(tolerance, preserve_topology=True)
返回包含每个对象简化表示的GeoSeries。
GeoSeries.unary_union
返回包含GeoSeries中所有几何并集的GeoSeries。
仿射变换
GeoSeries.rotate(self, angle, origin='center', use_radians=False)
旋转GeoSeries坐标。
GeoSeries.scale(self, xfact=1.0, yfact=1.0, zfact=1.0, origin='center')
沿每个(x,y,z)维度缩放GeoSeries几何。
GeoSeries.skew(self, angle, origin='center', use_radians=False)
通过沿x和y维度的角度剪切或倾斜GeoSeries的几何图形。
GeoSeries.translate(self, angle, origin='center', use_radians=False)
移动GeoSeries的坐标。
几何操作的示例
%matplotlib inline
import geopandas as gpd
from geopandas import GeoDataFrame
from shapely.geometry import Polygon
from geopandas import GeoSeries
p1 = Polygon([(0, 0), (1, 0), (1, 1)])
p2 = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
p3 = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
g = GeoSeries([p1, p2, p3])
返回正常的pandas对象。GeoSeries的area属性将返回包含GeoSeries中每个项目的区域的pandas.Series:
print (g.area)
0 0.5 1 1.0 2 1.0 dtype: float64
返回GeoPandas对象:
g.buffer(0.5)
0 POLYGON ((-0.35355 0.35355, 0.64645 1.35355, 0... 1 POLYGON ((-0.5 0, -0.5 1, -0.49759 1.04901, -0... 2 POLYGON ((1.5 0, 1.5 1, 1.50241 1.04901, 1.509... dtype: geometry
GeoPandas使用的是descartes方法生成matplotlib图。要生成我们自己的GeoSeries图,请使用:
g.plot()
<Axes: >
为了演示更复杂的操作,我们将生成一个包含2000个随机点的GeoSeries:
from shapely.geometry import Point
import numpy as np
xmin, xmax, ymin, ymax = 900000, 1080000, 120000, 280000
xc = (xmax - xmin) * np.random.random(2000) + xmin
yc = (ymax - ymin) * np.random.random(2000) + ymin
pts = GeoSeries([Point(x, y) for x, y in zip(xc, yc)])
现在在每个点周围画一个固定半径的圆:
circles = pts.buffer(2000)
我们可以将这些圆形折叠成一个单一的多边形
mp = circles.unary_union
/tmp/ipykernel_772/3785283087.py:1: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead. mp = circles.unary_union
from shapely.geometry import Polygon
polys1 = gpd.GeoSeries([Polygon([(0,0), (2,0), (2,2), (0,2)]),
Polygon([(2,2), (4,2), (4,4), (2,4)])])
polys2 = gpd.GeoSeries([Polygon([(1,1), (3,1), (3,3), (1,3)]),
Polygon([(3,3), (5,3), (5,5), (3,5)])])
df1 = gpd.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})
df2 = gpd.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})
这两个GeoDataFrames有一些重叠区域:
ax = df1.plot(color='red')
df2.plot(ax=ax, color='green')
<Axes: >
上面的例子说明是的覆盖模式。覆盖函数是通过覆盖GeoDataFrames的方法来确定所有单个几何的集合。此结果覆盖由两个输入GeoDataFrames覆盖的区域组成,保留着两个GeoDataFrames组合边界定义的唯一区域。
当使用how ='union'时,将返回所有可能的几何体:
res_union = gpd.overlay(df1, df2, how='union')
res_union
df1 | df2 | geometry | |
---|---|---|---|
0 | 1.0 | 1.0 | POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2)) |
1 | 2.0 | 1.0 | POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2)) |
2 | 2.0 | 2.0 | POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4)) |
3 | 1.0 | NaN | POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0)) |
4 | 2.0 | NaN | MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4... |
5 | NaN | 1.0 | MULTIPOLYGON (((2 3, 2 2, 1 2, 1 3, 2 3)), ((3... |
6 | NaN | 2.0 | POLYGON ((3 5, 5 5, 5 3, 4 3, 4 4, 3 4, 3 5)) |
ax = res_union.plot()
df1.plot(ax=ax, facecolor='none')
df2.plot(ax=ax, facecolor='none')
<Axes: >
其他操作将返回那些几何的不同子集。使用how ='intersection'方法,返回的是两个GeoDataFrames包含的几何体:
res_intersection = gpd.overlay(df1, df2, how='intersection')
res_intersection
df1 | df2 | geometry | |
---|---|---|---|
0 | 1 | 1 | POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2)) |
1 | 2 | 1 | POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2)) |
2 | 2 | 2 | POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4)) |
ax = res_intersection.plot()
df1.plot(ax=ax, facecolor='none')
df2.plot(ax=ax, facecolor='none')
<Axes: >
how ='symmetric_difference'与'intersection'相反,返回的几何只是GeoDataFrames其中之一的一部分,并不是两者的一部分:
res_symdiff = gpd.overlay(df1, df2, how='symmetric_difference')
res_symdiff
df1 | df2 | geometry | |
---|---|---|---|
0 | 1.0 | NaN | POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0)) |
1 | 2.0 | NaN | MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4... |
2 | NaN | 1.0 | MULTIPOLYGON (((2 3, 2 2, 1 2, 1 3, 2 3)), ((3... |
3 | NaN | 2.0 | POLYGON ((3 5, 5 5, 5 3, 4 3, 4 4, 3 4, 3 5)) |
ax = res_symdiff.plot()
df1.plot(ax=ax, facecolor='none');
df2.plot(ax=ax, facecolor='none');
作为df1的一部分但此部分并不包含在df2中,要获取这样的几何体,可以使用how ='difference'方法:
res_difference = gpd.overlay(df1, df2, how='difference')
res_difference
geometry | df1 | |
---|---|---|
0 | POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0)) | 1 |
1 | MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4... | 2 |
ax = res_difference.plot()
df1.plot(ax=ax, facecolor='none')
df2.plot(ax=ax, facecolor='none')
<Axes: >
最后,使用how ='identity'方法生成的,结果由df1的表面组成。若想获得df2覆盖df1的几何体,可使用以下方法:
res_identity = gpd.overlay(df1, df2, how='identity')
res_identity
df1 | df2 | geometry | |
---|---|---|---|
0 | 1.0 | 1.0 | POLYGON ((2 2, 2 1, 1 1, 1 2, 2 2)) |
1 | 2.0 | 1.0 | POLYGON ((2 2, 2 3, 3 3, 3 2, 2 2)) |
2 | 2.0 | 2.0 | POLYGON ((4 4, 4 3, 3 3, 3 4, 4 4)) |
3 | 1.0 | NaN | POLYGON ((2 0, 0 0, 0 2, 1 2, 1 1, 2 1, 2 0)) |
4 | 2.0 | NaN | MULTIPOLYGON (((3 4, 3 3, 2 3, 2 4, 3 4)), ((4... |
ax = res_identity.plot()
df1.plot(ax=ax, facecolor='none');
df2.plot(ax=ax, facecolor='none');
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
capitals = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
# Select South Amarica and some columns
countries = world[world['continent'] == "South America"]
countries = countries[['geometry', 'name']]
# Project to crs that uses meters as distance measure
countries = countries.to_crs('epsg:3395')
capitals = capitals.to_crs('epsg:3395')
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) Cell In[21], line 1 ----> 1 world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')) 3 capitals = gpd.read_file(gpd.datasets.get_path('naturalearth_cities')) 5 # Select South Amarica and some columns File /opt/conda/lib/python3.12/site-packages/geopandas/datasets/__init__.py:18, in get_path(dataset) 12 error_msg = ( 13 "The geopandas.dataset has been deprecated and was removed in GeoPandas " 14 f"1.0. You can get the original '{dataset}' data from " 15 f"{ne_message if 'natural' in dataset else nybb_message}" 16 ) 17 if dataset in _prev_available: ---> 18 raise AttributeError(error_msg) 19 else: 20 error_msg = ( 21 "The geopandas.dataset has been deprecated and " 22 "was removed in GeoPandas 1.0. New sample datasets are now available " 23 "in the geodatasets package (https://geodatasets.readthedocs.io/en/latest/)" 24 ) AttributeError: The geopandas.dataset has been deprecated and was removed in GeoPandas 1.0. You can get the original 'naturalearth_lowres' data from https://www.naturalearthdata.com/downloads/110m-cultural-vectors/.
为了说明overlay 功能,我们来假设以下情况,使用国家的GeoDataFrame和大写的GeoDataFrame来识别每个国家的“核心”部分(定义在首都500km内的区域)。
# Look at countries:
countries.plot();
# Now buffer cities to find area within 500km.
# Check CRS -- World Mercator, units of meters.
capitals.crs
# make 500km buffer
capitals['geometry']= capitals.buffer(500000)
capitals.plot()
仅选择距离首都500公里内的国家部分,我们指定的如何选项是“intersect”,这将创建一组新的多边形,其中这两个层是重叠的:
country_cores = gpd.overlay(countries, capitals, how='intersection')
country_cores.plot();
更改“方式”选项可允许不同类型的重叠操作。例如,我们对远离国家首都的部分感兴趣,可以计算两者的差异。
country_peripheries = gpd.overlay(countries, capitals, how='difference')
country_peripheries.plot();
要素融合
通常情况下,我们发现自己使用的空间数据比我们需要的粒度更细。例如,我们有国家地方单位的数据,但我们感兴趣的是研究国家层面的模式。
在非空间设置中,我们使用groupby函数聚合数据。但是使用空间数据时,可能会需要一个特殊的工具,也可以聚合几何特征。在 GeoPandas库中,该功能由dissolve函数提供。
dissolve可以做三件事情:
- 给定组中的所有几何图形,并一起溶解成单个几何特征(使用unary_union方法);
- 聚合组中的所有数据行,使用的是groupby.aggregate()方法;
- 将以上这两种方法结合。
dissolve示例
假设我们对研究大陆感兴趣,但我们只有国家级数据。我们可以轻松地将其转换为大陆级数据集。
首先,我们只需要大陆形状和名称。默认情况下,dissolve将'first'输出到groupby.aggregate。
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[['continent', 'geometry']]
continents = world.dissolve(by='continent')
continents.plot()
continents.head()
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))
# For attribute join
country_shapes = world[['geometry', 'iso_a3']]
country_names = world[['name', 'iso_a3']]
# For spatial join
countries = world[['geometry', 'name']]
countries = countries.rename(columns={'name':'country'})
# country_shapes` is GeoDataFrame with country shapes and iso codes
country_shapes.head()
country_names.head()
country_shapes = country_shapes.merge(country_names, on='iso_a3')
country_shapes.head()
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
countries.head()
cities.head()
cities_with_country = gpd.sjoin(cities, countries, how="inner", predicate='intersects')
cities_with_country.head()