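Nearest-neighbor matching
The TransBigData Python package provides nearest-neighbor matching for point-to-point and point-to-line cases, and the examples below show how to use both. The method is based on the KDTree algorithm (see https://en.wikipedia.org/wiki/K-d_tree), for which a single nearest-neighbor query takes O(log n) time on average.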
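To make the KD-tree idea concrete, here is a minimal pure-Python sketch of a 2-D k-d tree with a nearest-neighbor query. It is an illustration only and not TransBigData's implementation; the helper names `build_kdtree` and `nearest` are hypothetical, and the sample coordinates are the ones used in the examples on this page.

```python
import math

def build_kdtree(items, depth=0):
    """Recursively build a 2-D k-d tree from (point, index) pairs."""
    if not items:
        return None
    axis = depth % 2  # alternate between x (0) and y (1)
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    return {
        "point": items[mid][0],
        "index": items[mid][1],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return (distance, index) of the stored point closest to target."""
    if node is None:
        return best
    dist = math.dist(node["point"], target)
    if best is None or dist < best[0]:
        best = (dist, node["index"])
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting line is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

points_b = [(1, 3), (2, 5), (2, 2)]
points_a = [(1, 2), (2, 4), (2, 6), (2, 10), (24, 6), (21, 6), (22, 6)]
tree = build_kdtree([(p, i) for i, p in enumerate(points_b)])
results = [nearest(tree, p) for p in points_a]  # list of (distance, index in points_b)
```

Pruning the far branch is what lets a query skip most of the tree, which is where the average O(log n) query cost comes from. Note that the first point, (1, 2), is equidistant from (1, 3) and (2, 2), so which of the two is returned depends on the traversal order.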
Point-to-point matching (DataFrame and DataFrame)
Import the TransBigData package
import transbigdata as tbd
Create two GeoDataFrame tables that contain only longitude and latitude columns (no geometry yet)
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
[2,10],[24,6],[21,6],
[22,6]],columns = ['lon1','lat1'])
dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])
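Use tbd.ckdnearest for point-to-point matching. When matching a DataFrame with a DataFrame (neither carrying geographic information), specify the longitude and latitude columns of both tables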
transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])
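Takes two DataFrames together with the names of their longitude and latitude columns, matches each point in table A to the nearest point in table B, and computes the distance
Parameters
dfA_origin:DataFrame
Table A
dfB_origin:DataFrame
Table B
Aname:List
Longitude and latitude column names in table A
Bname:List
Longitude and latitude column names in table B
Returns
gdf:DataFrame
Table A with the nearest point of table B joined to each row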
tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
# The distance here is the real-world distance converted from longitude and latitude (in meters)
| | lon1 | lat1 | index | lon | lat | dist |
---|---|---|---|---|---|---|
0 | 1 | 2 | 0 | 1 | 3 | 1.111949e+05 |
1 | 2 | 4 | 1 | 2 | 5 | 1.111949e+05 |
2 | 2 | 6 | 1 | 2 | 5 | 1.111949e+05 |
3 | 2 | 10 | 1 | 2 | 5 | 5.559746e+05 |
4 | 24 | 6 | 1 | 2 | 5 | 2.437393e+06 |
5 | 21 | 6 | 1 | 2 | 5 | 2.105798e+06 |
6 | 22 | 6 | 1 | 2 | 5 | 2.216318e+06 |
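As a sanity check on the dist column above, the sketch below computes great-circle distances with the haversine formula. The use of haversine and the Earth radius of 6371 km are assumptions, not taken from the TransBigData source; they are chosen because they reproduce the first and fourth rows of the table.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

d_row0 = haversine(1, 2, 1, 3)    # row 0: (1, 2) matched to (1, 3), one degree of latitude
d_row3 = haversine(2, 10, 2, 5)   # row 3: (2, 10) matched to (2, 5), five degrees of latitude
print(f"{d_row0:.1f} m, {d_row3:.1f} m")
```

One degree of latitude comes out at roughly 111.19 km, which agrees with the 1.111949e+05 in the table and confirms that the distances are expressed in meters.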
Point-to-point matching (GeoDataFrame and GeoDataFrame)
Convert tables A and B into GeoDataFrames that carry point geometry
dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])
Use tbd.ckdnearest_point for point-to-point matching
transbigdata.ckdnearest_point(gdA, gdB)
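Takes two GeoDataFrames, gdA and gdB, both containing points. The method joins each row of gdA to the nearest point in gdB and adds a distance field, dist
Parameters
gdA:GeoDataFrame
Table A, point features
gdB:GeoDataFrame
Table B, point features
Returns
gdf:GeoDataFrame
Table A with the nearest point of table B joined to each row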
tbd.ckdnearest_point(dfA,dfB)
# The distance here is in degrees, computed directly on the longitude/latitude coordinates
| | lon1 | lat1 | geometry_x | dist | index | lon | lat | geometry_y |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | POINT (1.00000 2.00000) | 1.000000 | 0 | 1 | 3 | POINT (1.00000 3.00000) |
1 | 2 | 4 | POINT (2.00000 4.00000) | 1.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
2 | 2 | 6 | POINT (2.00000 6.00000) | 1.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
3 | 2 | 10 | POINT (2.00000 10.00000) | 5.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
4 | 24 | 6 | POINT (24.00000 6.00000) | 22.022716 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
5 | 21 | 6 | POINT (21.00000 6.00000) | 19.026298 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
6 | 22 | 6 | POINT (22.00000 6.00000) | 20.024984 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
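The dist values in this table are consistent with plain Euclidean distance on the coordinate values, so they are in degrees rather than meters. A short worked check for row 4, assuming nothing beyond the numbers shown in the table:

```python
import math

point_a = (24, 6)  # row 4 of table A (lon1, lat1)
point_b = (2, 5)   # its matched point in table B (lon, lat)

# Planar distance in coordinate units: sqrt(22**2 + 1**2)
planar_dist = math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1])
print(round(planar_dist, 6))
```

Because a degree of longitude covers less ground at higher latitudes, these values are not directly comparable across regions. Use the metric distance from tbd.ckdnearest when a physical distance is needed.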
Point-to-line matching (GeoDataFrame and GeoDataFrame)
Convert table A into point geometry and table B into lines
dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
LineString([[1,0],[1.5,0],[4,0]]),
LineString([[1,-1],[1.5,-2],[4,-4]])]
dfB.plot()
transbigdata.ckdnearest_line(gdfA, gdfB)
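Takes two GeoDataFrames, where gdfA contains points and gdfB contains lines. The method joins each row of gdfA to the nearest line in gdfB and adds a distance field, dist
Parameters
gdfA:GeoDataFrame
Table A, point features
gdfB:GeoDataFrame
Table B, line features
Returns
gdf:GeoDataFrame
Table A with the nearest line of table B joined to each row
tbd.ckdnearest_line matches points to lines. It works by extracting the vertices of each line and then applying point-to-point matching, so the reported distance is the distance to the nearest vertex rather than the perpendicular distance to the line.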
tbd.ckdnearest_line(dfA,dfB)
# The distance here is in degrees, computed directly on the longitude/latitude coordinates
| | lon1 | lat1 | geometry_x | dist | index | lon | lat | geometry_y |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | POINT (1.00000 2.00000) | 0.707107 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
1 | 2 | 4 | POINT (2.00000 4.00000) | 1.200000 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
2 | 2 | 6 | POINT (2.00000 6.00000) | 2.332381 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
3 | 2 | 10 | POINT (2.00000 10.00000) | 6.118823 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
4 | 21 | 6 | POINT (21.00000 6.00000) | 17.912007 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
5 | 22 | 6 | POINT (22.00000 6.00000) | 18.906084 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
6 | 24 | 6 | POINT (24.00000 6.00000) | 20.880613 | 1 | 2 | 5 | LINESTRING (1.00000 0.00000, 1.50000 0.00000, ... |
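The vertex-extraction principle described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's code: the helper names `nearest_line_by_vertex` and `distance_to_segment` are hypothetical, and a brute-force minimum stands in for the KD-tree lookup to keep the example short.

```python
import math

lines_b = [
    [(1, 1), (1.5, 2.5), (3.2, 4)],
    [(1, 0), (1.5, 0), (4, 0)],
    [(1, -1), (1.5, -2), (4, -4)],
]

# Step 1: flatten every line into its vertices, remembering the source line.
vertices = [(vertex, line_idx) for line_idx, line in enumerate(lines_b) for vertex in line]

def nearest_line_by_vertex(point):
    """Step 2: point-to-point match against the vertices; return (line index, distance)."""
    dist, line_idx = min((math.dist(point, vertex), idx) for vertex, idx in vertices)
    return line_idx, dist

def distance_to_segment(point, start, end):
    """Exact planar distance from a point to a non-degenerate segment, for comparison."""
    (px, py), (ax, ay), (bx, by) = point, start, end
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

line_idx, vertex_dist = nearest_line_by_vertex((1, 2))
exact_dist = min(distance_to_segment((1, 2), a, b)
                 for a, b in zip(lines_b[line_idx], lines_b[line_idx][1:]))
far_idx, far_dist = nearest_line_by_vertex((24, 6))
```

For the point (1, 2) the vertex-based distance matches the 0.707107 in the table, while the exact distance to the line itself is smaller (about 0.316). Sparsely digitized lines may therefore need extra vertices before matching if that gap matters.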