Batch nearest-neighbor matching of points to points and points to lines in Python (TransBigData)

Nearest-neighbor matching

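The Python package TransBigData provides nearest-neighbor matching algorithms for point-to-point and point-to-line matching, and the examples below show how to use them. The method is based on the KDTree algorithm (see https://en.wikipedia.org/wiki/K-d_tree); on a balanced tree, a single nearest-neighbor query takes O(log n) time on average.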
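
To make the KDTree idea concrete, here is a minimal pure-Python sketch of a 2-D k-d tree with a nearest-neighbor query. It is illustrative only and is not TransBigData's implementation; the helper names build_kdtree and nearest are hypothetical, and the coordinates are the ones used for dfA and dfB later in this article.

```python
import math

def build_kdtree(items, depth=0):
    """Build a 2-D k-d tree as nested dicts (hypothetical helper, for illustration)."""
    if not items:
        return None
    axis = depth % 2  # alternate the split axis: x (0), then y (1)
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],  # (original_index, (x, y))
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return (distance, original_index) of the stored point closest to target."""
    if node is None:
        return best
    idx, coords = node["item"]
    d = math.dist(coords, target)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = target[node["axis"]] - coords[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting line is closer than the best match so far
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

# Same coordinates as dfB (indexed points) and dfA (query points) in this article
b_points = [(1, 3), (2, 5), (2, 2)]
a_points = [(1, 2), (2, 4), (2, 6), (2, 10), (24, 6), (21, 6), (22, 6)]

tree = build_kdtree(list(enumerate(b_points)))
matches = [nearest(tree, p) for p in a_points]  # one (distance, index_in_B) per point in A
```

The pruning test in nearest is what lets a query skip whole subtrees, which is where the logarithmic average query time comes from. For real workloads a library implementation of the tree is the practical choice.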

Point-to-point matching (DataFrame with DataFrame)

Import the TransBigData package

import transbigdata as tbd

Create two GeoDataFrame tables that contain only longitude and latitude columns

import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
                        [2,10],[24,6],[21,6],
                        [22,6]],columns = ['lon1','lat1'])
dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])

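Use tbd.ckdnearest for point-to-point matching. When matching a DataFrame with a DataFrame (no geographic information), you need to specify the longitude and latitude columns of both tables.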

transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])

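Takes two DataFrames and the names of their longitude and latitude columns, matches each row of table A to the nearest point in table B, and computes the distance.

Input

dfA_origin : DataFrame

        Table A

dfB_origin : DataFrame

        Table B

Aname : List

        Longitude and latitude column names in table A

Bname : List

        Longitude and latitude column names in table B

Output

gdf : DataFrame

        Table A with the nearest point from table B matched to each row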

tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
# The distance here is the real-world distance converted from longitude and latitude (the values are consistent with meters)
   lon1  lat1  index  lon  lat          dist
0     1     2      0    1    3  1.111949e+05
1     2     4      1    2    5  1.111949e+05
2     2     6      1    2    5  1.111949e+05
3     2    10      1    2    5  5.559746e+05
4    24     6      1    2    5  2.437393e+06
5    21     6      1    2    5  2.105798e+06
6    22     6      1    2    5  2.216318e+06
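
As a rough check on how degrees turn into a real-world distance, the sketch below uses the haversine great-circle formula. This is an assumption: the article does not say which formula TransBigData applies, and the helper name haversine_m and the Earth radius of 6,371,000 m are chosen here for illustration. With these choices, one degree of latitude comes out at about 1.111949e+05 m, which agrees with the first rows of the table.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters between two lon/lat points given in degrees.

    Hypothetical helper; the radius is an assumed mean Earth radius.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

# Row 0 of the table: (lon1, lat1) = (1, 2) matched to (lon, lat) = (1, 3)
d_one_degree = haversine_m(1, 2, 1, 3)
# Row 3 of the table: (2, 10) matched to (2, 5), five degrees of latitude apart
d_five_degrees = haversine_m(2, 10, 2, 5)
```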

Point-to-point matching (GeoDataFrame with GeoDataFrame)

Convert tables A and B into GeoDataFrames that contain point geometries

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])

Use tbd.ckdnearest_point for point-to-point matching

transbigdata.ckdnearest_point(gdA, gdB)

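Takes two GeoDataFrames, gdA and gdB, both containing points. The method joins each point in gdA with the nearest point in gdB and adds a distance field, dist.

Input

gdA : GeoDataFrame

        Table A, point features

gdB : GeoDataFrame

        Table B, point features

Output

gdf : GeoDataFrame

        Table A with the nearest point from table B matched to each row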

tbd.ckdnearest_point(dfA,dfB)
# The distance here is in longitude/latitude units (degrees)
   lon1  lat1                geometry_x       dist  index  lon  lat               geometry_y
0     1     2   POINT (1.00000 2.00000)   1.000000      0    1    3  POINT (1.00000 3.00000)
1     2     4   POINT (2.00000 4.00000)   1.000000      1    2    5  POINT (2.00000 5.00000)
2     2     6   POINT (2.00000 6.00000)   1.000000      1    2    5  POINT (2.00000 5.00000)
3     2    10  POINT (2.00000 10.00000)   5.000000      1    2    5  POINT (2.00000 5.00000)
4    24     6  POINT (24.00000 6.00000)  22.022716      1    2    5  POINT (2.00000 5.00000)
5    21     6  POINT (21.00000 6.00000)  19.026298      1    2    5  POINT (2.00000 5.00000)
6    22     6  POINT (22.00000 6.00000)  20.024984      1    2    5  POINT (2.00000 5.00000)
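
The dist values in this table are consistent with a plain Euclidean distance computed directly on the coordinates, so they are expressed in degrees rather than meters. The short sketch below reproduces three of the rows with the standard library only; it is a check on the numbers, not a description of the library's internals.

```python
import math

# (point from table A, matched point from table B) for rows 0, 3 and 4 of the table
pairs = [((1, 2), (1, 3)), ((2, 10), (2, 5)), ((24, 6), (2, 5))]

# Euclidean distance in coordinate units (degrees), with no conversion to meters
dists = [math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in pairs]
```

Because one degree of longitude covers a different ground distance at different latitudes, these values are mainly useful for ranking neighbors. If distances in meters are needed, the geometries would normally be projected to a suitable projected coordinate system before matching.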

Point-to-line matching (GeoDataFrame with GeoDataFrame)

Convert table A to point geometries and table B to lines

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
                   LineString([[1,0],[1.5,0],[4,0]]),
                   LineString([[1,-1],[1.5,-2],[4,-4]])]
dfB.plot()


transbigdata.ckdnearest_line(gdfA, gdfB)

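Takes two GeoDataFrames, where gdfA contains points and gdfB contains lines. The method joins each point in gdfA with the nearest line in gdfB and adds a distance field, dist.

Input

gdfA : GeoDataFrame

        Table A, point features

gdfB : GeoDataFrame

        Table B, line features

Output

gdf : GeoDataFrame

        Table A with the nearest line from table B matched to each row

tbd.ckdnearest_line matches points to lines. It works by extracting the vertices of each line and then applying point-to-point matching, so the reported distance is the distance to the nearest vertex of the line.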

tbd.ckdnearest_line(dfA,dfB)
# The distance here is in longitude/latitude units (degrees)
   lon1  lat1                geometry_x       dist  index  lon  lat                                         geometry_y
0     1     2   POINT (1.00000 2.00000)   0.707107      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
1     2     4   POINT (2.00000 4.00000)   1.200000      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
2     2     6   POINT (2.00000 6.00000)   2.332381      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
3     2    10  POINT (2.00000 10.00000)   6.118823      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
4    21     6  POINT (21.00000 6.00000)  17.912007      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
5    22     6  POINT (22.00000 6.00000)  18.906084      0    1    3  LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
6    24     6  POINT (24.00000 6.00000)  20.880613      1    2    5  LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...
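
The vertex-extraction approach described for point-to-line matching can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: lines are given as lists of coordinate tuples, the helper name nearest_line is hypothetical, and a brute-force search stands in for the KDTree query.

```python
import math

# Vertices of the three lines stored in dfB, and the points stored in dfA
lines = [
    [(1, 1), (1.5, 2.5), (3.2, 4)],
    [(1, 0), (1.5, 0), (4, 0)],
    [(1, -1), (1.5, -2), (4, -4)],
]
points = [(1, 2), (2, 4), (2, 6), (2, 10), (24, 6), (21, 6), (22, 6)]

# Step 1: flatten every line into (line_index, vertex) pairs
vertices = [(i, v) for i, line in enumerate(lines) for v in line]

def nearest_line(point):
    """Return (line_index, distance) of the line owning the vertex closest to point.

    Hypothetical helper: brute force over all vertices; a KDTree would
    replace this search for large inputs.
    """
    # Step 2: ordinary point-to-point matching against the extracted vertices
    dist, line_idx = min((math.dist(point, v), i) for i, v in vertices)
    return line_idx, dist

results = [nearest_line(p) for p in points]
```

One consequence of matching on vertices is that a point lying close to the middle of a long segment can be assigned a distance larger than its true perpendicular distance to that segment. Where that matters, adding intermediate vertices to the lines before matching is one possible workaround.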

 
