Title: | POLYGON LOOKUP USING KD TREES |
---|---|
Description: | Facilitates efficient polygon search using kd-trees. Coordinate-level spatial data can be aggregated to higher-level geographical identities such as census blocks, ZIP codes or police district boundaries. This requires mapping each point in the data set to one identity of the desired geographical hierarchy. Without efficient data structures this can be a daunting task, because the operation point.in.polygon() from the package sp is computationally expensive. Here, kd-trees are used for efficient nearest-neighbor search, which dramatically reduces the number of polygons that have to be tested for each point. |
Authors: | Markus Loecher <[email protected]> and Madhav Kumar <[email protected]> |
Maintainer: | Markus Loecher <[email protected]> |
License: | GPL |
Version: | 0.1.1 |
Built: | 2025-01-20 03:24:41 UTC |
Source: | https://github.com/cran/RapidPolygonLookup |
This package facilitates efficient polygon search using kd-trees. Coordinate-level spatial data can be aggregated to higher-level geographical identities such as census blocks, ZIP codes or police district boundaries. This requires mapping each point in the data set to one identity of the desired geographical hierarchy. Without efficient data structures this can be a daunting task, because the operation point.in.polygon() from the package sp is computationally expensive. Here, kd-trees are used for efficient nearest-neighbor search, which dramatically reduces the number of polygons that have to be tested for each point.
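Once each point carries a polygon identifier, the aggregation described above reduces to a grouped summary. The following is a minimal sketch in base R, assuming a lookup result with a `census.block` column (the default output column name in this package) and a logical `violent` flag. The identifiers and values are made up for illustration and do not come from the package data.

```r
# Hypothetical lookup result: one row per point with its matched polygon ID.
# NA marks a point that could not be assigned to any polygon.
pts <- data.frame(
  census.block = c("tract_A", "tract_A", "tract_B", NA, "tract_B", "tract_B"),
  violent      = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Number of points per polygon; table() drops unmatched (NA) points by default.
counts <- table(pts$census.block)

# Share of violent incidents per polygon; NA groups are dropped as well.
share <- tapply(pts$violent, pts$census.block, mean)

print(counts)
print(share)
```

Unmatched points are silently excluded by both summaries, so it is worth checking `sum(is.na(pts$census.block))` before interpreting the counts.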
Package: | RapidPolygonLookup |
Type: | Package |
Title: | Polygon lookup using kd trees |
Version: | 0.1 |
Date: | 2013-11-18 |
Depends: | R(>= 2.10.0), sp, RANN, PBSmapping, RgoogleMaps |
Author: | "Markus Loecher, Berlin School of Economics and Law (BSEL)" <[email protected]>, Madhav Kumar <[email protected]> |
Maintainer: | "Markus Loecher" <[email protected]> |
License: | GPL |
LazyLoad: | yes |
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
This function computes the bounding box of each polygon and adds this information to the list. The bounding boxes can be used in various applications. Our main motivation is the massive points-in-polygon search, where any polygon whose bounding box does not contain the current point can be excluded as a candidate.
AddRanges(poly.list)
poly.list |
polygon list with three elements: data, polys, and poly.centers |
Returns augmented polygon list with additional element – "ranges"
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
data(sf.polys, envir = environment())
sf.polys <- AddRanges(sf.polys)
str(sf.polys$ranges)
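The bounding-box idea can be illustrated without the package. The sketch below uses base R only; the toy polygons and the helper names `bbox_table()` and `bbox_candidates()` are hypothetical and are not part of RapidPolygonLookup, whose internal layout of the "ranges" element may differ.

```r
# Toy polygons: each is a data frame of vertex coordinates (hypothetical data).
polys <- list(
  a = data.frame(X = c(0, 2, 2, 0), Y = c(0, 0, 2, 2)),
  b = data.frame(X = c(3, 5, 4),    Y = c(1, 1, 3)),
  c = data.frame(X = c(1, 4, 4, 1), Y = c(1.5, 1.5, 4, 4))
)

# One row per polygon with columns xmin, xmax, ymin, ymax.
bbox_table <- function(polys) {
  t(vapply(polys, function(p) {
    c(xmin = min(p$X), xmax = max(p$X), ymin = min(p$Y), ymax = max(p$Y))
  }, numeric(4)))
}

# Names of the polygons whose bounding box contains the point (x, y).
bbox_candidates <- function(ranges, x, y) {
  hit <- ranges[, "xmin"] <= x & x <= ranges[, "xmax"] &
         ranges[, "ymin"] <= y & y <= ranges[, "ymax"]
  rownames(ranges)[hit]
}

ranges <- bbox_table(polys)
candidates <- bbox_candidates(ranges, x = 1.5, y = 1.8)
print(candidates)
```

Only the polygons returned as candidates need the expensive exact point-in-polygon test; a point inside a bounding box is not necessarily inside the polygon itself.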
Object of class SpatialPolygonsDataFrame containing spatial polygons of Census tracts in California. The object was originally created from the 2010 US Census TIGER/Line boundary files (http://www.census.gov/geo/www/tiger/) for Census Tracts. The polygons have been manually cropped to the area in and around San Francisco.
data(california.tract10)
An object of class SpatialPolygonsDataFrame from the sp package
data
data frame containing 457 variables (excluding ids) from Summary File 1
polygons
polygons of Census Tracts
plotOrder
plotting order of polygons
bbox
bounding box of spatial polygons
proj4string
projection of polygons. All polygons are projected in CRS(" +proj=longlat +ellps=GRS80 +datum=NAD83 +no_defs +towgs84=0,0,0")
For details on the summary variables present in the data frame, please refer to
http://www.census.gov/prod/cen2000/doc/sf1.pdf
http://cran.r-project.org/web/packages/UScensus2010/index.html
Zack W. Almquist (2010). US Census Spatial and Demographic Data in R: The UScensus2000 Suite of Packages. Journal of Statistical Software, 37(6), 1-31. URL http://www.jstatsoft.org/v37/i06/
data(california.tract10, envir = environment())
plot(california.tract10)
This function serves three purposes: (i) it converts the (complicated) data structure of a spatial polygon (from the sp package) to a format aligned with the (simpler) PBSmapping polygon format; (ii) it clips/crops the polygons to a prespecified bounding box; (iii) it computes and adds the center of each polygon.
CropSpatialPolygonsDataFrame(x, bb = NULL, verbose = 0)
x |
object of class SpatialPolygonsDataFrame |
bb |
bounding box to crop the polygons |
verbose |
level of verbosity |
New list with separate entries for data, polys, and poly centers
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
# San Francisco:
data(california.tract10, envir = environment())
sf.polys <- CropSpatialPolygonsDataFrame(x= california.tract10,
  bb= data.frame(X=c(-122.5132, -122.37), Y= c(37.70760, 37.81849)))
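The text above does not say how the polygon centers are computed. As an illustration only, the sketch below computes the area-weighted centroid of a simple polygon with the shoelace formula, which is one common definition of a polygon center; the helper `poly_centroid()` is hypothetical and is not claimed to be what the package does internally.

```r
# Area-weighted centroid of a simple (non-self-intersecting) polygon.
# x, y: vertex coordinates in order, without repeating the first vertex.
poly_centroid <- function(x, y) {
  n <- length(x)
  j <- c(2:n, 1)                      # index of the next vertex
  cross <- x * y[j] - x[j] * y        # shoelace terms
  area <- sum(cross) / 2              # signed area
  c(X = sum((x + x[j]) * cross) / (6 * area),
    Y = sum((y + y[j]) * cross) / (6 * area))
}

# L-shaped toy polygon: the centroid differs from the plain vertex mean.
lx <- c(0, 2, 2, 1, 1, 0)
ly <- c(0, 0, 1, 1, 2, 2)
ctr <- poly_centroid(lx, ly)
vertex_mean <- c(X = mean(lx), Y = mean(ly))
print(ctr)
print(vertex_mean)
```

For irregular polygons the two notions of "center" can differ noticeably, which matters when centers are later used to rank candidate polygons by distance.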
This function plots the points that could not be mapped using RapidPolygonLookup(). The points are overlaid on the polygons to contextualize their geographical location and to help understand the reason for their exclusion.
DiagnoseFailure(XY.polys, poly.list = NULL)
XY.polys |
output from function RapidPolygonLookup() |
poly.list |
polygon list with 3 or 4 elements: data, polys, poly.centers, and possibly ranges. Needs to be supplied if RapidPolygonLookup() was run with keep.data= FALSE |
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
data(sf.crime.2012, envir = environment())
data(sf.polys, envir = environment())
cat(nrow(sf.crime.2012), "rows in SF crime \n")
XY.kdtree <- RapidPolygonLookup(sf.crime.2012[,c("X","Y")], poly.list= sf.polys,
  k= 10, N= 1000, poly.id= "fips", poly.id.colname= "census.block",
  keep.data= TRUE, verbose= TRUE)
DiagnoseFailure(XY.kdtree)
This function searches the lat-long ranges of polygons to come up with a shorter list of candidates on which point.in.polygon() from the sp package can be applied.
FindPolygonInRanges(poly.list, XY, poly.id = "fips", poly.id.colname = "census.block", verbose = 0)
poly.list |
polygon list with 3 or 4 elements: data, polys, poly.centers, and possibly ranges |
XY |
data frame containing X-Y columns |
poly.id |
column name in 'poly.list$data' containing the polygon identifier |
poly.id.colname |
desired column name in the output data frame containing the polygon identifier |
verbose |
level of verbosity |
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
data(sf.crime.2012, envir = environment())
data(sf.polys, envir = environment())
sf.polys <- AddRanges(sf.polys)
XY <- FindPolygonInRanges(sf.polys, sf.crime.2012[1:1000,], verbose=0)
which(is.na(XY[,"census.block"]))
table(XY$rank)
Given spatial partitions such as census blocks, ZIP codes or police district boundaries, we are frequently faced with the need to spatially aggregate data. Without efficient data structures this can be a daunting task, because the operation point.in.polygon() from the package sp is computationally expensive. Here, kd-trees are used for efficient nearest-neighbor search, which dramatically reduces the number of polygons that have to be tested for each point. Points that are left unmapped are put through a linear search to find the associated polygon.
RapidPolygonLookup(XY, polygons, poly.list = NULL, k = 10, N = nrow(XY), poly.id = "fips", poly.id.colname = "census.block", keep.data = TRUE, verbose = 0)
XY |
data frame containing X-Y columns (alternatively named lon-lat, long-lat, or longitude-latitude) |
polygons |
polygons to be cropped and augmented with polygon centres |
poly.list |
polygon list with three elements: data, polys, and poly.centers as output from function CropSpatialPolygonsDataFrame() |
k |
maximum number of near neighbours to compute. The default value is set to 10 |
N |
number of rows of XY to search |
poly.id |
column name in 'poly.list$data' containing the polygon identifier |
poly.id.colname |
desired column name in the output data frame containing the polygon identifier |
keep.data |
retain polygon list and centers for future reference |
verbose |
level of verbosity |
The original points augmented with polygon ID are returned along with the poly centers and other call information
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
data(sf.crime.2012, envir = environment())
data(sf.polys, envir = environment())
cat(nrow(sf.crime.2012), "rows in SF crime \n")
XY.kdtree <- RapidPolygonLookup(sf.crime.2012[,c("X","Y")], poly.list= sf.polys,
  k= 10, N= 1000, poly.id= "fips", poly.id.colname= "census.block",
  keep.data= TRUE, verbose= TRUE)
XY.kdtree.DF <- XY.kdtree$XY
table(XY.kdtree.DF$rank, useNA= "always")
hist(XY.kdtree.DF$rank, xlab = "rank of neighbor")
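The search strategy described for RapidPolygonLookup() (test only the polygons with the nearest centers, in order of distance, and record the rank of the match) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy polygons, the helpers `point_in_poly()` and `lookup_point()`, and the use of vertex means as centers are all hypothetical, and brute-force distance ranking stands in for the kd-tree search that the package delegates to the RANN package.

```r
# Ray-casting point-in-polygon test; a base-R stand-in for sp::point.in.polygon().
point_in_poly <- function(x, y, px, py) {
  n <- length(px)
  j <- c(n, seq_len(n - 1))                  # index of the previous vertex
  crosses <- ((py > y) != (py[j] > y)) &
    (x < (px[j] - px) * (y - py) / (py[j] - py) + px)
  sum(crosses) %% 2 == 1                     # odd number of crossings: inside
}

# Toy polygons (hypothetical): a long strip and a small square above it.
polys <- list(
  strip  = data.frame(X = c(0, 10, 10, 0), Y = c(0, 0, 1, 1)),
  square = data.frame(X = c(0, 1, 1, 0),   Y = c(1, 1, 2, 2))
)
centers <- data.frame(
  X = vapply(polys, function(p) mean(p$X), numeric(1)),
  Y = vapply(polys, function(p) mean(p$Y), numeric(1))
)

# Test only the k polygons with the nearest centers, in order of distance.
# Brute-force ranking is used here; a kd-tree finds the same neighbours faster.
lookup_point <- function(x, y, polys, centers, k) {
  d2 <- (centers$X - x)^2 + (centers$Y - y)^2
  nearest <- order(d2)[seq_len(min(k, length(d2)))]
  for (rank in seq_along(nearest)) {
    p <- polys[[nearest[rank]]]
    if (point_in_poly(x, y, p$X, p$Y)) {
      return(list(id = names(polys)[nearest[rank]], rank = rank))
    }
  }
  list(id = NA_character_, rank = NA_integer_)  # left for a full linear search
}

hit_k2 <- lookup_point(0.5, 0.9, polys, centers, k = 2)
hit_k1 <- lookup_point(0.5, 0.9, polys, centers, k = 1)
str(hit_k2)
str(hit_k1)
```

The query point lies in the strip, but the square's center is closer, so the match is found at rank 2 when k = 2 and the point stays unmapped when k = 1. Elongated polygons are a typical reason why some points need a larger k or the final linear search.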
This function uses the nn2() function from the RANN package to come up with a shorter list of candidates on which point.in.polygon() from the sp package can be applied.
SearchForPolygon(poly.list, XY, k, poly.id, poly.id.colname, verbose = 0)
poly.list |
polygon list with 3-4 elements: poly.centers, data, polys and possibly ranges |
XY |
data frame containing X-Y columns to assign polygons to |
k |
maximum number of nearest neighbours to compute. No default is set in this function's signature; RapidPolygonLookup() uses k = 10 by default. |
poly.id |
column name in 'poly.list$data' containing the polygon identifier |
poly.id.colname |
desired column name in the output data frame containing the polygon identifier |
verbose |
level of verbosity |
Returns data frame with identified polygon and nearest neighbour rank
Markus Loecher <[email protected]> and Madhav Kumar <[email protected]>
data(sf.crime.2012, envir = environment())
data(sf.polys, envir = environment())
XY.polys <- SearchForPolygon(poly.list= sf.polys, XY= sf.crime.2012[1:1000,],
  k= 10, poly.id= "fips", poly.id.colname= "census.block", verbose= TRUE)
2012 crime incident data from the city of San Francisco
data(sf.crime.2012)
A data frame with 20,000 randomly selected observations with the following variables and their types:
Date
character
X
numeric
Y
numeric
violent
Factor
There are no more details required
https://data.sfgov.org/Public-Safety/SFPD-Reported-Incidents-2003-to-Present/dyj4-n68b
data(sf.crime.2012, envir = environment())
Spatial polygons of California Census tracts, cropped to a bounding box around San Francisco
data(sf.polys)
A list object with the following elements:
data
data frame retained from California tracts object of class SpatialPolygonsDataFrame
polys
PolySet object from PBSmapping containing the spatial polygons
poly.centers
PolyData object from PBSmapping containing the polygon centroids
This object is created by the function CropSpatialPolygonsDataFrame() from the RapidPolygonLookup package
http://cran.r-project.org/web/packages/UScensus2010/index.html
Zack W. Almquist (2010). US Census Spatial and Demographic Data in R: The UScensus2000 Suite of Packages. Journal of Statistical Software, 37(6), 1-31. URL http://www.jstatsoft.org/v37/i06/
data(sf.polys, envir = environment())
plotPolys(sf.polys$polys)