This article is about working with spatial data ‘in the tidyverse’. By this we mean handling spatial datasets using functions (such as
filter()) and concepts (such as type stability) from R packages that are part of the metapackage tidyverse. You should already have an R installation set-up for spatial data analysis having read Chapter 2 of Geocomputation with R.
If not take a read there now. In any case the tidyverse can be installed from CRAN with the following command:
The tidyverse works with spatial data thanks to sf which is quite a recent package (first release in 2016). If you have not already got it, get sf with:
The we will also uses a dataset from the spData package, which can be installed with:
Software for ‘data science’ is evolving. As we saw in Chapter 1, R packages ggplot2 and dplyr have become immensely popular. Now they are a part of the tidyverse.
These packages use the ‘tidy data’ principles for consistency and speed of processing. According to the
vignette("tidy-data"), in tidy datasets:
Historically spatial R packages have not been compatible with the tidyverse. But this has changed with the release of sf and hard work by Edzer Pebesma and Hadley Wickham to make them work together. As described in Chapter 2, sf combines the functionality of three previous packages: sp, rgeos and rgdal. It has many other advantages, including:
The final advantage comes with a warning: watch out for pitfalls! That’s topic of this workshop, as illustrated by the following GIF: it’s easy to imagine spatial data analyses but, as Homer Simson discovered with his BBQ project, doing something complicated is easier said than done, especially when combining packages that have only recently started working together:
Now you know the context and have your R setup sorted, it’s time to begin the practical. Execute the following 2 commands to attach spatial data libraries:
## Linking to GEOS 3.7.0, GDAL 2.3.2, PROJ 5.2.0
## Loading required package: sp
Notice the messages: sf uses some C libraries behind the scenes. raster depends on the older sp package which sf replaces (confusing I know!).
The next step is to attach the tidyverse, which brings us onto the first pitfall.
Just loading the tidyverse reveals a pitfall of using spatial data with the tidyverse that affects the raster package in particular:
## ── Attaching packages ──────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5 ## ✔ tibble 2.0.0 ✔ dplyr 0.8.0.9000 ## ✔ tidyr 0.8.2 ✔ stringr 1.3.1 ## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ── ## ✖ tidyr::extract() masks raster::extract() ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() ## ✖ dplyr::select() masks raster::select()
The first chunk of output shows that the tidyverse is attaching its packages. ✔ yes we want ggplot2, ✔ we want dplyr etc. But we also get less positive messages. ✖ Doh! there are many conflicts.
In the context of spatial data this may only be a problem if you use the raster package. The final ✖ shows that dplyr’s
select() function has boshed (technically speaking, masked) raster’s select function. This can cause issues. To avoid this pitfall we suggest using
raster::select() rather than just
select() when using this conflicted function name if you use raster and the tidyverse.
sp doesn’t play well with the tidyverse:
Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial')"
This pitfall is not specific to the tidyverse but is worth being aware of. Let’s create a buffer around London of 500 km:
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
What is going with the country miles away?
The issue is that some objects have multiple geometries:
##  MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON ##  MULTIPOLYGON MULTIPOLYGON ## 18 Levels: GEOMETRY POINT LINESTRING POLYGON ... TRIANGLE
Which have more than 1?
## near_lnd.name_long sapply.near_lnd.geom..length. ## 1 France 3 ## 2 Germany 1 ## 3 Luxembourg 1 ## 4 Belgium 1 ## 5 Netherlands 1 ## 6 Ireland 1 ## 7 United Kingdom 2
We can resolve this issue by casting them:
The previous code chunk shows how spatial subsetting works in base R, with
near_lnd = world_poly[lnd_buff, ].
You can also do tidy spatial subsetting:
But there are pitfalls
sparse = FALSEin the spatial predicate function)
##  "1" "2" "3" "4" "5" "6" "7"
##  "44" "122" "129" "130" "131" "134" "144"
Also affects non-spatial data - tidyverse/dplyr#366
## Simple feature collection with 1 feature and 10 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -7.572168 ymin: 49.96 xmax: 1.681531 ymax: 58.635 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## iso_a2 name_long continent region_un subregion type ## 1 GB United Kingdom Europe Europe Northern Europe Country ## area_km2 pop lifeExp gdpPercap geom ## 1 249986.4 64613160 81.30488 38251.79 MULTIPOLYGON (((-6.197885 5...
Base R equivalent:
##  FALSE
Row names usually don’t matter but they can bite
##  FALSE
##  "iso_a2" "name_long" "continent" "region_un" "subregion" ##  "type" "area_km2" "pop" "lifeExp" "gdpPercap" ##  "geom"
##  "NAME" "GSS_CODE" "HECTARES" "NONLD_AREA" "ONS_INNER" ##  "SUB_2009" "SUB_2006" "geometry"
Error: Column `iso_a2` must have a unique name
Error in .subset2(x, i, exact = exact) : attempt to select less than one element in get1index
But this does:
Raster data is not yet closely connected to the tidyverse, however:
as(my_vector, "Spatial")before working on raster-vector interactions