Extending R's GTFS abilities with simple features
June 2, 2018 - 9 minutes
Two functions to easily transition from library(gtfsr) into library(sf) for tidyier transit analysis.
Civic Data sf gtfsr tidyverseTLDR
Two similar functions for building sf
objects of the routes and stops within a gtfs
objects.
General Transit Feed Specification
Originally named Google Transit Feed Specification, GTFS was developed in 2005 by Chris Harrelson (a Googler at the time), to make trip planning on civic transit systems easier.
GTFS format is a standardized set of comma separated txt
files, with specific names and fields. To be flexible (i.e. cover a lot of public transit options), not all files/fields are required. Here is a comprehensive list breakdown.
library(gtfsr)
The rOpenSci project’s library(gtfsr)
is designed to make reading and visualizing GTFS data fast and easy. You can install the package from GitHub, via devtools::install_github("ropensci/gtfsr")
.
gtfsr::import_gtfs()
This function does a great job handling the flexible GTFS structure. Here I’m reading in the local Charlottesville Area Transit’s (CAT) GTFS feed.
The zipped data is available in this Civic Innovation Day repo.
library(gtfsr)
## Error in library(gtfsr): there is no package called 'gtfsr'
cat <- gtfsr::import_gtfs("https://github.com/Smart-Cville/CID-2018-Regional-Transit-Challenge/blob/b6c714ec190f8843d6aa154fc74ed7be3bd5307f/data/2017_08_CharlottesvilleAreaTransit.zip?raw=true")
## Error in loadNamespace(name): there is no package called 'gtfsr'
# lots of verbose warnings that I turned off for kniting porpoises
The function returns to us a special S3 class
list object, gtfs
, that has a data frame, for each of the available files.
class(cat)
## [1] "function"
names(cat)
## NULL
gtfsr::map_gtfs()
A high-level mapping function built with library(leaflet)
.
In this specific case, attempting to plot all routes results in an error about missing the agency_id
field. So my work around here is to explicitly specify all routes, which is easy via %>%
dot.
library(magrittr)
cat %>% map_gtfs() # errors out
## Error in map_gtfs(.): could not find function "map_gtfs"
# work around
cat %>% map_gtfs(., route_ids = .$routes_df$route_id,
route_colors = paste0("#", .$routes_df$route_color)) %>%
# needed for web sharing only
widgetframe::frameWidget() # isolates leaflet:css, to prevent messing with webpage:css
## Error in map_gtfs(., route_ids = .$routes_df$route_id, route_colors = paste0("#", : could not find function "map_gtfs"
You can compare the library(gtfsr)
results against the current CAT app and static map.
Extending to sf
The gtfsr
package does a great job reading and showing data, but if you want to join in additional data sources or customize plots beyond colors or routes, you are out of luck.
Simple Features is making big waves in the r-spatial community since library(sf)
(https://github.com/r-spatial/sf) introduced a new class of geometry enriched data frames.
Extracting routes into routes_sf
Because GTFS data is distributed across multiple files, we must pick apart the list of data frames to pull out the critical pieces.
In order to reconstruct a plot similar to map_gtfs()
, we need:
- key-value pairs from
cat$trips_df
- route meta information from
cat$routes_df
- shapes as Lon/lat coordinates from
cat$shapes_df
- Caution: the following function is long *
library(sf)
library(tidyverse)
gtfs_routes_sf <- function(gtfs) {
## gather key-values first ----
# trips_df has route_id:shape_id
shape_key <- gtfs$trips_df %>%
select(route_id, shape_id) %>%
unique()
# routes_df has route_id:route_name
route_key <- gtfs$routes_df %>%
select(route_id, route_short_name) %>%
mutate(route_short_name = paste("route", route_short_name)) %>%
inner_join(shape_key)
# check for colors :)
if ( !is.null(gtfs$routes_df$route_color) ) { # extract if they exist
route_key %<>% inner_join(select(gtfs$routes_df, route_color, route_id) )
}
else { # planB: build a pal from my favorite pallette 'd3'
route_key %<>% mutate(route_color = rep(ggsci::pal_d3()(10),
length.out = nrow(route_key)))
}
## build the sf object ----
# exctract lon/lat values as matrix to build linestrings for each "shape_id"
sfc <- gtfs$shapes_df %>% # long data frame
split(.$shape_id) %>% # list of shorted data framee, one per shape
map(~ select(., shape_pt_lon, shape_pt_lat) %>% # order maters, lon-1st lat-2nd
as.matrix %>% # coherce for st_linestrings happiness
st_linestring) %>%
st_sfc(crs = 4326) # bundle all shapes into a collection
# add collection on and convert to sf
unique(gtfs$shapes_df$shape_id) %>%
sort() %>% # sort to match with names(sfc); split()'s factor-cohercion alpha sorts
st_sf(shape_id = ., geometry = sfc) %>%
inner_join(route_key)
# plots fine ^^
# st_sf(route_key, geometry = sfc)
# doesn't plot ^^ and I can't explain it
}
OK that a lot of code, but does it actually work?
cat_routes_sf <- gtfs_routes_sf(cat)
## Error in gtfs$trips_df: object of type 'closure' is not subsettable
head(cat_routes_sf)
## Error in head(cat_routes_sf): object 'cat_routes_sf' not found
Yea it does.
Now all we need to do is viz it and since library(leaflet)
already works out of the box with sf
objects, it’s cake.
library(leaflet)
leaflet(cat_routes_sf) %>%
addTiles() %>%
addPolylines(color = ~paste0("#", route_color),
label = ~as.character(route_short_name)) %>%
addLegend(colors = ~unique(paste0("#", route_color)),
labels = ~unique(route_short_name)) %>%
widgetframe::frameWidget()
## Error in structure(list(options = options), leafletData = data): object 'cat_routes_sf' not found
Rinse and repeat for stops_sf()
The steps are slightly different but the pattern is the same (and it’s still long).
gtfs_stops_sf <- function(gtfs) {
shape_key <- gtfs$trips_df %>%
select(trip_id, route_id, shape_id) %>%
unique()
# stop_times_df also has stop sequence and arrive/depart time for specific stops
stop_key <- gtfs$stop_times_df %>%
select(trip_id, stop_id) %>%
unique() %>%
inner_join(shape_key) %>% # one stop is on multiple routes
# need to pair down
arrange(route_id) %>% # use route_id as tiebreaker (for now)
group_by(stop_id) %>% # group_by() to stop_id
slice(1) # to slice() out the first row
if ( !is.null(gtfs$routes_df$route_color) ) {
stop_key %<>% inner_join(select(gtfs$routes_df, route_color, route_id)) }
else {stop_key %<>% mutate(route_color = rep(ggsci::pal_d3()(10), length.out = nrow(route_key))) }
stops_sfc <- gtfs$stops_df %>%
split(.$stop_id) %>%
map(~select(., stop_lon, stop_lat) %>%
unlist() %>%
st_point() ) %>% # point instead of linestring
st_sfc()
st_sf(stop_key, geometry = stops_sfc) %>%
inner_join(gtfs$stops_df)
}
Now, make use and make stops_sf
object.
cat_stops_sf <- gtfs_stops_sf(cat)
## Error in gtfs$trips_df: object of type 'closure' is not subsettable
head(cat_stops_sf)
## Error in head(cat_stops_sf): object 'cat_stops_sf' not found
Now we just have to combine our two layers, routes and stops.
To make a map that looks a lot like to the result from map_gtfs()
, I’m gonna tweak a few of the defaults to addCircleMarkers()
.
leaflet(cat_routes_sf) %>%
addTiles() %>%
addPolylines(color = ~paste0("#", route_color),
label = ~htmltools::htmlEscape(route_short_name)) %>%
addCircleMarkers(data = cat_stops_sf, fillColor = ~paste0("#", route_color),
label = ~htmltools::htmlEscape(stop_name),
color = "black", radius = 5, weight = 3) %>%
addLegend(colors = ~unique(paste0("#", route_color)),
labels = ~unique(route_short_name)) %>%
widgetframe::frameWidget()
## Error in structure(list(options = options), leafletData = data): object 'cat_routes_sf' not found
👌
Wrap up
Now we can build sf
objects from gtfs
objects at-will!
That gives us new plotting flexibility and makes incorporating new data more straight forward.
Looks like exploring transit patterns in R just got a little easier! Now if only all transit data was already cleaned into GTFS …
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.2 (2019-12-12)
## os macOS Catalina 10.15.5
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/New_York
## date 2020-06-10
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## ! package * version date lib source
## P assertthat 0.2.1 2019-03-21 [?] CRAN (R 3.6.0)
## P backports 1.1.5 2019-10-02 [?] CRAN (R 3.6.0)
## P blogdown 0.17 2019-11-13 [?] CRAN (R 3.6.0)
## P bookdown 0.15 2019-11-12 [?] CRAN (R 3.6.0)
## P broom 0.5.2 2019-04-07 [?] CRAN (R 3.6.0)
## P callr 3.3.2 2019-09-22 [?] CRAN (R 3.6.0)
## P cellranger 1.1.0 2016-07-27 [?] CRAN (R 3.6.0)
## P class 7.3-15 2019-01-01 [?] CRAN (R 3.6.1)
## P classInt 0.4-2 2019-10-17 [?] CRAN (R 3.6.0)
## cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
## P colorspace 1.4-1 2019-03-18 [?] CRAN (R 3.6.0)
## P crayon 1.3.4 2017-09-16 [?] CRAN (R 3.6.0)
## P crosstalk 1.0.0 2016-12-21 [?] CRAN (R 3.6.0)
## P DBI 1.0.0 2018-05-02 [?] CRAN (R 3.6.0)
## P desc 1.2.0 2018-05-01 [?] CRAN (R 3.6.0)
## P devtools 2.2.1 2019-09-24 [?] CRAN (R 3.6.0)
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
## dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.2)
## P e1071 1.7-2 2019-06-05 [?] CRAN (R 3.6.0)
## ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.2)
## P evaluate 0.14 2019-05-28 [?] CRAN (R 3.6.0)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
## P fastmap 1.0.1 2019-10-08 [?] CRAN (R 3.6.0)
## P forcats * 0.4.0 2019-02-17 [?] CRAN (R 3.6.0)
## P fs 1.3.1 2019-05-06 [?] CRAN (R 3.6.0)
## P generics 0.0.2 2018-11-29 [?] CRAN (R 3.6.0)
## P ggplot2 * 3.3.1 2020-05-28 [?] CRAN (R 3.6.2)
## glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.2)
## P gtable 0.3.0 2019-03-25 [?] CRAN (R 3.6.0)
## P haven 2.2.0 2019-11-08 [?] CRAN (R 3.6.0)
## P hms 0.5.2 2019-10-30 [?] CRAN (R 3.6.0)
## P htmltools 0.4.0 2019-10-04 [?] CRAN (R 3.6.0)
## P htmlwidgets 1.5.1 2019-10-08 [?] CRAN (R 3.6.0)
## httpuv 1.5.4 2020-06-06 [1] CRAN (R 3.6.2)
## P httr 1.4.1 2019-08-05 [?] CRAN (R 3.6.0)
## jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.0)
## P KernSmooth 2.23-16 2019-10-15 [?] CRAN (R 3.6.0)
## P knitr 1.26 2019-11-12 [?] CRAN (R 3.6.0)
## later 1.1.0.1 2020-06-05 [1] CRAN (R 3.6.2)
## P lattice 0.20-38 2018-11-04 [?] CRAN (R 3.6.1)
## P leaflet * 2.0.3 2019-11-16 [?] CRAN (R 3.6.0)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
## P lubridate 1.7.4 2018-04-11 [?] CRAN (R 3.6.0)
## P magrittr * 1.5 2014-11-22 [?] CRAN (R 3.6.0)
## P memoise 1.1.0 2017-04-21 [?] CRAN (R 3.6.0)
## mime 0.9 2020-02-04 [1] CRAN (R 3.6.0)
## P modelr 0.1.5 2019-08-08 [?] CRAN (R 3.6.0)
## P munsell 0.5.0 2018-06-12 [?] CRAN (R 3.6.0)
## P nlme 3.1-142 2019-11-07 [?] CRAN (R 3.6.0)
## pillar 1.4.4 2020-05-05 [1] CRAN (R 3.6.2)
## P pkgbuild 1.0.6 2019-10-09 [?] CRAN (R 3.6.0)
## P pkgconfig 2.0.3 2019-09-22 [?] CRAN (R 3.6.0)
## P pkgload 1.0.2 2018-10-29 [?] CRAN (R 3.6.0)
## P prettyunits 1.0.2 2015-07-13 [?] CRAN (R 3.6.0)
## P processx 3.4.1 2019-07-18 [?] CRAN (R 3.6.0)
## P promises 1.1.0 2019-10-04 [?] CRAN (R 3.6.0)
## P ps 1.3.0 2018-12-21 [?] CRAN (R 3.6.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
## P R6 2.4.1 2019-11-12 [?] CRAN (R 3.6.0)
## P Rcpp 1.0.3 2019-11-08 [?] CRAN (R 3.6.0)
## P readr * 1.3.1 2018-12-21 [?] CRAN (R 3.6.0)
## P readxl 1.3.1 2019-03-13 [?] CRAN (R 3.6.0)
## P remotes 2.1.0 2019-06-24 [?] CRAN (R 3.6.0)
## renv 0.8.3 2019-11-11 [1] CRAN (R 3.6.1)
## P rlang 0.4.6 2020-05-02 [?] CRAN (R 3.6.2)
## P rmarkdown 1.17 2019-11-13 [?] CRAN (R 3.6.0)
## P rprojroot 1.3-2 2018-01-03 [?] CRAN (R 3.6.0)
## P rstudioapi 0.10 2019-03-19 [?] CRAN (R 3.6.0)
## P rvest 0.3.5 2019-11-08 [?] CRAN (R 3.6.0)
## scales 1.1.1 2020-05-11 [1] CRAN (R 3.6.2)
## P sessioninfo 1.1.1 2018-11-05 [?] CRAN (R 3.6.0)
## P sf * 0.9-3 2020-05-04 [?] CRAN (R 3.6.2)
## shiny 1.4.0.2 2020-03-13 [1] CRAN (R 3.6.0)
## P stringi 1.4.3 2019-03-12 [?] CRAN (R 3.6.0)
## P stringr * 1.4.0 2019-02-10 [?] CRAN (R 3.6.0)
## P testthat 2.3.0 2019-11-05 [?] CRAN (R 3.6.0)
## tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
## P tidyr * 1.0.0 2019-09-11 [?] CRAN (R 3.6.0)
## tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.2)
## P tidyverse * 1.2.1 2017-11-14 [?] CRAN (R 3.6.0)
## P units 0.6-5 2019-10-08 [?] CRAN (R 3.6.0)
## P usethis 1.5.1 2019-07-04 [?] CRAN (R 3.6.0)
## P vctrs 0.3.1 2020-06-05 [?] CRAN (R 3.6.2)
## P withr 2.1.2 2018-03-15 [?] CRAN (R 3.6.0)
## P xfun 0.11 2019-11-12 [?] CRAN (R 3.6.0)
## P xml2 1.2.2 2019-08-09 [?] CRAN (R 3.6.0)
## P xtable 1.8-4 2019-04-21 [?] CRAN (R 3.6.0)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
##
## [1] /Users/nathanday/ROS/new_site/renv/library/R-3.6/x86_64-apple-darwin15.6.0
## [2] /private/var/folders/7_/cvjz84n54vx7zv_pw3gmdqr00000gn/T/Rtmp7Jgqh8/renv-system-library
##
## P ── Loaded and on-disk path mismatch.