Download and Load Data
UrbanAccess has data downloaders and data loaders that can be used to acquire and load transit and street network datasets:
Transit Network: General Transit Feed Specification (GTFS)
Street Network: OpenStreetMap (OSM)
The GTFS Data Object
The GTFS data object stores the processed and aggregated GTFS feed data in Pandas DataFrames.
urbanaccess.gtfs.gtfsfeeds_dataframe.gtfsfeeds_dfs
A collection of dataframes representing the standardized and merged metropolitan-wide transit network from multiple GTFS feeds.
- Parameters
- gtfsfeeds_dfs : object
processed dataframes of corresponding GTFS feed text files
- gtfsfeeds_dfs.stops : pandas.DataFrame
- gtfsfeeds_dfs.routes : pandas.DataFrame
- gtfsfeeds_dfs.trips : pandas.DataFrame
- gtfsfeeds_dfs.stop_times : pandas.DataFrame
- gtfsfeeds_dfs.calendar : pandas.DataFrame
- gtfsfeeds_dfs.calendar_dates : pandas.DataFrame
- gtfsfeeds_dfs.stop_times_int : pandas.DataFrame
- gtfsfeeds_dfs.headways : pandas.DataFrame
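As a rough illustration of how the tables listed above might be inspected, the sketch below counts the rows of each table on a gtfsfeeds_dfs-like object. The helper name summarize_tables is hypothetical and not part of UrbanAccess; it assumes only that the object exposes the attributes listed above and that some of them may be missing.

```python
# Table attribute names as listed for the gtfsfeeds_dfs object above.
GTFS_TABLES = (
    "stops", "routes", "trips", "stop_times",
    "calendar", "calendar_dates", "stop_times_int", "headways",
)


def summarize_tables(gtfsfeeds_dfs):
    """Return {table name: row count} for the tables present on the object.

    Hypothetical helper: it relies only on len(), which gives the row
    count when the table is a pandas.DataFrame.
    """
    summary = {}
    for name in GTFS_TABLES:
        table = getattr(gtfsfeeds_dfs, name, None)
        if table is not None:
            summary[name] = len(table)
    return summary
```

Calling summarize_tables on a loaded data object would give a quick check of which tables were populated before moving on to later steps.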
Downloading GTFS Data
Manage and download multiple feeds at once using the feeds object.
urbanaccess.gtfsfeeds.feeds
A dict of GTFS feeds as {name of GTFS feed or transit service/agency : URL of feed} to request and download in the GTFS downloader.
- Parameters
- gtfs_feeds : dict
Dictionary with the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value, passed to the GTFS downloader as: {unique name of GTFS feed or transit service/agency : URL of feed}. Note: the key is used as the feed folder name. If the GTFS feed does not have an agency name in its agency.txt file, the key is also used to name the agency.
The feeds object is an instance of the urbanaccess_gtfsfeeds class, which has the following methods:
class urbanaccess.gtfsfeeds.urbanaccess_gtfsfeeds(gtfs_feeds={})
A dict of GTFS feeds as {name of GTFS feed or transit service/agency : URL of feed} to request and download in the GTFS downloader.
- Parameters
- gtfs_feeds : dict
Dictionary with the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value, passed to the GTFS downloader as: {unique name of GTFS feed or transit service/agency : URL of feed}. Note: the key is used as the feed folder name. If the GTFS feed does not have an agency name in its agency.txt file, the key is also used to name the agency.
add_feed(add_dict, replace=False)
Add a dictionary to the urbanaccess_gtfsfeeds instance.
- Parameters
- add_dict : dict
Dictionary to add to the existing urbanaccess_gtfsfeeds instance, with the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value, passed to the GTFS downloader as: {unique name of GTFS feed or transit service/agency : URL of feed}
- replace : bool, optional
If a key in add_dict already exists in the urbanaccess_gtfsfeeds instance, replace the existing value with the value passed
classmethod from_yaml(gtfsfeeddir='data/gtfsfeeds', yamlname='gtfsfeeds.yaml')
Create an urbanaccess_gtfsfeeds instance from a saved YAML file.
- Parameters
- gtfsfeeddir : str, optional
Directory from which to load the YAML file.
- yamlname : str or file like, optional
File name of the YAML file to load.
- Returns
- urbanaccess_gtfsfeeds
remove_feed(del_key=None, remove_all=False)
Remove GTFS feeds from the existing urbanaccess_gtfsfeeds instance.
- Parameters
- del_key : str or list, optional
Dict keys, as a single string or a list of strings, to remove from the existing urbanaccess_gtfsfeeds instance
- remove_all : bool, optional
If true, remove all keys from the existing urbanaccess_gtfsfeeds instance
to_dict()
Return a dict representation of an urbanaccess_gtfsfeeds instance.
to_yaml(gtfsfeeddir='data/gtfsfeeds', yamlname='gtfsfeeds.yaml', overwrite=False)
Save an urbanaccess_gtfsfeeds representation to a YAML file.
- Parameters
- gtfsfeeddir : str, optional
Directory in which to save the YAML file.
- yamlname : str or file like, optional
File name under which to save the YAML file.
- overwrite : bool, optional
If true, overwrite an existing YAML file of the same name in the specified directory
- Returns
- None
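A minimal sketch of how the methods above might be combined to register feeds. It uses only add_feed and to_dict as documented above; the helper names validate_feed_dict and register_feeds are hypothetical, and the check that URLs start with http:// or https:// is an assumption of this sketch, not a rule stated by UrbanAccess.

```python
def validate_feed_dict(feed_dict):
    """Check that feed_dict maps non-empty feed names to http(s) URLs.

    Hypothetical pre-check; the http(s) requirement is an assumption.
    """
    if not isinstance(feed_dict, dict):
        raise TypeError("feed_dict must be a dict of {feed name: feed URL}")
    for name, url in feed_dict.items():
        if not isinstance(name, str) or not name.strip():
            raise ValueError("feed names must be non-empty strings")
        if not isinstance(url, str) or not url.lower().startswith(
                ("http://", "https://")):
            raise ValueError("URL for %r must start with http(s)://" % name)
    return feed_dict


def register_feeds(feed_dict, replace=False):
    """Add feeds to the shared feeds object and return its dict form."""
    # Imported lazily so the validation helper works without UrbanAccess.
    from urbanaccess.gtfsfeeds import feeds

    feeds.add_feed(add_dict=validate_feed_dict(feed_dict), replace=replace)
    return feeds.to_dict()
```

For example, register_feeds({'example transit': 'https://example.com/gtfs.zip'}) would register a single feed; both the name and the URL here are placeholders.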
Search for feeds on the GTFS Data Exchange. Note that the GTFS Data Exchange has not been maintained since summer 2016, so feeds listed there may be out of date.
urbanaccess.gtfsfeeds.search(api='gtfsdataexch', search_text=None, search_field=None, match='contains', add_feed=False, overwrite_feed=False)
Connect to a GTFS feed repository API and search for GTFS feeds that exist in the remote repository, optionally adding the GTFS feed names and download URLs found to the urbanaccess_gtfsfeeds instance. Currently only the GTFS Data Exchange API is supported.
- Parameters
- api : {'gtfsdataexch'}, optional
Name of the GTFS feed repository to search in. The name corresponds to the dict specified in the urbanaccess_config instance. Currently only the GTFS Data Exchange repository is supported.
- search_text : str, optional
String pattern to search for
- search_field : string or list, optional
Name of the field or column in which to search for the string
- match : {'contains', 'exact'}, optional
Search string matching method, either contains or exact
- add_feed : bool, optional
Add search results to the existing urbanaccess_gtfsfeeds instance using the name field as the key and the URL as the value
- overwrite_feed : bool, optional
If true, the existing urbanaccess_gtfsfeeds instance will be replaced with the records returned in the search results. All existing records will be removed.
- Returns
- search_result_df : pandas.DataFrame
Dataframe of search results displaying full feed metadata
Download data from feeds in your feeds object or from custom feed URLs.
urbanaccess.gtfsfeeds.download(data_folder='data', feed_name=None, feed_url=None, feed_dict=None, error_pause_duration=5, delete_zips=False)
Connect to the URLs passed to the function, or to the URLs stored in the urbanaccess_gtfsfeeds instance, download the GTFS feed zipfile(s), and unzip them inside a local root directory. The resulting GTFS feed text files will be located in the root folder gtfsfeed_text unless otherwise specified.
- Parameters
- data_folder : str, optional
Directory to download GTFS feed data to
- feed_name : str, optional
Name of the transit agency or service, used to name the downloaded zipfile
- feed_url : str, optional
URL corresponding to feed_name, used to download the GTFS feed zipfile
- feed_dict : dict, optional
Dictionary specifying the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value: {unique name of GTFS feed or transit service/agency : URL of feed}
- error_pause_duration : int, optional
How long to pause, in seconds, before retrying a request after an error
- delete_zips : bool, optional
If true, the downloaded zipfiles will be removed
- Returns
- None
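The download step described above could be wrapped as in the sketch below. The names gtfs_text_folder and download_feeds are hypothetical, and the sketch assumes that the gtfsfeed_text folder mentioned above is created inside data_folder; check the location on disk after a real download.

```python
import os


def gtfs_text_folder(data_folder="data"):
    """Folder assumed to hold the unzipped GTFS text files.

    Assumption: gtfsfeed_text sits directly under data_folder.
    """
    return os.path.join(data_folder, "gtfsfeed_text")


def download_feeds(feed_dict=None, data_folder="data", delete_zips=False):
    """Download feeds and return the folder assumed to hold the text files.

    With feed_dict=None, the URLs stored in the feeds object are used,
    per the description of download above.
    """
    # Imported lazily; calling this needs UrbanAccess and network access.
    from urbanaccess.gtfsfeeds import download

    download(data_folder=data_folder, feed_dict=feed_dict,
             delete_zips=delete_zips)
    return gtfs_text_folder(data_folder)
```

Returning the text folder path is a design choice of this sketch: it makes it easy to hand the result to the loading step as gtfsfeed_path.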
Loading GTFS Data
Load raw GTFS data (from one or more feeds) into an UrbanAccess transit data object and run the data through the validation and formatting sequence.
The path given is assumed to point either to a single feed folder or to a root folder that holds all of the individual feed folders.
urbanaccess.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None, validation=False, verbose=True, bbox=None, remove_stops_outsidebbox=None, append_definitions=False)
Read all GTFS feed components as dataframes in a gtfsfeeds_dfs object and merge all individual GTFS feeds into a regional metropolitan data table. Optionally, the data can also be validated before use.
- Parameters
- gtfsfeed_path : str, optional
Root path where all GTFS feeds that make up a contiguous metropolitan area are stored
- validation : bool
If true, run the validation check on stops, which looks for stops outside of the bounding box and checks the stop coordinate hemisphere. This is required in order to remove stops outside of a bbox.
- verbose : bool
If true and stops are found outside of the bbox, the stops that are outside will be printed for your reference
- bbox : tuple
Bounding box formatted as a 4-element tuple: (lng_max, lat_min, lng_min, lat_max). Example: (-122.304611,37.798933,-122.263412,37.822802). A bbox for an area can be obtained in CSV format from http://boundingbox.klokantech.com/
- remove_stops_outsidebbox : bool
If true, stops that are outside the bbox will be removed
- append_definitions : bool
If true, columns that use the GTFS data schema for their attribute codes will have the corresponding GTFS definition of each code appended to the resulting dataframes for reference
- Returns
- gtfsfeeds_dfs : object
processed dataframes of corresponding GTFS feed text files
- gtfsfeeds_dfs.stops : pandas.DataFrame
- gtfsfeeds_dfs.routes : pandas.DataFrame
- gtfsfeeds_dfs.trips : pandas.DataFrame
- gtfsfeeds_dfs.stop_times : pandas.DataFrame
- gtfsfeeds_dfs.calendar : pandas.DataFrame
- gtfsfeeds_dfs.calendar_dates : pandas.DataFrame
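Because the bbox is typically copied as CSV text, a small parsing step can help before loading. The sketch below is an illustration only: parse_bbox and load_feeds are hypothetical names, and the order of the four values simply follows the (lng_max, lat_min, lng_min, lat_max) convention documented above.

```python
def parse_bbox(csv_text):
    """Turn 'lng_max,lat_min,lng_min,lat_max' CSV text into a 4-tuple of floats."""
    values = tuple(float(part) for part in csv_text.split(","))
    if len(values) != 4:
        raise ValueError(
            "expected 4 comma-separated numbers, got %d" % len(values))
    return values


def load_feeds(gtfsfeed_path, bbox_csv):
    """Load and validate GTFS feeds, dropping stops outside the bbox."""
    # Imported lazily so parse_bbox can be used without UrbanAccess installed.
    from urbanaccess.gtfs.load import gtfsfeed_to_df

    return gtfsfeed_to_df(
        gtfsfeed_path=gtfsfeed_path,
        validation=True,  # required for removing stops outside the bbox
        verbose=True,
        bbox=parse_bbox(bbox_csv),
        remove_stops_outsidebbox=True,
        append_definitions=False,
    )
```

The example bbox from the parameter description, passed as the string '-122.304611,37.798933,-122.263412,37.822802', parses to the same tuple shown there.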
Computing route-stop level headways is optional, but it is required if you wish to use headways in the network integration step.
urbanaccess.gtfs.headways.headways(gtfsfeeds_df, headway_timerange)
Calculate headways by route stop for a specific time range.
- Parameters
- gtfsfeeds_df : object
gtfsfeeds_dfs object with all processed GTFS data tables
- headway_timerange : list
Time range over which to calculate headways, as a list of two times given as 24-hour clock strings, such as: ['07:00:00', '10:00:00']
- Returns
- gtfsfeeds_dfs.headways : pandas.DataFrame
gtfsfeeds_dfs object with the headways dataframe, which holds statistics of route stop headways in minutes along with relevant route and stop information
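The headway_timerange format described above can be checked before the calculation is run. In this sketch, make_timerange and compute_headways are hypothetical helpers; requiring zero-padded 'HH:MM:SS' strings and a first time earlier than the second are assumptions of the sketch rather than documented rules.

```python
import re

# Zero-padded 24-hour clock time, e.g. '07:00:00' (assumed format).
_TIME_PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d")


def make_timerange(start, end):
    """Return [start, end] after checking both are 'HH:MM:SS' strings."""
    for value in (start, end):
        if not isinstance(value, str) or not _TIME_PATTERN.fullmatch(value):
            raise ValueError("times must be 'HH:MM:SS' strings, got %r" % (value,))
    # Zero-padded strings sort in clock order, so plain comparison works.
    if start >= end:
        raise ValueError("start time must be earlier than end time")
    return [start, end]


def compute_headways(gtfsfeeds_dfs, start="07:00:00", end="10:00:00"):
    """Calculate route stop headways for the given time range."""
    # Imported lazily; calling this needs UrbanAccess and loaded GTFS data.
    from urbanaccess.gtfs.headways import headways

    return headways(gtfsfeeds_df=gtfsfeeds_dfs,
                    headway_timerange=make_timerange(start, end))
```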
Saving Processed GTFS data
urbanaccess.gtfs.network.save_processed_gtfs_data(gtfsfeeds_dfs, filename, dir='data')
Write the dataframes in a gtfsfeeds_dfs object to an HDF5 file.
- Parameters
- gtfsfeeds_dfs : object
gtfsfeeds_dfs object
- filename : string
Name of the HDF5 file to save, with the .h5 extension
- dir : string, optional
Directory in which to save the HDF5 file
- Returns
- None
Loading Processed GTFS data
urbanaccess.gtfs.network.load_processed_gtfs_data(filename, dir='data')
Read data from an HDF5 file into a gtfsfeeds_dfs object.
- Parameters
- filename : string
Name of the HDF5 file to read, with the .h5 extension
- dir : string, optional
Directory from which to read the HDF5 file
- Returns
- gtfsfeeds_dfs : object
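The save and load functions from the two sections above can be paired into a simple round trip. This is a sketch under stated assumptions: h5_filename and round_trip are hypothetical helper names, and appending .h5 when it is missing is a convenience of the sketch, based on the filename descriptions above.

```python
def h5_filename(name):
    """Return name with a .h5 extension, adding it if missing."""
    return name if name.endswith(".h5") else name + ".h5"


def round_trip(gtfsfeeds_dfs, name, directory="data"):
    """Save a gtfsfeeds_dfs object to HDF5, then read it back."""
    # Imported lazily so h5_filename can be used without UrbanAccess.
    from urbanaccess.gtfs.network import (
        load_processed_gtfs_data,
        save_processed_gtfs_data,
    )

    filename = h5_filename(name)
    save_processed_gtfs_data(gtfsfeeds_dfs, filename=filename, dir=directory)
    return load_processed_gtfs_data(filename=filename, dir=directory)
```

Saving processed data this way avoids re-running the loading and validation sequence each time the same feeds are needed.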
Downloading and Loading OpenStreetMap Data
Download OSM street network nodes and edges.
urbanaccess.osm.load.ua_network_from_bbox(lat_min=None, lng_min=None, lat_max=None, lng_max=None, bbox=None, network_type='walk', timeout=180, memory=None, max_query_area_size=2500000000, remove_lcn=True)
Make a graph network (nodes and edges) from a bounding lat/lon box that is compatible with the network analysis tool Pandana.
- Parameters
- lat_min : float
southern latitude of bounding box
- lng_min : float
eastern longitude of bounding box
- lat_max : float
northern latitude of bounding box
- lng_max : float
western longitude of bounding box
- bbox : tuple
Bounding box formatted as a 4-element tuple: (lng_max, lat_min, lng_min, lat_max). Example: (-122.304611,37.798933,-122.263412,37.822802). A bbox for an area can be obtained in CSV format from http://boundingbox.klokantech.com/
- network_type : {'walk', 'drive'}, optional
Network type to download: 'walk' includes pedestrian pathways and roadways where pedestrians are allowed, and 'drive' includes driveable roadways. Default is 'walk'.
- timeout : int, optional
The timeout interval for requests, which is also passed to the Overpass API
- memory : int, optional
Server memory allocation size for the query, in bytes. If None, the server will use its default allocation size
- max_query_area_size : float, optional
Maximum area for any part of the geometry, in the units the geometry is in. Any larger polygon will be divided up into multiple queries to the Overpass API. The default is 50,000 * 50,000 units (i.e., 50 km x 50 km in area if the units are meters).
- remove_lcn : bool, optional
Remove low connectivity nodes from the resulting network. This ensures the resulting network does not have nodes that are disconnected from the rest of the larger network
- Returns
- nodesfinal, edgesfinal : pandas.DataFrame
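A possible way to call ua_network_from_bbox with a checked set of arguments is sketched below. The helper names osm_query_kwargs and download_osm_network are hypothetical; the allowed network types are taken from the parameter list above, and the bbox is passed through in the documented (lng_max, lat_min, lng_min, lat_max) order without reordering.

```python
# Network types listed for the network_type parameter above.
NETWORK_TYPES = ("walk", "drive")


def osm_query_kwargs(bbox, network_type="walk", remove_lcn=True):
    """Build keyword arguments for ua_network_from_bbox after basic checks."""
    if len(bbox) != 4:
        raise ValueError("bbox must have 4 elements, got %d" % len(bbox))
    if network_type not in NETWORK_TYPES:
        raise ValueError("network_type must be one of %r" % (NETWORK_TYPES,))
    return {
        "bbox": tuple(bbox),
        "network_type": network_type,
        "remove_lcn": remove_lcn,
    }


def download_osm_network(bbox, network_type="walk"):
    """Download OSM nodes and edges for the bbox as two dataframes."""
    # Imported lazily; calling this needs UrbanAccess and network access.
    from urbanaccess.osm.load import ua_network_from_bbox

    nodes, edges = ua_network_from_bbox(**osm_query_kwargs(bbox, network_type))
    return nodes, edges
```

Using the same bbox tuple here and in the GTFS loading step keeps the street and transit networks clipped to the same area.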