Download and Load Data

UrbanAccess has data downloaders and data loaders that can be used to acquire and load transit and street network datasets:

  1. Transit Network: General Transit Feed Specification (GTFS)
  2. Street Network: OpenStreetMap (OSM)

The GTFS Data Object

The GTFS data object stores the processed and aggregated GTFS feed data in Pandas DataFrames.

urbanaccess.gtfs.gtfsfeeds_dataframe.gtfsfeeds_dfs

A collection of dataframes representing the standardized and merged metropolitan-wide transit network from multiple GTFS feeds.

Parameters:

gtfsfeeds_dfs : object

processed dataframes of corresponding GTFS feed text files

gtfsfeeds_dfs.stops : pandas.DataFrame

gtfsfeeds_dfs.routes : pandas.DataFrame

gtfsfeeds_dfs.trips : pandas.DataFrame

gtfsfeeds_dfs.stop_times : pandas.DataFrame

gtfsfeeds_dfs.calendar : pandas.DataFrame

gtfsfeeds_dfs.calendar_dates : pandas.DataFrame

gtfsfeeds_dfs.stop_times_int : pandas.DataFrame

gtfsfeeds_dfs.headways : pandas.DataFrame

Downloading GTFS Data

Manage and download multiple feeds at once using the feeds object.

urbanaccess.gtfsfeeds.feeds

A dict of GTFS feeds as {name of GTFS feed or transit service/agency : URL of feed} to request and download in the gtfs downloader.

Parameters:

gtfs_feeds : dict

Dictionary passed to the GTFS downloader as {unique name of GTFS feed or transit service/agency : URL of feed}. The key is the name of the transit service or agency GTFS feed; it is used as the feed folder name and, if the GTFS feed does not have an agency name in its agency.txt file, as the agency name. The value is the GTFS feed URL.

The feeds object is an instance of the urbanaccess_gtfsfeeds class with the following functions:

class urbanaccess.gtfsfeeds.urbanaccess_gtfsfeeds(gtfs_feeds={})

A dict of GTFS feeds as {name of GTFS feed or transit service/agency : URL of feed} to request and download in the gtfs downloader.

Parameters:

gtfs_feeds : dict

Dictionary passed to the GTFS downloader as {unique name of GTFS feed or transit service/agency : URL of feed}. The key is the name of the transit service or agency GTFS feed; it is used as the feed folder name and, if the GTFS feed does not have an agency name in its agency.txt file, as the agency name. The value is the GTFS feed URL.

add_feed(add_dict, replace=False)

Add a dictionary to the urbanaccess_gtfsfeeds instance.

Parameters:

add_dict : dict

Dictionary to add to existing urbanaccess_gtfsfeeds with the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value to pass to the GTFS downloader as: {unique name of GTFS feed or transit service/agency : URL of feed}

replace : bool, optional

If true and a key of add_dict is already in the urbanaccess_gtfsfeeds instance, replace the existing value with the value passed
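The effect of replace can be pictured with a plain dict. The sketch below is a stand-in, not UrbanAccess code: add_feed_sketch is a hypothetical helper and the example.com URLs are placeholders. It assumes that when replace is False and the key already exists, the existing value is kept; the library may instead warn or raise.

```python
def add_feed_sketch(feeds, add_dict, replace=False):
    """Return a copy of `feeds` updated with `add_dict` (hypothetical helper)."""
    updated = dict(feeds)
    for name, url in add_dict.items():
        if name in updated and not replace:
            # Assumption: without replace, the existing entry is left untouched.
            continue
        updated[name] = url
    return updated


# Placeholder feed name and URLs, for illustration only.
feeds = {"agency_a": "https://example.com/agency_a/gtfs.zip"}
new_entry = {"agency_a": "https://example.com/agency_a/gtfs_v2.zip"}

kept = add_feed_sketch(feeds, new_entry)
replaced = add_feed_sketch(feeds, new_entry, replace=True)
```

Because the key doubles as the feed folder name, keeping keys unique and stable avoids mixing files from two different feeds in one folder.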

classmethod from_yaml(gtfsfeeddir='data/gtfsfeeds', yamlname='gtfsfeeds.yaml')

Create an urbanaccess_gtfsfeeds instance from a saved YAML file.

Parameters:

gtfsfeeddir : str, optional

Directory from which to load the YAML file.

yamlname : str or file like, optional

Name of the YAML file to load.

Returns:

urbanaccess_gtfsfeeds

remove_feed(del_key=None, remove_all=False)

Remove GTFS feeds from the existing urbanaccess_gtfsfeeds instance

Parameters:

del_key : str or list, optional

dict keys, as a single string or a list of strings, to remove from the existing urbanaccess_gtfsfeeds instance

remove_all : bool, optional

if true, remove all keys from the existing urbanaccess_gtfsfeeds instance

to_dict()

Return a dict representation of an urbanaccess_gtfsfeeds instance.

to_yaml(gtfsfeeddir='data/gtfsfeeds', yamlname='gtfsfeeds.yaml', overwrite=False)

Save an urbanaccess_gtfsfeeds representation to a YAML file.

Parameters:

gtfsfeeddir : str, optional

Directory to save a YAML file.

yamlname : str or file like, optional

File name to which to save a YAML file.

overwrite : bool, optional

if true, overwrite an existing YAML file of the same name in the specified directory

Returns:

None

Search for feeds on the GTFS Data Exchange. Note that the GTFS Data Exchange has not been maintained since summer 2016, so feeds listed there may be out of date.

urbanaccess.gtfsfeeds.search(api='gtfsdataexch', search_text=None, search_field=None, match='contains', add_feed=False, overwrite_feed=False)

Connect to a GTFS feed repository API and search for GTFS feeds that exist in the remote repository, optionally adding the name and download URL of each matching feed to the urbanaccess_gtfsfeeds instance. Currently only the GTFS Data Exchange API is supported.

Parameters:

api : {'gtfsdataexch'}, optional

Name of the GTFS feed repository to search in. The name corresponds to the dict specified in the urbanaccess_config instance. Currently only the GTFS Data Exchange repository is supported.

search_text : str, optional

string pattern to search for

search_field : string or list, optional

name of the field(s) or column(s) in which to search for the string

match : {'contains', 'exact'}, optional

string matching method for the search, either 'contains' or 'exact'

add_feed : bool, optional

add search results to existing urbanaccess_gtfsfeeds instance using the name field as the key and the URL as the value

overwrite_feed : bool, optional

If true, the existing urbanaccess_gtfsfeeds instance will be replaced with the records returned in the search results. All existing records will be removed.

Returns:

search_result_df : pandas.DataFrame

Dataframe of search results displaying full feed metadata

Download data from feeds in your feeds object or from custom feed URLs.

urbanaccess.gtfsfeeds.download(data_folder='data', feed_name=None, feed_url=None, feed_dict=None, error_pause_duration=5, delete_zips=False)

Connect to the URLs passed to the function or the URLs stored in the urbanaccess_gtfsfeeds instance, download the GTFS feed zipfile(s), and unzip them inside a local root directory. Unless otherwise specified, the resulting GTFS feed text files will be located in the root folder gtfsfeed_text.

Parameters:

data_folder : str, optional

directory to download GTFS feed data to

feed_name : str, optional

name of transit agency or service to use to name downloaded zipfile

feed_url : str, optional

corresponding URL to the feed_name to use to download GTFS feed zipfile

feed_dict : dict, optional

Dictionary specifying the name of the transit service or agency GTFS feed as the key and the GTFS feed URL as the value: {unique name of GTFS feed or transit service/agency : URL of feed}

error_pause_duration : int, optional

number of seconds to pause before retrying a request after an error

delete_zips : bool, optional

if true the downloaded zipfiles will be removed

Returns:

None
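A minimal usage sketch of the download step, built only from the signature listed above. The wrapper download_named_feeds is hypothetical and not part of UrbanAccess. The import is deferred so the function can be defined without UrbanAccess installed; actually calling it requires the library, network access, and real feed URLs.

```python
def download_named_feeds(feed_dict, data_folder="data"):
    """Download the feeds in `feed_dict` ({name: URL}) with UrbanAccess.

    Hypothetical wrapper for illustration; not part of UrbanAccess.
    """
    if not feed_dict:
        raise ValueError("feed_dict must contain at least one {name: URL} pair")
    for name, url in feed_dict.items():
        if not isinstance(name, str) or not isinstance(url, str):
            raise TypeError("feed names and URLs must both be strings")

    # Deferred import: only needed when a download is actually requested.
    from urbanaccess import gtfsfeeds

    # Pass the feeds directly instead of going through the shared feeds object.
    gtfsfeeds.download(
        data_folder=data_folder,
        feed_dict=feed_dict,
        error_pause_duration=5,  # seconds to pause before retrying after an error
        delete_zips=True,        # remove the zipfiles once they are unzipped
    )
```

Checking the dict before the import keeps malformed input from ever reaching the network.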

Loading GTFS Data

Load raw GTFS data (from multiple feeds) into an UrbanAccess transit data object and run the data through the validation and formatting sequence.

GTFS feeds are assumed to be either a single feed, designated by the feed folder, or multiple feeds, designated by the root folder that holds all individual feed folders.
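The two folder layouts can be told apart with a few lines of standard-library code. This is a sketch under stated assumptions, not the loader's own logic: list_feed_folders is a hypothetical helper, and it treats any folder that directly contains .txt files as a feed folder.

```python
import tempfile
from pathlib import Path


def list_feed_folders(gtfsfeed_path):
    """Return the feed folders implied by `gtfsfeed_path` (hypothetical helper)."""
    root = Path(gtfsfeed_path)
    if any(root.glob("*.txt")):
        # Single feed: the path is itself the feed folder.
        return [root]
    # Multiple feeds: each subfolder holding .txt files is one feed.
    return sorted(p for p in root.iterdir() if p.is_dir() and any(p.glob("*.txt")))


# Build a throwaway root folder holding two placeholder feed folders.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for agency in ("agency_a", "agency_b"):
        (root / agency).mkdir()
        (root / agency / "stops.txt").write_text("stop_id,stop_lat,stop_lon\n")
    multi = [p.name for p in list_feed_folders(root)]
    single = [p.name for p in list_feed_folders(root / "agency_a")]
```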

urbanaccess.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None, validation=False, verbose=True, bbox=None, remove_stops_outsidebbox=None, append_definitions=False)

Read all GTFS feed components as a dataframe in a gtfsfeeds_dfs object and merge all individual GTFS feeds into a regional metropolitan data table. Optionally, data can also be validated before its use.

Parameters:

gtfsfeed_path : str, optional

root path where all gtfs feeds that make up a contiguous metropolitan area are stored

validation : bool

if true, run the validation check on stops, which checks for stops outside of a bounding box and checks the stop coordinate hemisphere. Validation is required to remove stops outside of a bbox

verbose : bool

if true and stops are found outside of the bbox, the stops that are outside will be printed for your reference

bbox : tuple

Bounding box formatted as a 4-element tuple: (lng_max, lat_min, lng_min, lat_max), for example (-122.304611, 37.798933, -122.263412, 37.822802). A bbox can be extracted for an area using the CSV format bbox from http://boundingbox.klokantech.com/

remove_stops_outsidebbox : bool

if true stops that are outside the bbox will be removed

append_definitions : bool

if true, columns that use the GTFS data schema for their attribute codes will have the corresponding GTFS definition information of that code appended to the resulting dataframes for reference

Returns:

gtfsfeeds_dfs : object

processed dataframes of corresponding GTFS feed text files

gtfsfeeds_dfs.stops : pandas.DataFrame

gtfsfeeds_dfs.routes : pandas.DataFrame

gtfsfeeds_dfs.trips : pandas.DataFrame

gtfsfeeds_dfs.stop_times : pandas.DataFrame

gtfsfeeds_dfs.calendar : pandas.DataFrame

gtfsfeeds_dfs.calendar_dates : pandas.DataFrame
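The element order of the bbox tuple is easy to get wrong, so the sketch below unpacks the documented example and tests whether a stop falls inside it. The helper stop_in_bbox and the two stop coordinates are hypothetical. Judging by the example tuple, the first element (lng_max) is the more westerly longitude, and the comparison is written on that assumption.

```python
def stop_in_bbox(stop_lon, stop_lat, bbox):
    """Return True if the stop lies inside bbox (hypothetical helper)."""
    # Unpack in the documented order; lng_max is the western edge here.
    lng_max, lat_min, lng_min, lat_max = bbox
    return lng_max <= stop_lon <= lng_min and lat_min <= stop_lat <= lat_max


# Example tuple from the bbox parameter description.
bbox = (-122.304611, 37.798933, -122.263412, 37.822802)

inside = stop_in_bbox(-122.28, 37.81, bbox)   # made-up stop within the box
outside = stop_in_bbox(-122.40, 37.81, bbox)  # made-up stop west of the box
```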

Computing route-stop level headways is optional, but it is required if you wish to use headways in the network integration step.

urbanaccess.gtfs.headways.headways(gtfsfeeds_df, headway_timerange)

Calculate headways by route stop for a specific time range

Parameters:

gtfsfeeds_df : object

gtfsfeeds_dfs object with all processed GTFS data tables

headway_timerange : list

time range for which to calculate headways, as a list of a start time and an end time, where times are 24-hour clock strings such as: ['07:00:00', '10:00:00']

Returns:

gtfsfeeds_dfs.headways : pandas.DataFrame

headways dataframe of the gtfsfeeds_dfs object, with statistics of route-stop headways in units of minutes and the relevant route and stop information
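As a reminder of what a headway is, the sketch below computes the gaps between consecutive departures at one route stop within a time range. It is a simplified illustration, not the library's implementation: the helper names and departure times are made up, and treating both ends of the range as inclusive is an assumption.

```python
from statistics import mean


def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds past midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways_minutes(departures, headway_timerange):
    """Gaps in minutes between consecutive departures within the time range."""
    start, end = (to_seconds(t) for t in headway_timerange)
    # Assumption: both ends of the range are inclusive.
    in_range = sorted(s for s in map(to_seconds, departures) if start <= s <= end)
    return [(later - earlier) / 60 for earlier, later in zip(in_range, in_range[1:])]


# Made-up departures for a single route stop.
departures = ["06:50:00", "07:00:00", "07:15:00", "07:45:00", "08:00:00", "10:30:00"]
gaps = headways_minutes(departures, ["07:00:00", "10:00:00"])
mean_headway = mean(gaps)
```

Parsing the strings by hand rather than with datetime keeps the helper usable for GTFS times past 24:00:00, which the specification allows for trips running after midnight.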

Saving Processed GTFS data

urbanaccess.gtfs.network.save_processed_gtfs_data(gtfsfeeds_dfs, filename, dir='data')

Write dataframes in a gtfsfeeds_dfs object to an HDF5 file

Parameters:

gtfsfeeds_dfs : object

gtfsfeeds_dfs object

filename : string

name of the hdf5 file to save with .h5 extension

dir : string, optional

directory to save hdf5 file

Returns:

None

Loading Processed GTFS data

urbanaccess.gtfs.network.load_processed_gtfs_data(filename, dir='data')

Read data from an HDF5 file to a gtfsfeeds_dfs object

Parameters:

filename : string

name of the hdf5 file to read with .h5 extension

dir : string, optional

directory to read hdf5 file

Returns:

gtfsfeeds_dfs : object

Downloading and Loading OpenStreetMap Data

Download OSM street network nodes and edges.

urbanaccess.osm.load.ua_network_from_bbox(lat_min=None, lng_min=None, lat_max=None, lng_max=None, bbox=None, network_type='walk', timeout=180, memory=None, max_query_area_size=2500000000, remove_lcn=True)

Make a graph network (nodes and edges), compatible with the network analysis tool Pandana, from a bounding lat/lon box

Parameters:

lat_min : float

southern latitude of bounding box

lng_min : float

eastern longitude of bounding box

lat_max : float

northern latitude of bounding box

lng_max : float

western longitude of bounding box

bbox : tuple

Bounding box formatted as a 4-element tuple: (lng_max, lat_min, lng_min, lat_max), for example (-122.304611, 37.798933, -122.263412, 37.822802). A bbox can be extracted for an area using the CSV format bbox from http://boundingbox.klokantech.com/

network_type : {'walk', 'drive'}, optional

Specify the network type: 'walk' includes roadways where pedestrians are allowed and pedestrian pathways; 'drive' includes driveable roadways. Default is 'walk'.

timeout : int, optional

the timeout interval for requests, which is also passed to the Overpass API

memory : int, optional

server memory allocation size for the query, in bytes. If none, server will use its default allocation size

max_query_area_size : float, optional

maximum area for any part of the geometry, in the units the geometry is in. Any larger polygon will be divided up for multiple queries to the Overpass API. The default is 50,000 * 50,000 units (i.e., 50 km x 50 km in area, if units are meters)

remove_lcn : bool, optional

remove low-connectivity nodes from the resulting network. This ensures the resulting network does not have nodes that are disconnected from the rest of the larger network

Returns:

nodesfinal, edgesfinal : pandas.DataFrame
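To get a feel for the default max_query_area_size, the sketch below maps the example bbox tuple onto the named bounding-box arguments and makes a rough estimate of its area in square meters. The conversion factor and the planar approximation are assumptions made for illustration; how the library itself measures the query area is not specified here.

```python
import math

# Example tuple from the bbox parameter description, in the documented order.
bbox = (-122.304611, 37.798933, -122.263412, 37.822802)
lng_max, lat_min, lng_min, lat_max = bbox

# The same extent expressed as the named arguments of the signature above.
bbox_kwargs = {"lat_min": lat_min, "lng_min": lng_min,
               "lat_max": lat_max, "lng_max": lng_max}

# Rough planar estimate: about 111,320 m per degree (assumed constant), with
# longitude scaled by cos(latitude) at the centre of the box.
M_PER_DEG = 111_320
mid_lat = math.radians((lat_min + lat_max) / 2)
width_m = abs(lng_min - lng_max) * M_PER_DEG * math.cos(mid_lat)
height_m = abs(lat_max - lat_min) * M_PER_DEG
area_m2 = width_m * height_m

max_query_area_size = 50_000 * 50_000  # the default, 2,500,000,000
needs_splitting = area_m2 > max_query_area_size
```

Under these assumptions the example box is a few kilometres on a side, well below the default limit, so it would not need to be divided into multiple queries.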