Distance utilities API

ChoiceModels also includes tools for constructing pairwise distance matrices and calculating which geographies are within various distance bands of some reference geography.

Distance matrices

choicemodels.tools.great_circle_distance_matrix(df, x, y, earth_radius=6371009, return_int=True)

Calculate a pairwise great-circle distance matrix from a DataFrame of points. Distances returned are in units of earth_radius (default is meters).

Parameters
  • df (pandas DataFrame) – a DataFrame of points, uniquely indexed by place identifier (e.g., tract ID or parcel ID), represented by x and y coordinate columns

  • x (str) – label of the x coordinate column in the DataFrame

  • y (str) – label of the y coordinate column in the DataFrame

  • earth_radius (numeric) – radius of earth in units in which distance will be returned (default is meters)

  • return_int (bool) – if True, convert all distances to integers

Returns

Multi-indexed distance vector in units of earth_radius, with top-level index representing “from” and second-level index representing “to”.

Return type

pandas Series
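To make the role of earth_radius and return_int concrete, here is a minimal sketch of a pairwise great-circle calculation in plain Python. It is not the library's implementation: the haversine formula, the use of decimal degrees, the truncation to integers, and the helper names haversine and pairwise_great_circle are all assumptions made for illustration. A dict keyed by (from, to) tuples stands in for the multi-indexed Series.

```python
import math

EARTH_RADIUS_M = 6371009  # same default as the earth_radius parameter above


def haversine(lng1, lat1, lng2, lat2, earth_radius=EARTH_RADIUS_M):
    """Great-circle distance between two points given in decimal degrees (assumed)."""
    lng1, lat1, lng2, lat2 = map(math.radians, (lng1, lat1, lng2, lat2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    # The result is in whatever units earth_radius is expressed in
    return 2 * earth_radius * math.asin(math.sqrt(h))


def pairwise_great_circle(points, earth_radius=EARTH_RADIUS_M, return_int=True):
    """Map every ordered (from_id, to_id) pair to a distance.

    points is a dict of {place_id: (lng, lat)}.
    """
    out = {}
    for i, (x1, y1) in points.items():
        for j, (x2, y2) in points.items():
            d = haversine(x1, y1, x2, y2, earth_radius)
            # Assumption: integer conversion truncates; the library may differ
            out[(i, j)] = int(d) if return_int else d
    return out


# Hypothetical place IDs and coordinates, for illustration only
points = {"a": (0.0, 0.0), "b": (90.0, 0.0), "c": (0.0, 90.0)}

meters = pairwise_great_circle(points)
kilometers = pairwise_great_circle(points, earth_radius=6371.009, return_int=False)
```

Passing the radius in kilometers instead of meters changes the units of every returned distance, which is the behavior the earth_radius parameter describes.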

choicemodels.tools.euclidean_distance_matrix(df)

Calculate a pairwise Euclidean distance matrix from a DataFrame of points. Distances returned are in the units of the x and y columns.

Parameters

  • df (pandas DataFrame) – a DataFrame of points, uniquely indexed by place identifier (e.g., tract ID or parcel ID), represented by x and y coordinate columns

Returns

Multi-indexed distance vector in units of df’s values, with top-level index representing “from” and second-level index representing “to”.

Return type

pandas Series
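As a rough illustration of what a pairwise Euclidean distance vector contains, the sketch below computes straight-line distances for every ordered pair of points in plain Python. The parcel IDs and coordinates are hypothetical, and a dict keyed by (from, to) tuples stands in for the multi-indexed Series; this is not the library's code.

```python
import math
from itertools import product

# Hypothetical parcel IDs with projected x/y coordinates (e.g., in meters)
points = {101: (0.0, 0.0), 102: (3.0, 4.0), 103: (6.0, 8.0)}

# One entry per ordered (from_id, to_id) pair, including each point with itself
euclidean = {
    (i, j): math.hypot(points[j][0] - points[i][0], points[j][1] - points[i][1])
    for i, j in product(points, repeat=2)
}

for (from_id, to_id), dist in euclidean.items():
    print(from_id, to_id, dist)
```

Because the result is expressed in the coordinate units, the input points should be in a projected coordinate system if distances in meters or feet are needed.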

choicemodels.tools.distance_matrix(df, method='euclidean', x='lng', y='lat', earth_radius=6371009, return_int=True)

Calculate a pairwise distance matrix from a DataFrame of two-dimensional points.

Parameters
  • df (pandas DataFrame) – a DataFrame of points, uniquely indexed by place identifier (e.g., tract ID or parcel ID), represented by x and y coordinate columns

  • method (str) – algorithm to use for calculating pairwise distances: one of 'euclidean', 'greatcircle', or 'network'

  • x (str) – if method='greatcircle' or 'network', label of the x coordinate column in the DataFrame

  • y (str) – if method='greatcircle' or 'network', label of the y coordinate column in the DataFrame

  • earth_radius (numeric) – if method='greatcircle', radius of earth in units in which distance will be returned (default is meters)

  • return_int (bool) – if method='greatcircle' and this is True, convert all distances to integers

Returns

Multi-indexed distance vector, with top-level index representing “from” and second-level index representing “to”. Distances are in units of df’s values for method='euclidean' and in units of earth_radius for method='greatcircle'.

Return type

pandas Series

Distance bands

choicemodels.tools.distance_bands(dist_vector, distances)

Identify all geographies located within each distance band of each geography.

The list of distances is treated pairwise to create distance bands, with the first element of each pair forming the band’s inclusive lower limit and the second element of each pair forming the band’s exclusive upper limit. For example, if distances=[0, 10, 30], band 0 will contain all geographies with a distance >= 0 and < 10 units (e.g., meters) from the reference geography, and band 1 will contain all geographies with a distance >= 10 and < 30 units from the reference geography.

To make the final distance band include all geographies beyond a certain distance, make the final value in the distances list np.inf.

Parameters
  • dist_vector (pandas Series) – multi-indexed distance vector, with top-level index representing “from” and second-level index representing “to”, in the form returned by the distance matrix functions

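  • distances (list) – a list of distance band boundaries (e.g., [0, 10, 30]), treated pairwise as lower and upper limits and expressed in the same units as dist_vector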

Returns

A Series multi-indexed by geography ID and distance band number, whose values are arrays of the IDs of the geographies that fall within that band of the reference geography.

Return type

pandas Series
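The pairwise band rule described above (inclusive lower limit, exclusive upper limit, optional open-ended final band) can be sketched in plain Python as follows. This illustrates the rule only and is not the library's implementation; the helper names band_number and group_into_bands and the sample distances are hypothetical, and a dict keyed by (from, to) tuples stands in for the multi-indexed distance vector.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def band_number(distance, distances):
    """Return k such that distances[k] <= distance < distances[k + 1], else None."""
    k = bisect_right(distances, distance) - 1
    if k < 0 or k >= len(distances) - 1:
        return None  # below the first boundary, or at/after the last one
    return k


def group_into_bands(dist_vector, distances):
    """Group 'to' IDs by (from_id, band number).

    dist_vector maps (from_id, to_id) -> distance.
    """
    bands = defaultdict(list)
    for (from_id, to_id), d in dist_vector.items():
        k = band_number(d, distances)
        if k is not None:
            bands[(from_id, k)].append(to_id)
    return dict(bands)


# Hypothetical distances from geography "a" (and one from "b"), in arbitrary units
dist_vector = {
    ("a", "b"): 5,
    ("a", "c"): 10,
    ("a", "d"): 29.9,
    ("a", "e"): 30,
    ("b", "a"): 5,
}

closed = group_into_bands(dist_vector, [0, 10, 30])
open_ended = group_into_bands(dist_vector, [0, 10, math.inf])
```

With distances=[0, 10, 30], geography "e" at exactly 30 units falls outside every band because the upper limit is exclusive. Replacing the final value with infinity places it in band 1 along with "c" and "d".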