Model step template APIs
The following templates are included in the core package. ModelManager can also work with templates defined elsewhere, as long as they follow the specifications described in the design guidelines.
OLS Regression
class urbansim_templates.models.OLSRegressionStep(tables=None, model_expression=None, filters=None, out_tables=None, out_column=None, out_transform=None, out_filters=None, name=None, tags=[])
A class for building OLS (ordinary least squares) regression model steps. This extends TemplateStep, where some common functionality is defined. Estimation and simulation are handled by urbansim.models.RegressionModel().
Expected usage:
- create a model object
- specify some parameters
- run the fit() method
- iterate as needed
Then, for simulation:
- specify some simulation parameters
- use the run() method for interactive testing
- use modelmanager.register() to save the model to Orca and disk
- registered steps can be accessed via ModelManager and Orca
All parameters listed in the constructor can be set directly on the class object, at any time.
- Parameters
tables (str or list of str, optional) – Name(s) of Orca tables to draw data from. The first table is the primary one. Any additional tables need to have merge relationships (“broadcasts”) specified so that they can be merged unambiguously onto the first table. Among them, the tables must contain all variables used in the model expression and filters. The left-hand-side variable should be in the primary table. The tables parameter is required for fitting a model, but it does not have to be provided when the object is created.
model_expression (str, optional) – Patsy formula containing both the left- and right-hand sides of the model expression: http://patsy.readthedocs.io/en/latest/formulas.html This parameter is required for fitting a model, but it does not have to be provided when the object is created.
filters (str or list of str, optional) – Filters to apply to the data before fitting the model. These are passed to pd.DataFrame.query(). Filters are applied after any additional tables are merged onto the primary one. Replaces the fit_filters argument in UrbanSim.
out_tables (str or list of str, optional) – Name(s) of Orca tables to use for simulation. If not provided, the tables parameter will be used. Same guidance applies: the tables must be able to be merged unambiguously, and must include all columns used in the right-hand-side of the model expression and in the out_filters.
out_column (str, optional) – Name of the column to write predicted values to. If it does not already exist in the primary output table, it will be created. If not provided, the left-hand-side variable from the model expression will be used. Replaces the out_fname argument in UrbanSim.
out_transform (str, optional) – Element-wise transformation to apply to the predicted values, for example to reverse a transformation of the left-hand-side variable in the model expression. This should be provided as a string containing a function name. Supports anything from NumPy or Python’s built-in math library, for example ‘np.exp’ or ‘math.floor’. Replaces the ytransform argument in UrbanSim.
out_filters (str or list of str, optional) – Filters to apply to the data before simulation. If not provided, no filters will be applied. Replaces the predict_filters argument in UrbanSim.
name (str, optional) – Name of the model step, passed to ModelManager. If none is provided, a name is generated each time the fit() method runs.
tags (list of str, optional) – Tags, passed to ModelManager.
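The out_transform parameter above takes a function name as a string. As a minimal sketch of how such a string could be resolved to a callable and applied element-wise: resolve_transform is a hypothetical helper, not part of urbansim_templates, and this version only handles the math prefix so that it runs with the standard library alone. The template's own lookup may work differently.

```python
import math


def resolve_transform(name):
    """Turn a string such as 'math.floor' into the function it names.

    Hypothetical helper for illustration only; the template's own
    lookup mechanism may differ.
    """
    modules = {"math": math}  # an 'np' entry could be added if NumPy is installed
    module_name, _, func_name = name.partition(".")
    if module_name not in modules or not func_name:
        raise ValueError(f"unsupported transform: {name!r}")
    return getattr(modules[module_name], func_name)


# Suppose the model was fit on log(price); predictions are then in log units.
predicted_log_price = [math.log(250000.0), math.log(410000.0)]

# Reverse the log transform element by element.
to_price = resolve_transform("math.exp")
predicted_price = [to_price(v) for v in predicted_log_price]
```

Passing the transform as a string rather than a function object keeps the model step easy to store in a dictionary representation.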
fit()
Fit the model; save and report results.
This currently uses the RegressionModel class from core UrbanSim. The fitted model object is saved for prediction and interactive use (model, with type urbansim.models.regression.RegressionModel).
For example, you can use this to get a LaTeX version of the summary table using m.model.model_fit.summary().as_latex(). This may change in the future if we refactor the template to use StatsModels directly.
classmethod from_dict(d)
Create an object instance from a saved dictionary representation.
- Parameters
d (dict) – Saved dictionary representation of the model step.
- Return type
OLSRegressionStep
run()
Run the model step: calculate predicted values, transform them as specified, and use them to update a column.
The pre-transformation predicted values are saved to the class object for diagnostic use (predicted_values with type pd.Series). The post-transformation predicted values are written to Orca.
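The expected usage described for this class can be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the 'buildings' table and its columns are invented, the import path for modelmanager is assumed, and the function only does anything when urbansim_templates and Orca are installed. Depending on the release, ModelManager may also need to be initialized before a step can be registered.

```python
def fit_and_register_price_model():
    """Sketch of the fit / run / register workflow for an OLS step.

    Assumptions (not taken from the reference above): the 'buildings'
    table and its columns are made up for illustration, and the import
    path for modelmanager may differ between releases.
    """
    import orca
    import pandas as pd
    from urbansim_templates import modelmanager
    from urbansim_templates.models import OLSRegressionStep

    # Register a small made-up table with Orca so the step can find it by name.
    buildings = pd.DataFrame({
        "price": [210.0, 340.0, 280.0, 150.0, 420.0, 300.0],
        "sqft": [900, 1500, 1200, 700, 2000, 1350],
        "year_built": [1950, 1995, 1980, 1940, 2010, 1975],
    })
    orca.add_table("buildings", buildings)

    # Create the model object; parameters can also be set afterwards.
    m = OLSRegressionStep(
        tables="buildings",
        model_expression="price ~ sqft + year_built",
        filters="sqft > 0",
    )
    m.fit()

    # Simulation settings: write predictions to a separate column.
    m.out_column = "predicted_price"
    m.run()

    # Save the step to Orca and disk once it looks right.
    modelmanager.register(m)
    return m
```

Writing predictions to a separate out_column keeps the observed values intact while the model is still being iterated on.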
Binary Logit
class urbansim_templates.models.BinaryLogitStep(tables=None, model_expression=None, filters=None, out_tables=None, out_column=None, out_filters=None, out_value_true=1, out_value_false=0, name=None, tags=[])
A class for building binary logit model steps. This extends TemplateStep, where some common functionality is defined. Estimation is handled by Statsmodels and simulation is handled within this class.
Expected usage:
- create a model object
- specify some parameters
- run the fit() method
- iterate as needed
Then, for simulation:
- specify some simulation parameters
- use the run() method for interactive testing
- use modelmanager.register() to save the model to Orca and disk
- registered steps can be accessed via ModelManager and Orca
All parameters listed in the constructor can be set directly on the class object, at any time.
- Parameters
tables (str or list of str, optional) – Name(s) of Orca tables to draw data from. The first table is the primary one. Any additional tables need to have merge relationships (“broadcasts”) specified so that they can be merged unambiguously onto the first table. Among them, the tables must contain all variables used in the model expression and filters. The left-hand-side variable should be in the primary table. The tables parameter is required for fitting a model, but it does not have to be provided when the object is created.
model_expression (str, optional) – Patsy formula containing both the left- and right-hand sides of the model expression: http://patsy.readthedocs.io/en/latest/formulas.html This parameter is required for fitting a model, but it does not have to be provided when the object is created.
filters (str or list of str, optional) – Filters to apply to the data before fitting the model. These are passed to pd.DataFrame.query(). Filters are applied after any additional tables are merged onto the primary one. Replaces the fit_filters argument in UrbanSim.
out_tables (str or list of str, optional) – Name(s) of Orca tables to use for simulation. If not provided, the tables parameter will be used. Same guidance applies: the tables must be able to be merged unambiguously, and must include all columns used in the right-hand-side of the model expression and in the out_filters.
out_column (str, optional) – Name of the column to write simulated choices to. If not provided, the left-hand-side variable from the model expression will be used. Replaces the out_fname argument in UrbanSim. Note: the column is meant to be created automatically when it does not exist in the primary output table, but auto-generation is not yet working, so for now the column must already exist in the primary table.
out_filters (str or list of str, optional) – Filters to apply to the data before simulation. If not provided, no filters will be applied. Replaces the predict_filters argument in UrbanSim.
out_value_true (numeric or str, optional) – Value to save to the output column corresponding to an affirmative choice. Default is 1 (int). Use keyword ‘nothing’ to leave values unchanged.
out_value_false (numeric or str, optional) – Value to save to the output column corresponding to a negative choice. Default is 0 (int). Use keyword ‘nothing’ to leave values unchanged.
name (str, optional) – Name of the model step, passed to ModelManager. If none is provided, a name is generated each time the fit() method runs.
tags (list of str, optional) – Tags, passed to ModelManager.
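The filters and out_filters parameters above are described as strings passed to pd.DataFrame.query(). A minimal sketch of that idea, using a made-up households table: apply_filters is a hypothetical helper, and joining a list of filters with "and" is an assumption about how multiple filters combine.

```python
import pandas as pd


def apply_filters(df, filters):
    """Apply a filter string, or a list of them, with DataFrame.query().

    Assumption for illustration: a list of filters is combined with 'and'.
    """
    if filters is None:
        return df
    if isinstance(filters, str):
        filters = [filters]
    query = " and ".join(f"({f})" for f in filters)
    return df.query(query)


# Made-up data standing in for a merged Orca table.
households = pd.DataFrame({
    "income": [25000, 80000, 120000, 40000],
    "persons": [1, 4, 2, 3],
    "tenure": ["rent", "own", "own", "rent"],
})

subset = apply_filters(households, ["income > 30000", "tenure == 'own'"])
```

Because filters are applied after any additional tables are merged, a filter string can refer to columns from any of the listed tables.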
fit()
Fit the model; save and report results. This currently uses the Statsmodels Logit class with default estimation settings. (It will shift to ChoiceModels once more infrastructure is in place.)
The fit() method can be run as many times as desired. Results will not be saved with Orca or ModelManager until the register() method is run.
- Return type
None
classmethod from_dict(d)
Create an object instance from a saved dictionary representation.
- Parameters
d (dict) – Saved dictionary representation of the model step.
- Return type
BinaryLogitStep
run()
Run the model step: calculate simulated choices and use them to update a column.
For binary logit, we calculate predicted probabilities and then perform a weighted random draw to determine the simulated binary outcomes. This is done directly from the fitted parameters, because we can’t conveniently regenerate a Statsmodels results object from a dictionary representation.
The predicted probabilities and simulated choices are saved to the class object for interactive use (probabilities and choices, with type pd.Series) but are not persisted in the dictionary representation of the model step.
- Return type
None
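The simulation approach described for run() (predicted probabilities followed by a weighted random draw) can be sketched with the standard library alone. The function names and the utility values are made up for illustration, this is not the template's own code, and the 'nothing' keyword for output values is not handled here.

```python
import math
import random


def logistic(utility):
    """Probability of an affirmative choice for a given linear utility."""
    return 1.0 / (1.0 + math.exp(-utility))


def simulate_binary_choices(utilities, out_value_true=1, out_value_false=0,
                            seed=None):
    """Draw one binary outcome per observation, weighted by its probability."""
    rng = random.Random(seed)
    probabilities = [logistic(u) for u in utilities]
    choices = [
        out_value_true if rng.random() < p else out_value_false
        for p in probabilities
    ]
    return probabilities, choices


# Utilities would normally be the fitted coefficients applied to the data;
# these values are invented.
probabilities, choices = simulate_binary_choices([50.0, -50.0, 0.0], seed=42)
```

Working from utilities and coefficients alone means the draw can be repeated from a saved dictionary representation, without a fitted results object.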
Small Multinomial Logit
class urbansim_templates.models.SmallMultinomialLogitStep(tables=None, model_expression=None, model_labels=None, choice_column=None, initial_coefs=None, filters=None, out_tables=None, out_column=None, out_filters=None, name=None, tags=[])
A class for building multinomial logit model steps where the number of alternatives is “small”. Estimation is handled by PyLogit via the ChoiceModels API. Simulation is handled by PyLogit (probabilities) and ChoiceModels (simulation draws).
Multinomial logit models can involve a range of different specification and estimation mechanics. For now these are separated into two templates. What’s the difference?
“Small” MNL:
- data is in a single table (choosers)
- each alternative can have a different model expression
- all the alternatives are available to all choosers
- estimation and simulation use the PyLogit engine (via ChoiceModels)
“Large” MNL:
- data is in two tables (choosers and alternatives)
- each alternative has the same model expression
- N alternatives are sampled for each chooser
- estimation and simulation use the ChoiceModels engine (formerly UrbanSim MNL)
TO DO:
- Add support for specifying availability of alternatives
- Add support for sampling weights
- Add support for on-the-fly interaction calculations (e.g. distance)
- Parameters
tables (str or list of str, optional) – Name(s) of Orca tables to draw data from. The first table is the primary one. Any additional tables need to have merge relationships (“broadcasts”) specified so that they can be merged unambiguously onto the first table. Among them, the tables must contain all variables used in the model expression and filters. The index of the primary table should be a unique ID. The tables parameter is required for fitting a model, but it does not have to be provided when the object is created. Reserved column names: ‘_obs_id’, ‘_alt_id’, ‘_chosen’.
model_expression (OrderedDict, optional) – PyLogit model expression. This parameter is required for fitting a model, but it does not have to be provided when the object is created.
model_labels (OrderedDict, optional) – PyLogit model labels.
choice_column (str, optional) – Name of the column indicating observed choices, for model estimation. The column should contain integers matching the alternatives in the model expression. This parameter is required for fitting a model, but it does not have to be provided when the object is created.
initial_coefs (list of numerics, optional) – Starting values for the parameter estimation algorithm, passed to PyLogit. Length must be equal to the number of parameters being estimated. If this is not provided, zeros will be used.
filters (str or list of str, optional) – Filters to apply to the data before fitting the model. These are passed to pd.DataFrame.query(). Filters are applied after any additional tables are merged onto the primary one. Replaces the fit_filters argument in UrbanSim.
out_tables (str or list of str, optional) – Name(s) of Orca tables to use for simulation. If not provided, the tables parameter will be used. Same guidance applies: the tables must be able to be merged unambiguously, and must include all columns used in the model expression and in the out_filters.
out_column (str, optional) – Name of the column to write simulated choices to. If it does not already exist in the primary output table, it will be created. If not provided, the choice_column will be used. Replaces the out_fname argument in UrbanSim.
out_filters (str or list of str, optional) – Filters to apply to the data before simulation. If not provided, no filters will be applied. Replaces the predict_filters argument in UrbanSim.
name (str, optional) – Name of the model step, passed to ModelManager. If none is provided, a name is generated each time the fit() method runs.
tags (list of str, optional) – Tags, passed to ModelManager.
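As a rough illustration of the model_expression, model_labels, and initial_coefs parameters above, the sketch below builds a specification for three alternatives. It assumes PyLogit's usual convention, in which each list entry yields one coefficient and a nested list shares a single coefficient across several alternatives; the variable names and alternative IDs are invented.

```python
from collections import OrderedDict

# Hypothetical setup with alternatives numbered 1, 2, 3 (3 is the reference).
model_expression = OrderedDict([
    ("intercept", [1, 2]),  # separate constants for alternatives 1 and 2
    ("income", [1, 2]),     # separate income coefficients for 1 and 2
    ("age", [[1, 2]]),      # one age coefficient shared by 1 and 2
])

# One label per coefficient, in the same order as the specification.
model_labels = OrderedDict([
    ("intercept", ["ASC 1", "ASC 2"]),
    ("income", ["Income, alt 1", "Income, alt 2"]),
    ("age", ["Age, alts 1-2"]),
])

# One coefficient per entry in each specification list.
n_params = sum(len(spec) for spec in model_expression.values())

# Starting values, one per parameter; zeros match the documented default.
initial_coefs = [0.0] * n_params
```

Counting the entries this way gives the length that initial_coefs must have.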
fit()
Fit the model; save and report results. This uses PyLogit via ChoiceModels.
The fit() method can be run as many times as desired. Results will not be saved with Orca or ModelManager until the register() method is run.
classmethod from_dict(d)
Create an object instance from a saved dictionary representation.
- Parameters
d (dict) – Saved dictionary representation of the model step.
- Return type
SmallMultinomialLogitStep
run()
Run the model step: calculate simulated choices and use them to update a column.
Alternatives that appear in the estimation data but not in the model expression will not be available for simulation.
Predicted probabilities come from PyLogit. Monte Carlo simulation of choices is performed directly. (This functionality will move to ChoiceModels.)
The predicted probabilities and simulated choices are saved to the class object for interactive use (probabilities with type pd.DataFrame, and choices with type pd.Series) but are not persisted in the dictionary representation of the model step.
Large Multinomial Logit
class urbansim_templates.models.LargeMultinomialLogitStep(choosers=None, alternatives=None, model_expression=None, choice_column=None, chooser_filters=None, chooser_sample_size=None, alt_filters=None, alt_sample_size=None, out_choosers=None, out_alternatives=None, out_column=None, out_chooser_filters=None, out_alt_filters=None, constrained_choices=False, alt_capacity=None, chooser_size=None, max_iter=None, name=None, tags=[])
Class for building standard multinomial logit model steps where alternatives are interchangeable and all have the same model expression. Supports random sampling of alternatives.
Estimation and simulation are performed using ChoiceModels.
- Parameters
choosers (str or list of str, optional) – Name(s) of Orca tables to draw choice scenario data from. The first table is the primary one. Any additional tables need to have merge relationships (“broadcasts”) specified so that they can be merged unambiguously onto the first table. The index of the primary table should be a unique ID. In this template, the ‘choosers’ and ‘alternatives’ parameters replace the ‘tables’ parameter. Both are required for fitting a model, but do not have to be provided when the object is created. Reserved column names: ‘chosen’.
alternatives (str or list of str, optional) – Name(s) of Orca tables containing data about alternatives. The first table is the primary one. Any additional tables need to have merge relationships (“broadcasts”) specified so that they can be merged unambiguously onto the first table. The index of the primary table should be a unique ID. In this template, the ‘choosers’ and ‘alternatives’ parameters replace the ‘tables’ parameter. Both are required for fitting a model, but do not have to be provided when the object is created. Reserved column names: ‘chosen’.
model_expression (str, optional) – Patsy-style right-hand-side model expression representing the utility of a single alternative. Passed to choicemodels.MultinomialLogit(). This parameter is required for fitting a model, but does not have to be provided when the object is created.
choice_column (str, optional) – Name of the column indicating observed choices, for model estimation. The column should contain integers matching the id of the primary alternatives table. This parameter is required for fitting a model, but it does not have to be provided when the object is created. Not required for simulation.
chooser_filters (str or list of str, optional) – Filters to apply to the chooser data before fitting the model. These are passed to pd.DataFrame.query(). Filters are applied after any additional tables are merged onto the primary one. Replaces the fit_filters argument in UrbanSim.
chooser_sample_size (int, optional) – Number of choosers to sample, for faster model fitting. Sampling is random and may vary between model runs.
alt_filters (str or list of str, optional) – Filters to apply to the alternatives data before fitting the model. These are passed to pd.DataFrame.query(). Filters are applied after any additional tables are merged onto the primary one. Replaces the fit_filters argument in UrbanSim. Choosers whose chosen alternative is removed by these filters will not be included in the model estimation.
alt_sample_size (int, optional) – Number of alternatives to sample for each choice scenario. For now, only random sampling is supported. If this parameter is not provided, we will use a sample size of one less than the total number of alternatives. (ChoiceModels codebase currently requires sampling.) The same sample size is used for estimation and prediction.
out_choosers (str or list of str, optional) – Name(s) of Orca tables to draw choice scenario data from, for simulation. If not provided, the choosers parameter will be used. Same guidance applies. Reserved column names: ‘chosen’, ‘join_index’, ‘observation_id’.
out_alternatives (str or list of str, optional) – Name(s) of Orca tables containing data about alternatives, for simulation. If not provided, the alternatives parameter will be used. Same guidance applies. Reserved column names: ‘chosen’, ‘join_index’, ‘observation_id’.
out_column (str, optional) – Name of the column to write simulated choices to. If it does not already exist in the primary out_choosers table, it will be created. If not provided, the choice_column will be used. If the column already exists, choices will be cast to match its data type. If the column is generated on the fly, it will be given the same data type as the index of the alternatives table. Replaces the out_fname argument in UrbanSim.
out_chooser_filters (str or list of str, optional) – Filters to apply to the chooser data before simulation. If not provided, no filters will be applied. Replaces the predict_filters argument in UrbanSim.
out_alt_filters (str or list of str, optional) – Filters to apply to the alternatives data before simulation. If not provided, no filters will be applied. Replaces the predict_filters argument in UrbanSim.
constrained_choices (bool, optional) – “True” means alternatives have limited capacity. “False” (default) means that alternatives can accommodate an unlimited number of choosers.
alt_capacity (str, optional) – Name of a column in the out_alternatives table that expresses the capacity of alternatives. If not provided and constrained_choices is True, each alternative is interpreted as accommodating a single chooser.
chooser_size (str, optional) – Name of a column in the out_choosers table that expresses the size of choosers. Choosers might have varying sizes if the alternative capacities are amounts rather than counts – e.g. square footage. Chooser sizes must be in the same units as alternative capacities. If not provided and constrained_choices is True, each chooser has a size of 1.
max_iter (int or None, optional) – Maximum number of choice simulation iterations. If None (default), the algorithm will iterate until all choosers are matched or no alternatives remain.
name (str, optional) – Name of the model step, passed to ModelManager. If none is provided, a name is generated each time the fit() method runs.
tags (list of str, optional) – Tags, passed to ModelManager.
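The constrained_choices, alt_capacity, chooser_size, and max_iter parameters above describe choices among alternatives with limited capacity. The following is a simplified sketch of one way such an iterative lottery could work, written with the standard library; it is not the ChoiceModels implementation, and the function name, data layout, and example values are all assumptions.

```python
import random


def constrained_lottery(probabilities, capacity, chooser_size=None,
                        max_iter=None, seed=None):
    """Assign choosers to capacity-limited alternatives by repeated lottery.

    probabilities: {chooser_id: {alt_id: weight}} (weights need not sum to 1)
    capacity:      {alt_id: capacity}
    chooser_size:  {chooser_id: size}; defaults to 1 for every chooser
    Simplified illustration only, not the ChoiceModels implementation.
    """
    rng = random.Random(seed)
    remaining = dict(capacity)
    sizes = chooser_size or {c: 1 for c in probabilities}
    unmatched = list(probabilities)
    choices = {}
    iteration = 0

    while unmatched and (max_iter is None or iteration < max_iter):
        iteration += 1
        # Each unmatched chooser draws among alternatives that can still fit it.
        draws = {}
        for c in unmatched:
            open_alts = [a for a, w in probabilities[c].items()
                         if w > 0 and a in remaining and remaining[a] >= sizes[c]]
            if open_alts:
                weights = [probabilities[c][a] for a in open_alts]
                draws[c] = rng.choices(open_alts, weights=weights)[0]
        if not draws:
            break  # no unmatched chooser fits into any remaining alternative
        # Accept draws in random order until each alternative fills up.
        order = list(draws)
        rng.shuffle(order)
        for c in order:
            a = draws[c]
            if remaining[a] >= sizes[c]:
                remaining[a] -= sizes[c]
                choices[c] = a
        unmatched = [c for c in unmatched if c not in choices]

    return choices, remaining


# Three made-up choosers competing for one unit of "a" and two units of "b".
probs = {
    "h1": {"a": 0.7, "b": 0.3},
    "h2": {"a": 0.6, "b": 0.4},
    "h3": {"a": 0.9, "b": 0.1},
}
choices, remaining = constrained_lottery(probs, capacity={"a": 1, "b": 2}, seed=0)
```

Choosers who lose a draw try again in the next iteration among the alternatives that still have room, which is why probabilities have to be recalculated each time.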
-
All parameters can also be read and set as properties. The following attributes should be treated as read-only.
choices
Available after the model step is run. List of chosen alternative IDs, indexed by chooser ID. Does not persist when the model step is reloaded from storage.
- Type
pd.Series
mergedchoicetable
Table built for estimation or simulation. Does not persist when the model step is reloaded from storage. Not available if choices have capacity constraints, because multiple choice tables are generated iteratively.
- Type
choicemodels.tools.MergedChoiceTable
model
Available after a model has been fit. Persists when reloaded from storage.
- Type
choicemodels.MultinomialLogitResults
probabilities
Available after the model step is run, but not if choices have capacity constraints, which require probabilities to be calculated multiple times. Provides a list of probabilities corresponding to the sampled alternatives, indexed by the chooser and alternative IDs. Does not persist when the model step is reloaded from storage.
- Type
pd.Series
fit(mct=None)
Fit the model; save and report results. This uses the ChoiceModels estimation engine (originally from UrbanSim MNL).
The fit() method can be run as many times as desired. Results will not be saved with Orca or ModelManager until the register() method is run.
After sampling alternatives for each chooser, the merged choice table is saved to the class object for diagnostic use (mergedchoicetable with type choicemodels.tools.MergedChoiceTable).
- Parameters
mct (choicemodels.tools.MergedChoiceTable) – This parameter is a temporary backdoor allowing us to pass in a more complicated choice table than can be generated within the template, for example including sampling weights or interaction terms.
- Return type
None
classmethod from_dict(d)
Create an object instance from a saved dictionary representation.
- Parameters
d (dict) – Saved dictionary representation of the model step.
- Return type
LargeMultinomialLogitStep
run(chooser_batch_size=None, interaction_terms=None)
Run the model step: simulate choices and use them to update an Orca column.
The simulated choices are saved to the class object for diagnostics. If choices are unconstrained, the choice table and the probabilities of sampled alternatives are saved as well.
- Parameters
chooser_batch_size (int) – This parameter gets passed to choicemodels.tools.simulation.iterative_lottery_choices and is a temporary workaround for dealing with memory issues that arise from generating massive merged choice tables for simulations that involve large numbers of choosers, large numbers of alternatives, and large numbers of predictors. It allows the user to specify a batch size for simulating choices one chunk at a time.
interaction_terms (pandas.Series, pandas.DataFrame, or list of either, optional) – Additional column(s) of interaction terms whose values depend on the combination of observation and alternative, to be merged onto the final data table. If passed as a Series or DataFrame, it should include a two-level MultiIndex. One level’s name and values should match an index or column from the observations table, and the other should match an index or column from the alternatives table.
- Return type
None
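The chooser_batch_size parameter of run() is described as a way to simulate choices one chunk at a time so that the merged choice table stays small. A minimal sketch of the chunking idea, with made-up sizes: iter_batches is a hypothetical helper, and the row count assumes every chooser in a batch is paired with every alternative.

```python
def iter_batches(chooser_ids, chooser_batch_size=None):
    """Yield successive chunks of chooser ids; one chunk if no size is given."""
    chooser_ids = list(chooser_ids)
    if chooser_batch_size is None:
        chooser_batch_size = max(len(chooser_ids), 1)
    for start in range(0, len(chooser_ids), chooser_batch_size):
        yield chooser_ids[start:start + chooser_batch_size]


n_alternatives = 1000  # made-up number of alternatives per chooser
batches = list(iter_batches(range(10), chooser_batch_size=4))

# Rows in the largest merged table if each chooser is paired with every alternative.
peak_rows = max(len(b) for b in batches) * n_alternatives
```

With ten choosers and no batching, the same table would have 10,000 rows instead of 4,000.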
Segmented Large Multinomial Logit
class urbansim_templates.models.SegmentedLargeMultinomialLogitStep(defaults=None, segmentation_column=None, name=None, tags=[])
This template automatically generates a set of LargeMultinomialLogitStep submodels corresponding to “segments” or categories of choosers. The submodels can be directly accessed and edited.
Running ‘build_submodels()’ will create a submodel for each category of choosers identified in the segmentation column. The submodels are implemented using filter queries.
Once they are generated, the ‘submodels’ property contains a dict of LargeMultinomialLogitStep objects, identified by category name. You can edit their properties as needed, fit them individually, etc.
Editing a property in the ‘defaults’ object will update all the submodels at once, while leaving customizations to other properties intact.
- Parameters
defaults (LargeMultinomialLogitStep, optional) – Object containing initial parameter values for the submodels. Values for ‘choosers’, ‘alternatives’, and ‘choice_column’ are required to generate submodels, but do not have to be provided when the object is created.
segmentation_column (str, optional) – Name of a column of categorical values in the ‘defaults.choosers’ table. Any data that can be interpreted by Pandas as categorical is valid. This is required to generate submodels, but does not have to be provided when the object is created.
name (str, optional) – Name of the model step.
tags (list of str, optional) – Tags associated with the model step.
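The description above says that submodels are implemented using filter queries, one per category in the segmentation column. A rough sketch of how such per-segment filters could be assembled: build_segment_filters is a hypothetical helper, the data is made up, and the template may construct its queries differently.

```python
def build_segment_filters(segment_values, segmentation_column,
                          chooser_filters=None):
    """Return {category: list of filter strings} for each observed category."""
    if chooser_filters is None:
        base = []
    elif isinstance(chooser_filters, str):
        base = [chooser_filters]
    else:
        base = list(chooser_filters)

    filters = {}
    for value in sorted(set(segment_values)):
        # repr() quotes strings and leaves numbers bare, as a query expects.
        filters[value] = base + [f"{segmentation_column} == {value!r}"]
    return filters


tenure = ["own", "rent", "rent", "own", "rent"]  # made-up segmentation values
segment_filters = build_segment_filters(tenure, "tenure",
                                        chooser_filters="income > 0")
```

Each resulting list could serve as the chooser filters of one submodel, so every submodel sees only the choosers in its own category.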
build_submodels(mct=None)
Create a submodel for each category of choosers identified in the segmentation column. Only categories with at least one observation remaining after applying chooser and alternative filters will be included.
Running this method will overwrite any previous submodels.
- Parameters
mct (choicemodels.tools.MergedChoiceTable) – This parameter is a temporary backdoor allowing us to pass in a more complicated choice table than can be generated within the template, for example including sampling weights or interaction terms.
fit_all(mct=None)
Fit all the submodels. Build the submodels first, if they don’t exist yet. This method can be run as many times as desired.
- Parameters
mct (choicemodels.tools.MergedChoiceTable) – This parameter is a temporary backdoor allowing us to pass in a more complicated choice table than can be generated within the template, for example including sampling weights or interaction terms.
classmethod from_dict(d)
Create an object instance from a saved dictionary representation.
- Parameters
d (dict) – Saved dictionary representation of the model step.
- Return type
SegmentedLargeMultinomialLogitStep
get_segmentation_column(mct=None)
Get the column of segmentation values from Orca. Chooser and alternative filters are applied to identify valid observations.
- Parameters
mct (choicemodels.tools.MergedChoiceTable) – This parameter is a temporary backdoor allowing us to pass in a more complicated choice table than can be generated within the template, for example including sampling weights or interaction terms.
- Return type
pd.Series
run_all(interaction_terms=None)
Run all the submodels.
- Parameters
interaction_terms (pandas.Series, pandas.DataFrame, or list of either, optional) – Additional column(s) of interaction terms whose values depend on the combination of observation and alternative, to be merged onto the final data table. If passed as a Series or DataFrame, it should include a two-level MultiIndex. One level’s name and values should match an index or column from the observations table, and the other should match an index or column from the alternatives table.
update_submodels(param, value)
Updates a property across all the submodels. This method is bound to the defaults object and runs automatically when one of its properties is changed.
Note that the chooser_filters and alt_filters properties cannot currently be updated this way, because they can affect the model segmentation. If you are confident the changes are valid, you can edit the submodels directly. Otherwise, you can regenerate them using updated defaults by running build_submodels().
- Parameters
param (str) – Property name.
value (anything) – New value for the property.
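The binding between the defaults object and update_submodels() can be pictured with a small stand-alone sketch. The classes below are toy stand-ins, not the template's own classes, and silently skipping the filter properties is an assumption about how the restriction might be handled.

```python
from types import SimpleNamespace


class Defaults:
    """Holds default settings and notifies a listener whenever one changes."""

    def __init__(self, **settings):
        object.__setattr__(self, "_listener", None)
        for key, value in settings.items():
            object.__setattr__(self, key, value)

    def bind(self, listener):
        object.__setattr__(self, "_listener", listener)

    def __setattr__(self, param, value):
        object.__setattr__(self, param, value)
        if self._listener is not None:
            self._listener(param, value)


class SegmentedStep:
    """Toy stand-in for a segmented step (not the real template class)."""

    def __init__(self, defaults, segments):
        self.defaults = defaults
        settings = {k: v for k, v in vars(defaults).items()
                    if not k.startswith("_")}
        self.submodels = {s: SimpleNamespace(**settings) for s in segments}
        defaults.bind(self.update_submodels)

    def update_submodels(self, param, value):
        # Filters define the segments themselves, so they are not propagated.
        if param in ("chooser_filters", "alt_filters"):
            return
        for submodel in self.submodels.values():
            setattr(submodel, param, value)


step = SegmentedStep(Defaults(model_expression="price", alt_sample_size=10),
                     segments=["own", "rent"])
step.submodels["rent"].alt_sample_size = 25      # customize one submodel
step.defaults.model_expression = "price + sqft"  # propagates to every submodel
```

Only the property that changed is pushed to the submodels, which is how a customization to a different property survives an update to the defaults.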
Template Step parent class
class urbansim_templates.models.TemplateStep(tables=None, model_expression=None, filters=None, out_tables=None, out_column=None, out_transform=None, out_filters=None, name=None, tags=[])
Shared functionality for the template classes.
- Parameters
tables (str or list of str, optional) – Required to fit a model, but doesn’t have to be provided at initialization.
model_expression (str, optional) – Required to fit a model, but doesn’t have to be provided at initialization.
filters (str or list of str, optional) – Replaces fit_filters argument.
out_tables (str or list of str, optional) –
out_column (str, optional) – Replaces out_fname argument.
out_transform (callable, optional) – Replaces ytransform argument.
out_filters (str or list of str, optional) – Replaces predict_filters argument.
name (str, optional) – For ModelManager.
tags (list of str, optional) – For ModelManager.