Statistical Models

Introduction

UrbanSim has two sets of statistical models: regressions and discrete choice models. Each has a three-stage usage pattern:

1. Create a configured model instance. This is where you will supply most of the information to the model, such as the actual definition of the model and any filters that restrict the data used during fitting and prediction.
2. Fit the model by supplying base year data.
3. Make predictions based on new data.
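The configure, fit, predict sequence can be sketched with a toy stand-in. The `ToyRegression` class below is hypothetical and is not UrbanSim code: it fits a one-variable least-squares line in pure Python, only to show where each of the three stages happens and why prediction before fitting should fail.

```python
class ToyRegression:
    """Toy one-variable least-squares model; a stand-in, not UrbanSim code."""

    def __init__(self, x_col, y_col):
        # Stage 1: configuration only; no data is seen yet.
        self.x_col = x_col
        self.y_col = y_col
        self.params = None

    @property
    def fitted(self):
        return self.params is not None

    def fit(self, rows):
        # Stage 2: estimate intercept and slope from base year rows.
        xs = [r[self.x_col] for r in rows]
        ys = [r[self.y_col] for r in rows]
        x_mean = sum(xs) / len(xs)
        y_mean = sum(ys) / len(ys)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        self.params = (y_mean - slope * x_mean, slope)
        return self.params

    def predict(self, rows):
        # Stage 3: apply the stored parameters to new rows.
        if not self.fitted:
            raise RuntimeError('Model has not been fitted.')
        intercept, slope = self.params
        return [intercept + slope * r[self.x_col] for r in rows]


base_year = [{'dist_hwy': 1.0, 'price': 12.0},
             {'dist_hwy': 2.0, 'price': 10.0},
             {'dist_hwy': 3.0, 'price': 8.0}]
model = ToyRegression(x_col='dist_hwy', y_col='price')
model.fit(base_year)
predicted = model.predict([{'dist_hwy': 4.0}])
```

The real models follow the same shape, with filters and a model expression supplied at construction time instead of two column names.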

Model Expressions

Statistical models require specification of a “model expression” that describes the model as a mathematical formula. UrbanSim uses patsy to interpret model expressions, but UrbanSim gives you some flexibility as to how you define them.

patsy works with string formulas like this simplified regression example (names refer to columns in the DataFrames used during fitting and prediction):

expr = 'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'

In UrbanSim that same formula could be expressed in a dictionary:

expr = {
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

Formulae used with location choice models have only a right hand side since the models do not predict new numeric values. Right-hand-side formulae can be written as lists or dictionaries:

expr = {
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

expr = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']

Expressing the formula as a string is always an option. The ability to use lists or dictionaries is especially useful for writing attractively formatted formulae in YAML config files.
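To see how the three forms relate, the sketch below normalizes a string, list, or dictionary expression into a single patsy-style string. The helper name `expression_to_str` is hypothetical and is not UrbanSim's own conversion routine; it assumes right-hand-side terms are simply joined with ` + `.

```python
def expression_to_str(expr):
    """Normalize a str, list/tuple, or dict expression to a formula string.

    Hypothetical helper for illustration; not part of UrbanSim.
    """
    if isinstance(expr, str):
        return expr
    if isinstance(expr, dict):
        right = expr['right_side']
        # The right-hand side may itself be a string or a list of terms.
        right = right if isinstance(right, str) else ' + '.join(right)
        left = expr.get('left_side')
        return '{} ~ {}'.format(left, right) if left else right
    # Any other iterable is treated as a list of right-hand-side terms.
    return ' + '.join(expr)


regression_expr = {
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income'],
}
choice_expr = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']

regression_str = expression_to_str(regression_expr)
choice_str = expression_to_str(choice_expr)
```

Under these assumptions the dictionary form yields the same string as the first example in this section, and the list form yields only the right-hand side.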

YAML Persistence

UrbanSim’s regression and location choice models can be saved as YAML files and loaded again at another time. This feature is especially useful for estimating models in one location, saving the fit parameters to disk, and then using the fitted model for prediction somewhere else.

Use the .to_yaml and .from_yaml methods to save files to disk and load them back as configured models. Here’s an example of loading a regression model, performing fitting, and saving the model back to YAML:

model = RegressionModel.from_yaml('my_model.yaml')

model.fit(data)

model.to_yaml('my_model.yaml')

You can, if you like, write your model configurations entirely in YAML and load them into Python only for fitting and prediction.

API

Regression API

`RegressionModel`(fit_filters, …[, …])	A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.
`SegmentedRegressionModel`(segmentation_col[, …])	A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.
`RegressionModelGroup`(segmentation_col[, name])	Manages a group of regression models that refer to different segments within a single table.

Discrete Choice API

`MNLDiscreteChoiceModel`(model_expression, …)	A discrete choice model with the ability to store an estimated model and predict new data based on the model.
`SegmentedMNLDiscreteChoiceModel`(…[, …])	An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.
`MNLDiscreteChoiceModelGroup`(segmentation_col)	Manages a group of discrete choice models that refer to different segments of choosers.

Regression API Docs

Use the RegressionModel class to fit a model using statsmodels’ OLS capability and then do subsequent prediction.

class urbansim.models.regression.RegressionModel(fit_filters, predict_filters, model_expression, ytransform=None, name=None)[source]¶

A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.

statsmodels’ OLS implementation is used.

Parameters

fit_filters (list of str)

Filters applied before fitting the model.

predict_filters (list of str)

Filters applied before calculating new data points.

model_expression (str or dict)

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform (callable, optional)

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

name (optional)

Optional descriptive name for this model that may be used in output.
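The role of ytransform can be illustrated without any model at all. In the sketch below, `apply_ytransform` is a hypothetical helper that works element by element with the standard `math` module, whereas UrbanSim passes the whole array of predictions to the callable (for example `np.exp`). The point it shows is that the back-transform should be the inverse of whatever was applied on the left-hand side of the model expression.

```python
import math


def apply_ytransform(predicted, ytransform=None):
    """Apply an optional back-transform to predicted values (illustrative only)."""
    if ytransform is None:
        # No transformation: predictions are returned unchanged.
        return list(predicted)
    return [ytransform(value) for value in predicted]


# Suppose the left-hand side was log(price), so predictions are in log space.
log_predictions = [math.log(250.0), math.log(400.0)]
prices = apply_ytransform(log_predictions, ytransform=math.exp)

# If the left-hand side was log1p(price), the matching inverse is expm1, not exp.
log1p_predictions = [math.log1p(250.0)]
prices_from_log1p = apply_ytransform(log1p_predictions, ytransform=math.expm1)
```

For an expression such as `np.log1p(sqft_price) ~ ...`, the NumPy counterpart of the second case would be `np.expm1`.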

assert_fitted(self)[source]¶: Raises a RuntimeError if the model is not ready for prediction.

columns_used(self)[source]¶: Returns all the columns used in this model for filtering and in the model expression.

fit(self, data, debug=False)[source]¶

Fit the model to data and store/return the results.

Parameters

data (pandas.DataFrame): Data to use for fitting the model. Must contain all the columns referenced by the model_expression.
debug (bool): If debug is set to true, this sets the attribute “est_data” to a dataframe with the actual data used for estimation of this model.

Returns

fit (statsmodels.regression.linear_model.OLSResults): This is returned for inspection, but also stored on the class instance for use during prediction.

classmethod fit_from_cfg(df, cfgname, debug=False, outcfgname=None)[source]¶

Parameters

df (DataFrame): The dataframe which contains the columns to use for the estimation.
cfgname (string): The name of the yaml config file which describes the hedonic model.
debug (boolean, optional, default False): Whether to generate debug information on the model.
outcfgname (string, optional, default cfgname): The name of the output yaml config file to which estimation results are written.

Returns

RegressionModel which was used to fit

property fitted¶: True if the model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]¶

Create a RegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters

yaml_str (str, optional): A YAML string from which to load model.
str_or_buffer (str or file like, optional): File name or buffer from which to load YAML.

Returns

RegressionModel

predict(self, data)[source]¶

Predict a new data set based on an estimated model.

Parameters

data (pandas.DataFrame): Data to use for prediction. Must contain all the columns referenced by the right-hand side of the model_expression.

Returns

result (pandas.Series): Predicted values as a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname)[source]¶

Parameters

df (DataFrame): The dataframe which contains the columns to use for the estimation.
cfgname (string): The name of the yaml config file which describes the hedonic model.

Returns

predicted (pandas.Series): Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
hm: RegressionModel which was used to predict

report_fit(self)[source]¶: Print a report of the fit results.

property str_model_expression¶: Model expression as a string suitable for use with patsy/statsmodels.

to_dict(self)[source]¶: Returns a dictionary representation of a RegressionModel instance.

to_yaml(self, str_or_buffer=None)[source]¶

Save a model representation to YAML.

Parameters

str_or_buffer (str or file like, optional): By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns

j (str): YAML string if str_or_buffer is not given.
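The three behaviours of str_or_buffer (return a string, write to a named file, or write to a file-like object) can be sketched as a simple dispatch. The `save_config` function below is hypothetical and uses JSON from the standard library as a stand-in for YAML; returning None after writing is this sketch's choice, not a documented property of to_yaml.

```python
import io
import json
import os
import tempfile


def save_config(config, str_or_buffer=None):
    """Serialize config; JSON is used here as a stdlib stand-in for YAML."""
    text = json.dumps(config, indent=2, sort_keys=True)
    if str_or_buffer is None:
        # No target given: hand the serialized string back to the caller.
        return text
    if isinstance(str_or_buffer, str):
        # A string is treated as a file name.
        with open(str_or_buffer, 'w') as f:
            f.write(text)
    else:
        # Anything else is expected to provide a .write method.
        str_or_buffer.write(text)
    return None


config = {'name': 'my_model', 'fitted': False}

as_string = save_config(config)

buffer = io.StringIO()
save_config(config, buffer)

path = os.path.join(tempfile.mkdtemp(), 'my_model.json')
save_config(config, path)
with open(path) as f:
    from_file = f.read()
```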

class urbansim.models.regression.RegressionModelGroup(segmentation_col, name=None)[source]¶

Manages a group of regression models that refer to different segments within a single table.

Model names must match the segment names after doing a Pandas groupby.

Parameters

segmentation_col: Name of the column on which to segment.
name: Optional name used to identify the model in places.

add_model(self, model)[source]¶

Add a RegressionModel instance.

Parameters

model (RegressionModel): Should have a .name attribute matching one of the groupby segments.

add_model_from_params(self, name, fit_filters, predict_filters, model_expression, ytransform=None)[source]¶

Add a model by passing arguments through to RegressionModel.

Parameters

name (any)

Must match a groupby segment name.

fit_filters (list of str)

Filters applied before fitting the model.

predict_filters (list of str)

Filters applied before calculating new data points.

model_expression (str)

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform (callable, optional)

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

columns_used(self)[source]¶: Returns all the columns used across all models in the group for filtering and in the model expression.

fit(self, data, debug=False)[source]¶

Fit each of the models in the group.

Parameters

data (pandas.DataFrame): Must have a column with the same name as segmentation_col.
debug (bool): If set to true (default false) will pass the debug parameter to model estimation.

Returns

fits (dict of statsmodels.regression.linear_model.OLSResults): Keys are the segment names.

property fitted¶: Whether all models in the group have been fitted.

predict(self, data)[source]¶

Predict new data for each group in the segmentation.

Parameters

data (pandas.DataFrame): Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns

predicted (pandas.Series): Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
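The idea of one model per segment, with unmatched segments dropped from the result, can be sketched in plain Python. Everything below is hypothetical: rows are dictionaries keyed by an index, and each "model" is just a callable, standing in for a fitted RegressionModel whose name matches a segment.

```python
from collections import defaultdict


def predict_by_segment(rows, segmentation_col, models):
    """Apply one prediction function per segment (illustrative only).

    rows   -- dict mapping a row index to a dict of column values
    models -- dict mapping a segment name to a callable(row) -> float
    """
    # Group rows by the value of the segmentation column.
    segments = defaultdict(dict)
    for index, row in rows.items():
        segments[row[segmentation_col]][index] = row

    predicted = {}
    for name, members in segments.items():
        model = models.get(name)
        if model is None:
            # Rows in segments without a model get no prediction.
            continue
        for index, row in members.items():
            predicted[index] = model(row)
    return predicted


rows = {
    1: {'building_type': 'residential', 'sqft': 1000.0},
    2: {'building_type': 'office', 'sqft': 2000.0},
    3: {'building_type': 'industrial', 'sqft': 3000.0},
}
models = {
    'residential': lambda row: 2.0 * row['sqft'],
    'office': lambda row: 3.0 * row['sqft'],
}
predicted = predict_by_segment(rows, 'building_type', models)
```

Row 3 belongs to a segment with no model, so it is absent from the result, mirroring the "minus any groups that do not have models" wording above.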

class urbansim.models.regression.SegmentedRegressionModel(segmentation_col, fit_filters=None, predict_filters=None, default_model_expr=None, default_ytransform=None, min_segment_size=0, name=None)[source]¶

A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.

Parameters

segmentation_col

Name of column in the data table on which to segment. Will be used with a pandas groupby on the data table.

fit_filters (list of str, optional)

Filters applied before fitting the model.

predict_filters (list of str, optional)

Filters applied before calculating new data points.

min_segment_size (int)

This model will add all segments that have at least this number of observations. A very small number of observations (e.g. 1) will cause an error with estimation.

default_model_expr (str or dict, optional)

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

default_ytransform (callable, optional)

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

min_segment_size (int, optional)

Segments with fewer than this many members will be skipped.

name (str, optional)

A name used in places to identify the model.

add_segment(self, name, model_expression=None, ytransform='default')[source]¶

Add a new segment with its own model expression and ytransform.

Parameters

name

Segment name. Must match a segment in the groupby of the data.

model_expression (str or dict, optional)

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides. If not given the default model will be used, which must not be None.

ytransform (callable, optional)

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

If not given the default ytransform will be used.

columns_used(self)[source]¶: Returns all the columns used across all models in the group for filtering and in the model expression.

fit(self, data, debug=False)[source]¶

Fit each segment. Segments that have not already been explicitly added will be automatically added with default model and ytransform.

Parameters

data (pandas.DataFrame): Must have a column with the same name as segmentation_col.
debug (bool): If set to true will pass debug to the fit method of each model.

Returns

fits (dict of statsmodels.regression.linear_model.OLSResults): Keys are the segment names.
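How default expressions and min_segment_size might interact during fitting can be sketched as a small planning step. The `plan_segments` helper is hypothetical; it assumes the size threshold is applied to every segment found in the data, including explicitly added ones, which may differ from what SegmentedRegressionModel actually does.

```python
from collections import Counter


def plan_segments(segment_values, explicit_exprs, default_expr, min_segment_size=0):
    """Decide which expression each segment would be fit with (illustrative only).

    segment_values -- the segmentation column, one value per observation
    explicit_exprs -- dict of segment name -> expression added by hand
    default_expr   -- expression used for segments not explicitly added
    """
    counts = Counter(segment_values)
    plan = {}
    for name, size in counts.items():
        if size < min_segment_size:
            # Too few observations: skip the segment entirely.
            continue
        # Explicit expressions win; everything else falls back to the default.
        plan[name] = explicit_exprs.get(name, default_expr)
    return plan


segment_values = ['res'] * 5 + ['office'] * 3 + ['industrial'] * 1
plan = plan_segments(
    segment_values,
    explicit_exprs={'office': 'price ~ sqft + floors'},
    default_expr='price ~ sqft',
    min_segment_size=2,
)
```

Here the single 'industrial' observation is skipped, 'office' keeps its own expression, and 'res' receives the default.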

classmethod fit_from_cfg(df, cfgname, debug=False, min_segment_size=None, outcfgname=None)[source]¶

Parameters

df (DataFrame): The dataframe which contains the columns to use for the estimation.
cfgname (string): The name of the yaml config file which describes the hedonic model.
debug (boolean, optional, default False): Whether to generate debug information on the model.
min_segment_size (int, optional): Set attribute on the model.
outcfgname (string, optional, default cfgname): The name of the output yaml config file to which estimation results are written.

Returns

hm: SegmentedRegressionModel which was used to fit

property fitted¶: Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]¶

Create a SegmentedRegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters

yaml_str (str, optional): A YAML string from which to load model.
str_or_buffer (str or file like, optional): File name or buffer from which to load YAML.

Returns

SegmentedRegressionModel

predict(self, data)[source]¶

Predict new data for each group in the segmentation.

Parameters

data (pandas.DataFrame): Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns

predicted (pandas.Series): Predicted data in a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname, min_segment_size=None)[source]¶

Parameters

df (DataFrame): The dataframe which contains the columns to use for the estimation.
cfgname (string): The name of the yaml config file which describes the hedonic model.
min_segment_size (int, optional): Set attribute on the model.

Returns

predicted (pandas.Series): Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
hm: SegmentedRegressionModel which was used to predict

to_dict(self)[source]¶: Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(self, str_or_buffer=None)[source]¶

Save a model representation to YAML.

Parameters

str_or_buffer (str or file like, optional): By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns

j (str): YAML string if str_or_buffer is not given.

urbansim.models.regression.fit_model(df, filters, model_expression)[source]¶

Use statsmodels OLS to construct a model relation.

Parameters

df (pandas.DataFrame): Data to use for fit. Should contain all the columns referenced in the model_expression.
filters (list of str): Any filters to apply before doing the model fit.
model_expression (str): A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

Returns

fit (statsmodels.regression.linear_model.OLSResults)

urbansim.models.regression.predict(df, filters, model_fit, ytransform=None)[source]¶

Apply model to new data to predict new dependent values.

Parameters

df (pandas.DataFrame)

filters (list of str)

Any filters to apply before doing prediction.

model_fit (statsmodels.regression.linear_model.OLSResults)

Result of model estimation.

ytransform (callable, optional)

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

Returns

result (pandas.Series): Predicted values as a pandas Series. Will have the index of df after applying filters.

Discrete Choice API Docs

Use the MNLDiscreteChoiceModel class to train a choice model using multinomial logit and make subsequent choice predictions.

class urbansim.models.dcm.DiscreteChoiceModel[source]¶: Abstract base class for discrete choice models.

class urbansim.models.dcm.MNLDiscreteChoiceModel(model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, name=None)[source]¶

A discrete choice model with the ability to store an estimated model and predict new data based on the model. Based on multinomial logit.

Parameters

model_expression (str, iterable, or dict): A patsy model expression. Should contain only a right-hand side.
sample_size (int): Number of choices to sample for estimating the model.
probability_mode (str, optional): Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.
choice_mode (str, optional): Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.
choosers_fit_filters (list of str, optional): Filters applied to choosers table before fitting the model.
choosers_predict_filters (list of str, optional): Filters applied to the choosers table before calculating new data points.
alts_fit_filters (list of str, optional): Filters applied to the alternatives table before fitting the model.
alts_predict_filters (list of str, optional): Filters applied to the alternatives table before calculating new data points.
interaction_predict_filters (list of str, optional): Filters applied to the merged choosers/alternatives table before predicting agent choices.
estimation_sample_size (int, optional): Whether to sample choosers during estimation (needs to be applied after choosers_fit_filters).
prediction_sample_size (int, optional): Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.
choice_column (optional): Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided the alternatives index is used.
name (optional): Optional descriptive name for this model that may be used in output.
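The difference between the ‘individual’ and ‘aggregate’ choice modes described above can be sketched with the standard `random` module. Both functions are hypothetical simplifications rather than UrbanSim's implementation: the individual version draws independently per chooser, so alternatives can repeat, while the aggregate version shares one set of probabilities and removes each chosen alternative from the pool.

```python
import random


def choose_individual(probabilities, rng):
    """Each chooser draws independently from its own probabilities.

    probabilities -- dict of chooser ID -> dict of alternative ID -> probability
    """
    choices = {}
    for chooser, probs in probabilities.items():
        alts = list(probs)
        weights = [probs[a] for a in alts]
        choices[chooser] = rng.choices(alts, weights=weights, k=1)[0]
    return choices


def choose_aggregate(choosers, probs, rng):
    """All choosers share one set of probabilities; chosen alternatives are removed."""
    remaining = dict(probs)
    choices = {}
    for chooser in choosers:
        if not remaining:
            # More choosers than alternatives: this chooser is left unplaced.
            choices[chooser] = None
            continue
        alts = list(remaining)
        weights = [remaining[a] for a in alts]
        pick = rng.choices(alts, weights=weights, k=1)[0]
        choices[chooser] = pick
        del remaining[pick]
    return choices


rng = random.Random(0)
shared = {'bldg_a': 0.7, 'bldg_b': 0.3}
individual = choose_individual({'hh1': shared, 'hh2': shared, 'hh3': shared}, rng)
aggregate = choose_aggregate(['hh1', 'hh2', 'hh3'], shared, rng)
```

With three choosers and two alternatives, the aggregate result places two choosers in distinct buildings and leaves the third unplaced, represented here by None where the real models use a nan value.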

alts_columns_used(self)[source]¶: Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for fitting.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

apply_predict_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for prediction.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

assert_fitted(self)[source]¶: Raises RuntimeError if the model is not ready for prediction.

choosers_columns_used(self)[source]¶: Columns from the choosers table that are used for filtering.

columns_used(self)[source]¶: Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)[source]¶

Fit and save model parameters based on given data.

Parameters

choosers (pandas.DataFrame)

Table describing the agents making choices, e.g. households.

alternatives (pandas.DataFrame)

Table describing the things from which agents are choosing, e.g. buildings.

current_choice (pandas.Series or any)

A Series describing the alternatives currently chosen by the choosers. Should have an index matching choosers and values matching the index of alternatives.

If a non-Series is given it should be a column in choosers.

Returns

log_likelihoods (dict): Dict of log-likelihood values describing the quality of the model fit. Will have keys ‘null’, ‘convergence’, and ‘ratio’.
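One common reading of the three keys is sketched below. It is an assumption, not a statement of UrbanSim's exact computation: ‘null’ is taken as the log-likelihood when every alternative is equally likely, ‘convergence’ as the log-likelihood at the fitted coefficients, and ‘ratio’ as 1 - convergence / null (McFadden's likelihood ratio index). The function name and inputs are hypothetical.

```python
import math


def log_likelihood_summary(chosen_probabilities, num_alternatives):
    """Summarize fit quality from the fitted probability of each observed choice.

    chosen_probabilities -- fitted probability of the alternative each chooser
                            actually chose
    num_alternatives     -- number of alternatives each chooser considered
    """
    n = len(chosen_probabilities)
    # Null model: every alternative has probability 1 / num_alternatives.
    null = n * math.log(1.0 / num_alternatives)
    # Fitted model: sum of log-probabilities of the observed choices.
    convergence = sum(math.log(p) for p in chosen_probabilities)
    return {
        'null': null,
        'convergence': convergence,
        'ratio': 1.0 - convergence / null,
    }


summary = log_likelihood_summary([0.5, 0.5], num_alternatives=4)
```

Under this definition a ratio near 0 means the fitted model is barely better than equal probabilities, and values closer to 1 indicate a better fit.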

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)[source]¶

Parameters

choosers (DataFrame): A dataframe in which rows represent choosers.
chosen_fname (string): The name of the column in the choosers dataframe that records which alternative each chooser has chosen.
alternatives (DataFrame): A table of alternatives. It should include the choices from the choosers table as well as other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.
cfgname (string): The name of the yaml config file from which to read the discrete choice model.
outcfgname (string, optional, default cfgname): The name of the output yaml config file to which estimation results are written.

Returns

lcm: MNLDiscreteChoiceModel which was used to fit

property fitted¶: True if model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]¶

Create a DiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters

yaml_str (str, optional): A YAML string from which to load model.
str_or_buffer (str or file like, optional): File name or buffer from which to load YAML.

Returns

MNLDiscreteChoiceModel

interaction_columns_used(self)[source]¶: Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)[source]¶

Choose from among alternatives for a group of agents.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.
debug (bool): If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)[source]¶

Simulate choices for the specified choosers

Parameters

choosers (DataFrame): A dataframe of agents doing the choosing.
alternatives (DataFrame): A dataframe of locations which the choosers are locating in and which have a supply.
cfgname (string, optional): The name of the yaml config file from which to read the discrete choice model.
cfg (string, optional): An ordered yaml string of the discrete choice model configuration. Used to read config from memory in lieu of loading cfgname from disk.
alternative_ratio (float, optional): If the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives will be sampled down to meet this ratio (for performance reasons).
debug (boolean, optional, default False): Whether to generate debug information on the model.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
lcm: MNLDiscreteChoiceModel which was used to predict
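The effect of alternative_ratio can be sketched as a capped random sample. The `sample_alternatives` helper is hypothetical and only illustrates the idea described above; the rounding of the target count and the sampling method are assumptions.

```python
import random


def sample_alternatives(alternative_ids, num_choosers, alternative_ratio=2.0, rng=None):
    """Cap the number of alternatives at alternative_ratio * num_choosers.

    Illustrative only; returns the alternatives unchanged when they are
    already at or below the target count.
    """
    rng = rng or random.Random()
    target = int(num_choosers * alternative_ratio)
    if len(alternative_ids) <= target:
        return list(alternative_ids)
    # Sample without replacement so no alternative appears twice.
    return rng.sample(list(alternative_ids), target)


all_alternatives = list(range(100))
sampled = sample_alternatives(all_alternatives, num_choosers=10, rng=random.Random(0))
untouched = sample_alternatives(list(range(15)), num_choosers=10)
```

With 10 choosers and the default ratio of 2.0, 100 alternatives are reduced to 20, while a table of 15 alternatives is left as it is.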

probabilities(self, choosers, alternatives, filter_tables=True)[source]¶

Returns the probabilities for a set of choosers to choose from among a set of alternatives.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.
filter_tables (bool, optional): If True, filter choosers and alternatives with prediction filters before calculating probabilities.

Returns

probabilities (pandas.Series): Probability of selection associated with each chooser and alternative. Index will be a MultiIndex with alternative IDs in the inner index and chooser IDs in the outer index.
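For reference, multinomial logit probabilities are the exponentiated utilities of each alternative, normalized to sum to one per chooser. The sketch below computes them in pure Python for a "full product" style result keyed by (chooser ID, alternative ID); the function names and the utility values are hypothetical, and in the real model the utilities come from the fitted coefficients applied to the interaction data.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities for one chooser.

    utilities -- dict mapping alternative ID to systematic utility
    """
    # Subtracting the maximum utility avoids overflow and leaves the result unchanged.
    u_max = max(utilities.values())
    exp_u = {alt: math.exp(u - u_max) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: value / total for alt, value in exp_u.items()}


def full_product_probabilities(chooser_utilities):
    """Probabilities keyed by (chooser ID, alternative ID), chooser outermost."""
    return {
        (chooser, alt): p
        for chooser, utilities in chooser_utilities.items()
        for alt, p in mnl_probabilities(utilities).items()
    }


chooser_utilities = {
    'hh1': {'bldg_a': 1.0, 'bldg_b': 0.0},
    'hh2': {'bldg_a': 0.0, 'bldg_b': 0.0},
}
probs = full_product_probabilities(chooser_utilities)
```

The tuple keys play the role of the MultiIndex: the chooser ID is the outer level and the alternative ID is the inner level.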

report_fit(self)[source]¶: Print a report of the fit results.

property str_model_expression¶: Model expression as a string suitable for use with patsy/statsmodels.

summed_probabilities(self, choosers, alternatives)[source]¶

Calculate total probability associated with each alternative.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.

Returns

probs (pandas.Series): Total probability associated with each alternative.
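Summing probabilities per alternative amounts to collapsing the chooser level of the probability table. The sketch below does this for a plain dictionary keyed by (chooser ID, alternative ID); the function and the input values are hypothetical and stand in for the Series returned by probabilities.

```python
from collections import defaultdict


def sum_by_alternative(probabilities):
    """Total probability per alternative across all choosers (illustrative only).

    probabilities -- dict mapping (chooser ID, alternative ID) to a probability
    """
    totals = defaultdict(float)
    for (_chooser, alt), p in probabilities.items():
        totals[alt] += p
    return dict(totals)


probabilities = {
    ('hh1', 'bldg_a'): 0.75, ('hh1', 'bldg_b'): 0.25,
    ('hh2', 'bldg_a'): 0.5, ('hh2', 'bldg_b'): 0.5,
}
totals = sum_by_alternative(probabilities)
```

Because each chooser's probabilities sum to one, the totals across all alternatives add up to the number of choosers, which makes the result readable as expected demand per alternative.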

to_dict(self)[source]¶: Return a dict representation of an MNLDiscreteChoiceModel instance.

to_yaml(self, str_or_buffer=None)[source]¶

Save a model representation to YAML.

Parameters

str_or_buffer (str or file like, optional): By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns

j (str): YAML string if str_or_buffer is not given.

class urbansim.models.dcm.MNLDiscreteChoiceModelGroup(segmentation_col, remove_alts=False, name=None)[source]¶

Manages a group of discrete choice models that refer to different segments of choosers.

Model names must match the segment names after doing a pandas groupby.

Parameters

segmentation_col (str): Name of a column in the table of choosers. Will be used to perform a pandas groupby on the choosers table.
remove_alts (bool, optional): Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.
name (str, optional): A name that may be used in places to identify this group.
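The remove_alts behaviour can be sketched as a loop over segments that optionally shrinks the pool of alternatives between rounds. Both functions below are hypothetical: `first_available` is a deterministic stand-in for a fitted model's predict step, used only so the effect of the flag is easy to see.

```python
def predict_segments(segments, alternatives, choose, remove_alts=False):
    """Run one prediction round per segment (illustrative only).

    segments -- dict of segment name -> list of chooser IDs
    choose   -- callable(choosers, alternatives) -> dict of chooser -> alternative
    """
    available = list(alternatives)
    all_choices = {}
    for _name, choosers in segments.items():
        choices = choose(choosers, available)
        all_choices.update(choices)
        if remove_alts:
            # Alternatives taken in this round are unavailable to later segments.
            taken = set(choices.values())
            available = [a for a in available if a not in taken]
    return all_choices


def first_available(choosers, alternatives):
    # Deterministic stand-in for a model: chooser i takes the i-th alternative.
    return dict(zip(choosers, alternatives))


segments = {'renters': ['hh1', 'hh2'], 'owners': ['hh3']}
buildings = ['bldg_a', 'bldg_b', 'bldg_c']

shared_pool = predict_segments(segments, buildings, first_available, remove_alts=False)
shrinking_pool = predict_segments(segments, buildings, first_available, remove_alts=True)
```

With remove_alts=False the second segment can pick a building already taken by the first; with remove_alts=True it can only pick from what is left.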

add_model(self, model)[source]¶

Add an MNLDiscreteChoiceModel instance.

Parameters

model (MNLDiscreteChoiceModel): Should have a .name attribute matching one of the segments in the choosers table.

add_model_from_params(self, name, model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None)[source]¶

Add a model by passing parameters through to MNLDiscreteChoiceModel.

Parameters

name: Must match a segment in the choosers table.
model_expression (str, iterable, or dict): A patsy model expression. Should contain only a right-hand side.
sample_size (int): Number of choices to sample for estimating the model.
probability_mode (str, optional): Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives.
choice_mode (str or callable, optional): Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers.
choosers_fit_filters (list of str, optional): Filters applied to choosers table before fitting the model.
choosers_predict_filters (list of str, optional): Filters applied to the choosers table before calculating new data points.
alts_fit_filters (list of str, optional): Filters applied to the alternatives table before fitting the model.
alts_predict_filters (list of str, optional): Filters applied to the alternatives table before calculating new data points.
interaction_predict_filters (list of str, optional): Filters applied to the merged choosers/alternatives table before predicting agent choices.
estimation_sample_size (int, optional): Whether to sample choosers during estimation (needs to be applied after choosers_fit_filters).
prediction_sample_size (int, optional): Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.
choice_column (optional): Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided the alternatives index is used.

alts_columns_used(self)[source]¶: Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for fitting. This is done by filtering each submodel and concatenating the results.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

apply_predict_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for prediction. This is done by filtering each submodel and concatenating the results.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

choosers_columns_used(self)[source]¶: Columns from the choosers table that are used for filtering.

columns_used(self)[source]¶: Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)[source]¶

Fit and save models based on given data after segmenting the choosers table.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.
current_choice: Name of column in choosers that indicates which alternative they have currently chosen.

Returns

log_likelihoods (dict of dict): Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.

property fitted¶: Whether all models in the group have been fitted.

interaction_columns_used(self)[source]¶: Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)[source]¶

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.
debug (bool): If debug is set to true, the variable “sim_pdf” is set on the object to store the probabilities for mapping of the outcome.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.

probabilities(self, choosers, alternatives)[source]¶

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.

Returns

probabilities (dict of pandas.Series)

summed_probabilities(self, choosers, alternatives)[source]¶

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.

Returns

probs (pandas.Series): Summed probabilities from each segment added together.
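
As a rough picture of what "summed across segments" means, the sketch below adds per-segment probability Series by alternative ID. It assumes, for illustration only, that each segment's Series is already indexed by alternative ID; the data and the helper name sum_across_segments are made up and are not part of the library.

```python
import pandas as pd

# Hypothetical per-segment probabilities, each indexed by alternative ID.
segment_probs = {
    "renter": pd.Series([0.2, 0.8], index=[10, 20]),
    "owner": pd.Series([0.5, 0.25, 0.25], index=[10, 20, 30]),
}


def sum_across_segments(probs_by_segment):
    """Stack the per-segment Series and sum values that share an alternative ID."""
    stacked = pd.concat(list(probs_by_segment.values()))
    return stacked.groupby(level=0).sum()


summed = sum_across_segments(segment_probs)
```

Alternatives that appear in only one segment (30 here) keep that segment's probability; alternatives shared by several segments get the total.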

class urbansim.models.dcm.SegmentedMNLDiscreteChoiceModel(segmentation_col, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, default_model_expr=None, remove_alts=False, name=None)[source]¶

An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.

Parameters

segmentation_col: Name of column in the choosers table that will be used for groupby.
sample_size (int): Number of choices to sample for estimating the model.
probability_mode (str, optional): Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.
choice_mode (str, optional): Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.
choosers_fit_filters (list of str, optional): Filters applied to the choosers table before fitting the model.
choosers_predict_filters (list of str, optional): Filters applied to the choosers table before calculating new data points.
alts_fit_filters (list of str, optional): Filters applied to the alternatives table before fitting the model.
alts_predict_filters (list of str, optional): Filters applied to the alternatives table before calculating new data points.
interaction_predict_filters (list of str, optional): Filters applied to the merged choosers/alternatives table before predicting agent choices.
estimation_sample_size (int, optional): Whether (and how much) to sample choosers during estimation. Sampling is applied after choosers_fit_filters.
prediction_sample_size (int, optional): Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.
choice_column (optional): Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided the alternatives index is used.
default_model_expr (str, iterable, or dict, optional): A patsy model expression. Should contain only a right-hand side.
remove_alts (bool, optional): Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.
name (str, optional): An optional string used to identify the model in places.
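
The pairing rule between probability_mode and choice_mode described above can be written down as a small check. This is a hypothetical helper for illustration (check_modes and VALID_MODE_PAIRS are not library names), and it makes no claim about how or whether the library itself validates the two arguments.

```python
# The two combinations the parameter descriptions above allow.
VALID_MODE_PAIRS = {
    ("single_chooser", "aggregate"),
    ("full_product", "individual"),
}


def check_modes(probability_mode="full_product", choice_mode="individual"):
    """Return the pair unchanged if it is allowed, otherwise raise ValueError."""
    if (probability_mode, choice_mode) not in VALID_MODE_PAIRS:
        raise ValueError(
            f"probability_mode={probability_mode!r} cannot be combined "
            f"with choice_mode={choice_mode!r}"
        )
    return probability_mode, choice_mode


# The defaults from the class signature form a valid pair.
default_pair = check_modes()
```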

add_segment(self, name, model_expression=None)[source]¶

Add a new segment with its own model expression.

Parameters

name: Segment name. Must match a segment in the groupby of the data.
model_expression (str or dict, optional): A patsy model expression that can be used with statsmodels. Should contain only a right-hand side, as with default_model_expr. If not given the default model will be used, which must not be None.

alts_columns_used(self)[source]¶: Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for fitting.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

apply_predict_filters(self, choosers, alternatives)[source]¶

Filter choosers and alternatives for prediction.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.

Returns

filtered_choosers, filtered_alts (pandas.DataFrame)

choosers_columns_used(self)[source]¶: Columns from the choosers table that are used for filtering.

columns_used(self)[source]¶: Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)[source]¶

Fit and save models based on given data after segmenting the choosers table. Segments that have not already been explicitly added will be automatically added with the default model expression.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing, e.g. buildings.
current_choice: Name of column in choosers that indicates which alternative they have currently chosen.

Returns

log_likelihoods (dict of dict): Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.
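
The way segments fall back to the default expression can be pictured with a groupby over the segmentation column. The sketch below is an illustration under assumptions, not the library's implementation: the tables, the segment_exprs dictionary, and the column names are invented, and no model is actually fitted.

```python
import pandas as pd

# Hypothetical default expression and one explicitly added segment.
default_model_expr = ["dist_hwy", "ave_income"]
segment_exprs = {"renter": ["dist_hwy"]}

choosers = pd.DataFrame(
    {"tenure": ["renter", "owner", "owner"], "building_id": [10, 20, 30]},
    index=pd.Index([1, 2, 3], name="household_id"),
)

# Every segment found in the data gets an expression; segments that were
# not added explicitly fall back to the default.
for segment_name, segment_choosers in choosers.groupby("tenure"):
    segment_exprs.setdefault(segment_name, default_model_expr)

# List-style expressions joined into right-hand-side formula strings.
formulas = {name: " + ".join(terms) for name, terms in segment_exprs.items()}
```

Here the "owner" segment was never added explicitly, so it picks up the default expression, while "renter" keeps its own.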

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)[source]¶

Parameters

choosers (DataFrame): A dataframe in which each row is an agent that has made a choice.
chosen_fname (string): Name of the column in the choosers dataframe that gives the alternative each chooser has chosen.
alternatives (DataFrame): A dataframe of alternatives. It should include the current choices from the choosers dataframe as well as some other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.
cfgname (string): The name of the YAML config file from which to read the discrete choice model.
outcfgname (string, optional; default cfgname): The name of the output YAML config file to which estimation results are written.

Returns

lcm (SegmentedMNLDiscreteChoiceModel): The model that was used to fit.

property fitted¶: Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]¶

Create a SegmentedMNLDiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters

yaml_str (str, optional): A YAML string from which to load model.
str_or_buffer (str or file like, optional): File name or buffer from which to load YAML.

Returns

SegmentedMNLDiscreteChoiceModel

interaction_columns_used(self)[source]¶: Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)[source]¶

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.
debug (bool): If debug is set to true, the variable “sim_pdf” is set on the object to store the probabilities for mapping of the outcome.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)[source]¶

Simulate the discrete choices for the specified choosers.

Parameters

choosers (DataFrame): A dataframe of agents doing the choosing.
alternatives (DataFrame): A dataframe of alternatives in which the choosers are locating and which have a supply.
cfgname (string): The name of the YAML config file from which to read the discrete choice model.
cfg (string): An ordered YAML string of the discrete choice model configuration. Used to read the config from memory in lieu of loading cfgname from disk.
alternative_ratio (float): When the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives are sampled down to meet the ratio, for performance reasons.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.
lcm (SegmentedMNLDiscreteChoiceModel): The model that was used to predict.
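
The effect of alternative_ratio can be sketched as a simple down-sampling step. The helper below is hypothetical (cap_alternatives and its seed argument are not part of the library) and only illustrates the idea of keeping at most alternative_ratio alternatives per chooser.

```python
import numpy as np
import pandas as pd


def cap_alternatives(choosers, alternatives, alternative_ratio=2.0, seed=None):
    """Randomly down-sample alternatives when they exceed the allowed ratio."""
    max_alts = int(len(choosers) * alternative_ratio)
    if len(alternatives) <= max_alts:
        # Already at or below the ratio: nothing to do.
        return alternatives
    rng = np.random.default_rng(seed)
    keep = rng.choice(alternatives.index.to_numpy(), size=max_alts, replace=False)
    return alternatives.loc[keep]


choosers = pd.DataFrame({"income": [30000, 45000, 60000]}, index=[1, 2, 3])
alternatives = pd.DataFrame({"sqft": range(100, 1100, 100)}, index=range(10, 20))

# Ten alternatives for three choosers exceeds a ratio of 2.0, so six are kept.
capped = cap_alternatives(choosers, alternatives, alternative_ratio=2.0, seed=0)
```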

probabilities(self, choosers, alternatives)[source]¶

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.

Returns

probabilities (dict of pandas.Series)

summed_probabilities(self, choosers, alternatives)[source]¶

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters

choosers (pandas.DataFrame): Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
alternatives (pandas.DataFrame): Table describing the things from which agents are choosing.

Returns

probs (pandas.Series): Summed probabilities from each segment added together.

to_dict(self)[source]¶: Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(self, str_or_buffer=None)[source]¶

Save a model representation to YAML.

Parameters

str_or_buffer (str or file like, optional): By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns

j (str): The YAML string, returned if str_or_buffer is not given.
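
The three behaviours of str_or_buffer described above amount to a small dispatch. The sketch below illustrates that pattern with a hypothetical helper (emit is not a library function) and a hand-written YAML string, so it says nothing about how the model itself is serialized.

```python
import io


def emit(text, str_or_buffer=None):
    """Return text, write it to a file name, or write it to a buffer."""
    if str_or_buffer is None:
        # Nothing given: hand the string back to the caller.
        return text
    if isinstance(str_or_buffer, str):
        # A string is treated as a file name.
        with open(str_or_buffer, "w") as f:
            f.write(text)
    else:
        # Anything else is assumed to have a .write method.
        str_or_buffer.write(text)
    return None


yaml_text = "name: example\n"  # stand-in for a serialized model

returned = emit(yaml_text)  # returned as a string
buffer = io.StringIO()
emit(yaml_text, buffer)  # written to the buffer
```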

urbansim.models.dcm.unit_choice(chooser_ids, alternative_ids, probabilities)[source]¶

Have a set of choosers choose from among alternatives according to a probability distribution. Choice is binary: each alternative can only be chosen once.

Parameters

chooser_ids (1d array_like): Array of IDs of the agents that are making choices.
alternative_ids (1d array_like): Array of IDs of alternatives among which agents are making choices.
probabilities (1d array_like): The probability that an agent will choose an alternative. Must be the same shape as alternative_ids. Unavailable alternatives should have a probability of 0.

Returns

choices (pandas.Series): Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.
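
One way to picture the behaviour described above is weighted sampling without replacement. The function below is a minimal sketch under stated assumptions, not the library's implementation: the name unit_choice_sketch and the seed argument are hypothetical, and numeric IDs are assumed so that unassigned choosers can hold NaN.

```python
import numpy as np
import pandas as pd


def unit_choice_sketch(chooser_ids, alternative_ids, probabilities, seed=None):
    """Assign each alternative to at most one chooser, weighted by probability."""
    rng = np.random.default_rng(seed)
    chooser_ids = np.asarray(chooser_ids)
    alternative_ids = np.asarray(alternative_ids)
    probabilities = np.asarray(probabilities, dtype=float)

    # Start with every chooser unassigned.
    choices = pd.Series(np.nan, index=chooser_ids)

    total = probabilities.sum()
    if total == 0:
        # No alternative is available at all.
        return choices

    # Sampling needs probabilities that sum to 1.
    probabilities = probabilities / total

    # Only alternatives with nonzero probability can be chosen.
    n_available = np.count_nonzero(probabilities)
    n_to_choose = min(len(chooser_ids), n_available)

    chosen = rng.choice(
        alternative_ids, size=n_to_choose, replace=False, p=probabilities
    )
    # Pick which choosers receive an alternative, in random order.
    lucky = rng.choice(chooser_ids, size=n_to_choose, replace=False)
    choices.loc[lucky] = chosen
    return choices


# Four choosers, but only two alternatives have nonzero probability.
choices = unit_choice_sketch([1, 2, 3, 4], [10, 20, 30], [0.5, 0.5, 0.0], seed=0)
```

With four choosers and two available alternatives, two choosers receive alternatives 10 and 20 and the other two map to NaN; alternative 30 is never chosen because its probability is 0.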