Statistical Models

Introduction

UrbanSim has two sets of statistical models: regressions and discrete choice models. Each has a three-stage usage pattern:

  1. Create a configured model instance. This is where you will supply most of the information to the model such as the actual definition of the model and any filters that restrict the data used during fitting and prediction.

  2. Fit the model by supplying base year data.

  3. Make predictions based on new data.
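
The three stages above can be sketched with a toy stand-in. The TinyRegression class below is hypothetical and is not UrbanSim's RegressionModel; it only mirrors the configure, fit, predict sequence using a one-variable least-squares fit on plain Python dictionaries.

```python
class TinyRegression:
    """Hypothetical stand-in that mirrors the configure/fit/predict pattern."""

    def __init__(self, x_col, y_col, fit_filter=None):
        # Stage 1: configuration only, no data is touched yet.
        self.x_col = x_col
        self.y_col = y_col
        self.fit_filter = fit_filter or (lambda row: True)
        self.params = None

    def fit(self, rows):
        # Stage 2: estimate intercept and slope from the filtered base year rows.
        rows = [r for r in rows if self.fit_filter(r)]
        xs = [r[self.x_col] for r in rows]
        ys = [r[self.y_col] for r in rows]
        x_mean = sum(xs) / len(xs)
        y_mean = sum(ys) / len(ys)
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        slope = sxy / sxx
        self.params = (y_mean - slope * x_mean, slope)
        return self.params

    def predict(self, rows):
        # Stage 3: apply the stored parameters to new rows.
        if self.params is None:
            raise RuntimeError('model must be fitted before prediction')
        intercept, slope = self.params
        return [intercept + slope * r[self.x_col] for r in rows]


base_year = [
    {'sqft': 1.0, 'price': 2.0},
    {'sqft': 2.0, 'price': 4.0},
    {'sqft': 3.0, 'price': 6.0},
    {'sqft': -1.0, 'price': 99.0},  # excluded by the fit filter
]
model = TinyRegression('sqft', 'price', fit_filter=lambda r: r['sqft'] > 0)
model.fit(base_year)
predictions = model.predict([{'sqft': 4.0}])
```

Keeping the configuration separate from the data is what lets the same configured object be fitted once and reused for prediction on later data.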

Model Expressions

Statistical models require specification of a “model expression” that describes the model as a mathematical formula. UrbanSim uses patsy to interpret model expressions but gives you some flexibility in how you define them.

patsy works with string formulas like this simplified regression example (names refer to columns in the DataFrames used during fitting and prediction):

expr = 'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'

In UrbanSim that same formula could be expressed in a dictionary:

expr = {
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

Formulae used with location choice models have only a right hand side since the models do not predict new numeric values. Right-hand-side formulae can be written as lists or dictionaries:

expr = {
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

expr = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']

Expressing the formula as a string is always an option. The ability to use lists or dictionaries is especially useful for writing attractively formatted formulae in YAML config files.
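
As a rough illustration of how the three forms relate, the sketch below collapses a string, list, or dictionary expression into a single formula string. The function name expression_to_string is hypothetical and this is not UrbanSim's own conversion code; it only assumes the 'left_side' and 'right_side' keys shown in the examples above.

```python
def expression_to_string(expr):
    """Collapse a str, list, or dict model expression into one formula string."""
    if isinstance(expr, str):
        return expr
    if isinstance(expr, dict):
        right = expr['right_side']
        if not isinstance(right, str):
            right = ' + '.join(right)
        left = expr.get('left_side')
        return f'{left} ~ {right}' if left else right
    # Any other iterable is treated as a list of right-hand-side terms.
    return ' + '.join(expr)


as_string = 'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'
as_dict = {
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income'],
}
as_list = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']

regression_formula = expression_to_string(as_dict)
choice_formula = expression_to_string(as_list)
```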

YAML Persistence

UrbanSim’s regression and location choice models can be saved as YAML files and loaded again at another time. This feature is especially useful for estimating models in one location, saving the fit parameters to disk, and then using the fitted model for prediction somewhere else.

Use the .to_yaml and .from_yaml methods to save files to disk and load them back as configured models. Here’s an example of loading a regression model, performing fitting, and saving the model back to YAML:

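from urbansim.models.regression import RegressionModel

# Pass the file name by keyword: the first positional argument is yaml_str.
model = RegressionModel.from_yaml(str_or_buffer='my_model.yaml')

# data is a pandas DataFrame holding the columns used by the model.
model.fit(data)

model.to_yaml('my_model.yaml')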

You can, if you like, write your model configurations entirely in YAML and load them into Python only for fitting and prediction.

API

Regression API

RegressionModel(fit_filters, …[, …])

A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.

SegmentedRegressionModel(segmentation_col[, …])

A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.

RegressionModelGroup(segmentation_col[, name])

Manages a group of regression models that refer to different segments within a single table.

Discrete Choice API

MNLDiscreteChoiceModel(model_expression, …)

A discrete choice model with the ability to store an estimated model and predict new data based on the model.

SegmentedMNLDiscreteChoiceModel(…[, …])

An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.

MNLDiscreteChoiceModelGroup(segmentation_col)

Manages a group of discrete choice models that refer to different segments of choosers.

Regression API Docs

Use the RegressionModel class to fit a model using statsmodels’ OLS capability and then do subsequent prediction.

class urbansim.models.regression.RegressionModel(fit_filters, predict_filters, model_expression, ytransform=None, name=None)

A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.

statsmodels’ OLS implementation is used.

Parameters
fit_filters : list of str

Filters applied before fitting the model.

predict_filters : list of str

Filters applied before calculating new data points.

model_expression : str or dict

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

name : optional

Optional descriptive name for this model that may be used in output.
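
To illustrate the ytransform parameter: the transform should invert whatever was applied on the left-hand side of the model expression. For a plain log that is np.exp; for the np.log1p used in the earlier example expression it would be np.expm1. The sketch below uses the standard library equivalents of those NumPy functions, and the price value is made up.

```python
import math

price = 250.0
fitted_scale = math.log1p(price)      # the scale the left-hand side is modelled on
recovered = math.expm1(fitted_scale)  # inverse transform applied to predictions
```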

assert_fitted(self)

Raises a RuntimeError if the model is not ready for prediction.

columns_used(self)

Returns all the columns used in this model for filtering and in the model expression.

fit(self, data, debug=False)

Fit the model to data and store/return the results.

Parameters
data : pandas.DataFrame

Data to use for fitting the model. Must contain all the columns referenced by the model_expression.

debug : bool

If debug is set to true, this sets the attribute “est_data” to a dataframe with the actual data used for estimation of this model.

Returns
fit : statsmodels.regression.linear_model.OLSResults

This is returned for inspection, but also stored on the class instance for use during prediction.

classmethod fit_from_cfg(df, cfgname, debug=False, outcfgname=None)

Parameters
df : DataFrame

The dataframe which contains the columns to use for the estimation.

cfgname : string

The name of the yaml config file which describes the hedonic model.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file to which estimation results are written.

Returns
RegressionModel which was used to fit

property fitted

True if the model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a RegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters
yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns
RegressionModel

predict(self, data)

Predict a new data set based on an estimated model.

Parameters
data : pandas.DataFrame

Data to use for prediction. Must contain all the columns referenced by the right-hand side of the model_expression.

Returns
result : pandas.Series

Predicted values as a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname)

Parameters
df : DataFrame

The dataframe which contains the columns to use for the prediction.

cfgname : string

The name of the yaml config file which describes the hedonic model.

Returns
predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.

hm : RegressionModel which was used to predict

report_fit(self)

Print a report of the fit results.

property str_model_expression

Model expression as a string suitable for use with patsy/statsmodels.

to_dict(self)

Returns a dictionary representation of a RegressionModel instance.

to_yaml(self, str_or_buffer=None)

Save a model representation to YAML.

Parameters
str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns
j : str

YAML string if str_or_buffer is not given.
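
The str_or_buffer handling described for to_yaml and from_yaml can be sketched as follows. This is a stand-in, not UrbanSim's implementation: the function names dump_config and load_config are hypothetical, JSON from the standard library is used in place of YAML so the sketch has no third-party dependency, and raising ValueError when the mutually exclusive arguments are misused is an assumption.

```python
import io
import json


def dump_config(config, str_or_buffer=None):
    """Serialize config; return text, write to a path, or write to a buffer."""
    text = json.dumps(config, indent=2)  # JSON stands in for YAML here
    if str_or_buffer is None:
        return text
    if isinstance(str_or_buffer, str):
        with open(str_or_buffer, 'w') as f:
            f.write(text)
    else:
        # Anything else is assumed to expose a .write method.
        str_or_buffer.write(text)


def load_config(yaml_str=None, str_or_buffer=None):
    """Load config from exactly one of a string or a path/buffer."""
    if (yaml_str is None) == (str_or_buffer is None):
        raise ValueError('give exactly one of yaml_str or str_or_buffer')
    if yaml_str is not None:
        return json.loads(yaml_str)
    if isinstance(str_or_buffer, str):
        with open(str_or_buffer) as f:
            return json.load(f)
    return json.load(str_or_buffer)


config = {'name': 'demo', 'fit_filters': ['sqft > 0']}
buffer = io.StringIO()
dump_config(config, buffer)
buffer.seek(0)
round_trip = load_config(str_or_buffer=buffer)
```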

class urbansim.models.regression.RegressionModelGroup(segmentation_col, name=None)

Manages a group of regression models that refer to different segments within a single table.

Model names must match the segment names after doing a Pandas groupby.

Parameters
segmentation_col

Name of the column on which to segment.

name

Optional name used to identify the model in places.

add_model(self, model)

Add a RegressionModel instance.

Parameters
model : RegressionModel

Should have a .name attribute matching one of the groupby segments.

add_model_from_params(self, name, fit_filters, predict_filters, model_expression, ytransform=None)

Add a model by passing arguments through to RegressionModel.

Parameters
name : any

Must match a groupby segment name.

fit_filters : list of str

Filters applied before fitting the model.

predict_filters : list of str

Filters applied before calculating new data points.

model_expression : str

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

columns_used(self)

Returns all the columns used across all models in the group for filtering and in the model expression.

fit(self, data, debug=False)

Fit each of the models in the group.

Parameters
data : pandas.DataFrame

Must have a column with the same name as segmentation_col.

debug : bool

If set to true (default false) will pass the debug parameter to model estimation.

Returns
fits : dict of statsmodels.regression.linear_model.OLSResults

Keys are the segment names.

property fitted

Whether all models in the group have been fitted.

predict(self, data)

Predict new data for each group in the segmentation.

Parameters
data : pandas.DataFrame

Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns
predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
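
The group behaviour described above (split the table on segmentation_col, look up a model by segment name, and leave out segments that have no model) can be sketched in plain Python. The function predict_by_segment, the column names, and the per-segment models are all hypothetical; UrbanSim itself works on pandas objects.

```python
from collections import defaultdict


def predict_by_segment(rows, segmentation_col, models):
    """Predict per segment; rows in segments without a model are dropped."""
    groups = defaultdict(list)
    for index, row in rows.items():
        groups[row[segmentation_col]].append(index)
    predicted = {}
    for segment, indexes in groups.items():
        model = models.get(segment)
        if model is None:
            continue  # no model registered under this segment name
        for index in indexes:
            predicted[index] = model(rows[index])
    return predicted


rows = {
    10: {'building_type': 'residential', 'sqft': 1000},
    11: {'building_type': 'office', 'sqft': 5000},
    12: {'building_type': 'industrial', 'sqft': 9000},
}
models = {
    'residential': lambda row: 200.0 * row['sqft'],
    'office': lambda row: 150.0 * row['sqft'],
}
predicted = predict_by_segment(rows, 'building_type', models)
```

Row 12 is absent from the result because no model is named 'industrial', which is why model names must match the segment names.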

class urbansim.models.regression.SegmentedRegressionModel(segmentation_col, fit_filters=None, predict_filters=None, default_model_expr=None, default_ytransform=None, min_segment_size=0, name=None)

A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.

Parameters
segmentation_col

Name of column in the data table on which to segment. Will be used with a pandas groupby on the data table.

fit_filters : list of str, optional

Filters applied before fitting the model.

predict_filters : list of str, optional

Filters applied before calculating new data points.

default_model_expr : str or dict, optional

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

default_ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

min_segment_size : int, optional

This model will add all segments that have at least this number of observations; segments with fewer members are skipped. A very small number of observations (e.g. 1) will cause an error with estimation.

name : str, optional

A name used in places to identify the model.

add_segment(self, name, model_expression=None, ytransform='default')

Add a new segment with its own model expression and ytransform.

Parameters
name :

Segment name. Must match a segment in the groupby of the data.

model_expression : str or dict, optional

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides. If not given the default model will be used, which must not be None.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

If not given the default ytransform will be used.

columns_used(self)

Returns all the columns used across all models in the group for filtering and in the model expression.

fit(self, data, debug=False)

Fit each segment. Segments that have not already been explicitly added will be automatically added with default model and ytransform.

Parameters
data : pandas.DataFrame

Must have a column with the same name as segmentation_col.

debug : bool

If set to true will pass debug to the fit method of each model.

Returns
fits : dict of statsmodels.regression.linear_model.OLSResults

Keys are the segment names.
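
A small sketch of how segments might be resolved at fit time: explicitly added segments keep their own expression, other segments fall back to the default, and segments with fewer than min_segment_size observations are skipped. The function plan_segments and the sample expressions are hypothetical and only illustrate the rules stated above.

```python
from collections import Counter


def plan_segments(segment_values, explicit, default_expr, min_segment_size=0):
    """Map each sufficiently large segment to the expression it would use."""
    counts = Counter(segment_values)
    plan = {}
    for segment, size in counts.items():
        if size < min_segment_size:
            continue  # too few observations to estimate
        plan[segment] = explicit.get(segment, default_expr)
    return plan


segment_values = ['a', 'a', 'a', 'b', 'b', 'c']
plan = plan_segments(
    segment_values,
    explicit={'a': 'y ~ x1 + x2'},
    default_expr='y ~ x1',
    min_segment_size=2,
)
```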

classmethod fit_from_cfg(df, cfgname, debug=False, min_segment_size=None, outcfgname=None)

Parameters
df : DataFrame

The dataframe which contains the columns to use for the estimation.

cfgname : string

The name of the yaml config file which describes the hedonic model.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

min_segment_size : int, optional

Set attribute on the model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file to which estimation results are written.

Returns
hm : SegmentedRegressionModel which was used to fit

property fitted

Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a SegmentedRegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters
yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns
SegmentedRegressionModel

predict(self, data)

Predict new data for each group in the segmentation.

Parameters
data : pandas.DataFrame

Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns
predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname, min_segment_size=None)

Parameters
df : DataFrame

The dataframe which contains the columns to use for the prediction.

cfgname : string

The name of the yaml config file which describes the hedonic model.

min_segment_size : int, optional

Set attribute on the model.

Returns
predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.

hm : SegmentedRegressionModel which was used to predict

to_dict(self)

Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(self, str_or_buffer=None)

Save a model representation to YAML.

Parameters
str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns
j : str

YAML string if str_or_buffer is not given.

urbansim.models.regression.fit_model(df, filters, model_expression)

Use statsmodels OLS to construct a model relation.

Parameters
df : pandas.DataFrame

Data to use for fit. Should contain all the columns referenced in the model_expression.

filters : list of str

Any filters to apply before doing the model fit.

model_expression : str

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

Returns
fit : statsmodels.regression.linear_model.OLSResults

urbansim.models.regression.predict(df, filters, model_fit, ytransform=None)

Apply model to new data to predict new dependent values.

Parameters
df : pandas.DataFrame

filters : list of str

Any filters to apply before doing prediction.

model_fit : statsmodels.regression.linear_model.OLSResults

Result of model estimation.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

Returns
result : pandas.Series

Predicted values as a pandas Series. Will have the index of df after applying filters.

Discrete Choice API Docs

Use the MNLDiscreteChoiceModel class to train a choice model using multinomial logit and make subsequent choice predictions.

class urbansim.models.dcm.DiscreteChoiceModel

Abstract base class for discrete choice models.

class urbansim.models.dcm.MNLDiscreteChoiceModel(model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, name=None)

A discrete choice model with the ability to store an estimated model and predict new data based on the model. Based on multinomial logit.

Parameters
model_expression : str, iterable, or dict

A patsy model expression. Should contain only a right-hand side.

sample_size : int

Number of choices to sample for estimating the model.

probability_mode : str, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.

choice_mode : str, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.

choosers_fit_filters : list of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filters : list of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filters : list of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filters : list of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filters : list of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_size : int, optional

Whether to sample choosers during estimation (needs to be applied after choosers_fit_filters).

prediction_sample_size : int, optional

Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_column : optional

Name of the column in the alternatives table that choosers should choose. e.g. the ‘building_id’ column. If not provided the alternatives index is used.

name : optional

Optional descriptive name for this model that may be used in output.
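
The pairing rule between probability_mode and choice_mode described in the parameter list can be written down as a small check. This is a sketch only: check_modes is a hypothetical helper, and raising ValueError for an unsupported pairing is an assumption rather than documented behaviour.

```python
ALLOWED_MODES = {
    ('full_product', 'individual'),
    ('single_chooser', 'aggregate'),
}


def check_modes(probability_mode='full_product', choice_mode='individual'):
    """Raise ValueError unless the two modes form a supported pairing."""
    if (probability_mode, choice_mode) not in ALLOWED_MODES:
        raise ValueError(
            'unsupported pairing: probability_mode={!r}, choice_mode={!r}'.format(
                probability_mode, choice_mode))
    return probability_mode, choice_mode


defaults = check_modes()
aggregate = check_modes('single_chooser', 'aggregate')
```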

alts_columns_used(self)

Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)

Filter choosers and alternatives for fitting.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_alts : pandas.DataFrame

apply_predict_filters(self, choosers, alternatives)

Filter choosers and alternatives for prediction.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_alts : pandas.DataFrame

assert_fitted(self)

Raises RuntimeError if the model is not ready for prediction.

choosers_columns_used(self)

Columns from the choosers table that are used for filtering.

columns_used(self)

Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)

Fit and save model parameters based on given data.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice : pandas.Series or any

A Series describing the alternatives currently chosen by the choosers. Should have an index matching choosers and values matching the index of alternatives.

If a non-Series is given it should be a column in choosers.

Returns
log_likelihoods : dict

Dict of log-likelihood values describing the quality of the model fit. Will have keys ‘null’, ‘convergence’, and ‘ratio’.
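
One plausible reading of the three keys, sketched below, is: 'null' is the log-likelihood when every sampled alternative is equally likely, 'convergence' is the log-likelihood at the estimated parameters, and 'ratio' is one minus their quotient. These definitions are assumptions for illustration; the text above does not spell them out, and the function and its inputs are hypothetical.

```python
import math


def log_likelihoods(chosen_probabilities, n_alternatives):
    """Summarize fit quality from the model probability of each observed choice."""
    # Null model: each chooser picks uniformly among n_alternatives.
    null = len(chosen_probabilities) * math.log(1.0 / n_alternatives)
    # Fitted model: sum of log probabilities assigned to the observed choices.
    convergence = sum(math.log(p) for p in chosen_probabilities)
    return {
        'null': null,
        'convergence': convergence,
        'ratio': 1.0 - convergence / null,  # assumed definition
    }


result = log_likelihoods([0.5, 0.5, 0.5], n_alternatives=4)
```

Under these assumptions a ratio near 0 means the fitted model does little better than chance, and values closer to 1 indicate a better fit.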

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)

Parameters
choosers : DataFrame

A dataframe in which rows represent choosers.

chosen_fname : string

A string indicating the column in the choosers dataframe which gives which alternatives the choosers have chosen.

alternatives : DataFrame

A table of alternatives. It should include the choices from the choosers table as well as other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.

cfgname : string

The name of the yaml config file from which to read the discrete choice model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file to which estimation results are written.

Returns
lcm : MNLDiscreteChoiceModel which was used to fit

property fitted

True if model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a DiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters
yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns
MNLDiscreteChoiceModel

interaction_columns_used(self)

Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)

Choose from among alternatives for a group of agents.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

debug : bool

If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.

Returns
choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
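
To show why some choosers can end up without an alternative, here is a minimal sketch of choosing without replacement: each chooser draws one alternative by weight, the chosen alternative is removed, and later choosers get None once supply is exhausted. This is an illustration, not UrbanSim's algorithm; the function, IDs, and weights are hypothetical, and None stands in for the nan mentioned above.

```python
import random


def choose_without_replacement(chooser_ids, alternative_ids, weights, seed=0):
    """Assign each chooser a distinct alternative; None when supply runs out."""
    rng = random.Random(seed)
    available = list(alternative_ids)
    available_weights = list(weights)
    choices = {}
    for chooser in chooser_ids:
        if not available:
            choices[chooser] = None  # no alternatives left for this chooser
            continue
        position = rng.choices(range(len(available)), weights=available_weights)[0]
        choices[chooser] = available.pop(position)
        available_weights.pop(position)
    return choices


choices = choose_without_replacement(
    ['h1', 'h2', 'h3'], ['b1', 'b2'], weights=[0.7, 0.3])
```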

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)

Simulate choices for the specified choosers.

Parameters
choosers : DataFrame

A dataframe of agents doing the choosing.

alternatives : DataFrame

A dataframe of locations which the choosers are locating in and which have a supply.

cfgname : string

The name of the yaml config file from which to read the discrete choice model.

cfg : string

An ordered YAML string of the discrete choice model configuration. Used to read the config from memory in lieu of loading cfgname from disk.

alternative_ratio : float, optional

When the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives are sampled down to meet this ratio, for performance reasons.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

Returns
choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

lcm : MNLDiscreteChoiceModel which was used to predict

probabilities(self, choosers, alternatives, filter_tables=True)

Returns the probabilities for a set of choosers to choose from among a set of alternatives.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

filter_tables : bool, optional

If True, filter choosers and alternatives with prediction filters before calculating probabilities.

Returns
probabilities : pandas.Series

Probability of selection associated with each chooser and alternative. Index will be a MultiIndex with alternative IDs in the inner index and chooser IDs in the outer index.
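
For intuition, multinomial logit probabilities are a softmax over each chooser's utilities. The sketch below assumes the utilities are already computed and keys the result by (chooser ID, alternative ID), mirroring the outer and inner index levels described above. The function and the utility values are hypothetical and this is not UrbanSim's code.

```python
import math


def mnl_probabilities(utilities):
    """Softmax within each chooser; keys are (chooser_id, alternative_id)."""
    probabilities = {}
    for chooser, alt_utilities in utilities.items():
        top = max(alt_utilities.values())  # subtract the max for numerical stability
        exps = {alt: math.exp(u - top) for alt, u in alt_utilities.items()}
        total = sum(exps.values())
        for alt, value in exps.items():
            probabilities[(chooser, alt)] = value / total
    return probabilities


utilities = {
    'h1': {'b1': 1.0, 'b2': 1.0},
    'h2': {'b1': 0.0, 'b2': math.log(3.0)},
}
probabilities = mnl_probabilities(utilities)
```

Chooser h1 is indifferent between the two buildings, while h2 is three times as likely to pick b2 as b1.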

report_fit(self)

Print a report of the fit results.

property str_model_expression

Model expression as a string suitable for use with patsy/statsmodels.

summed_probabilities(self, choosers, alternatives)

Calculate total probability associated with each alternative.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns
probs : pandas.Series

Total probability associated with each alternative.
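
Summing per-chooser probabilities over choosers gives a total for each alternative, which can be read as its expected number of takers. A minimal sketch, with a hypothetical function name and made-up probabilities keyed by (chooser ID, alternative ID):

```python
from collections import defaultdict


def summed_by_alternative(probabilities):
    """Add up the probabilities of all choosers for each alternative."""
    totals = defaultdict(float)
    for (chooser, alternative), probability in probabilities.items():
        totals[alternative] += probability
    return dict(totals)


probabilities = {
    ('h1', 'b1'): 0.5, ('h1', 'b2'): 0.5,
    ('h2', 'b1'): 0.25, ('h2', 'b2'): 0.75,
}
totals = summed_by_alternative(probabilities)
```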

to_dict(self)

Return a dict representation of an MNLDiscreteChoiceModel instance.

to_yaml(self, str_or_buffer=None)

Save a model representation to YAML.

Parameters
str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns
j : str

YAML string if str_or_buffer is not given.

class urbansim.models.dcm.MNLDiscreteChoiceModelGroup(segmentation_col, remove_alts=False, name=None)

Manages a group of discrete choice models that refer to different segments of choosers.

Model names must match the segment names after doing a pandas groupby.

Parameters
segmentation_col : str

Name of a column in the table of choosers. Will be used to perform a pandas groupby on the choosers table.

remove_alts : bool, optional

Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.

name : str, optional

A name that may be used in places to identify this group.

add_model(self, model)

Add an MNLDiscreteChoiceModel instance.

Parameters
model : MNLDiscreteChoiceModel

Should have a .name attribute matching one of the segments in the choosers table.

add_model_from_params(self, name, model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None)

Add a model by passing parameters through to MNLDiscreteChoiceModel.

Parameters
name

Must match a segment in the choosers table.

model_expression : str, iterable, or dict

A patsy model expression. Should contain only a right-hand side.

sample_size : int

Number of choices to sample for estimating the model.

probability_mode : str, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives.

choice_mode : str or callable, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers.

choosers_fit_filters : list of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filters : list of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filters : list of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filters : list of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filters : list of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_size : int, optional

Whether to sample choosers during estimation (needs to be applied after choosers_fit_filters).

prediction_sample_size : int, optional

Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_column : optional

Name of the column in the alternatives table that choosers should choose. e.g. the ‘building_id’ column. If not provided the alternatives index is used.

alts_columns_used(self)

Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)

Filter choosers and alternatives for fitting. This is done by filtering each submodel and concatenating the results.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_alts : pandas.DataFrame

apply_predict_filters(self, choosers, alternatives)

Filter choosers and alternatives for prediction. This is done by filtering each submodel and concatenating the results.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_alts : pandas.DataFrame

choosers_columns_used(self)

Columns from the choosers table that are used for filtering.

columns_used(self)

Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)

Fit and save models based on given data after segmenting the choosers table.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice

Name of column in choosers that indicates which alternative they have currently chosen.

Returns
log_likelihoods : dict of dict

Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.

property fitted

Whether all models in the group have been fitted.

interaction_columns_used(self)

Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters
choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

debugbool

If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.

Returns
choicespandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

probabilities(self, choosers, alternatives)[source]

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

Returns
probabilitiesdict of pandas.Series
summed_probabilities(self, choosers, alternatives)[source]

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

Returns
probspandas.Series

Summed probabilities from each segment added together.

class urbansim.models.dcm.SegmentedMNLDiscreteChoiceModel(segmentation_col, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, default_model_expr=None, remove_alts=False, name=None)[source]

An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.

Parameters
segmentation_col

Name of column in the choosers table that will be used for groupby.

sample_sizeint

Number of choices to sample for estimating the model.

probability_modestr, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.

choice_modestr, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.

choosers_fit_filterslist of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filterslist of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filterslist of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filterslist of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filterslist of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_sizeint, optional

Whether (and how much) to sample choosers during estimation. Sampling needs to be applied after choosers_fit_filters.

prediction_sample_sizeint, optional

Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_columnoptional

Name of the column in the alternatives table that choosers should choose. e.g. the ‘building_id’ column. If not provided the alternatives index is used.

default_model_exprstr, iterable, or dict, optional

A patsy model expression. Should contain only a right-hand side.

remove_altsbool, optional

Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.

namestr, optional

An optional string used to identify the model in places.
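The pairing rule between probability_mode and choice_mode described above can be written down as a small check. This is only a sketch: check_modes and VALID_MODE_PAIRS are hypothetical names used for illustration and are not part of the UrbanSim API.

```python
# Hypothetical helper illustrating the documented pairing rule;
# it is not part of the UrbanSim API.
VALID_MODE_PAIRS = {
    ('full_product', 'individual'),
    ('single_chooser', 'aggregate'),
}


def check_modes(probability_mode='full_product', choice_mode='individual'):
    """Return True if the two modes may be used together."""
    return (probability_mode, choice_mode) in VALID_MODE_PAIRS


print(check_modes())                                # True (the defaults)
print(check_modes('single_chooser', 'aggregate'))   # True
print(check_modes('single_chooser', 'individual'))  # False
```

The defaults in the class signature ('full_product' with 'individual') form one of the two valid pairs.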

add_segment(self, name, model_expression=None)[source]

Add a new segment with its own model expression.

Parameters
name

Segment name. Must match a segment in the groupby of the data.

model_expressionstr or dict, optional

A patsy model expression. As with default_model_expr, it should contain only a right-hand side. If not given, the default model expression will be used, which must not be None.

alts_columns_used(self)[source]

Columns from the alternatives table that are used for filtering.

apply_fit_filters(self, choosers, alternatives)[source]

Filter choosers and alternatives for fitting.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households.

alternativespandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_altspandas.DataFrame
apply_predict_filters(self, choosers, alternatives)[source]

Filter choosers and alternatives for prediction.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households.

alternativespandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns
filtered_choosers, filtered_altspandas.DataFrame
choosers_columns_used(self)[source]

Columns from the choosers table that are used for filtering.

columns_used(self)[source]

Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(self, choosers, alternatives, current_choice)[source]

Fit and save models based on given data after segmenting the choosers table. Segments that have not already been explicitly added will be added automatically with the default model expression.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice

Name of column in choosers that indicates which alternative they have currently chosen.

Returns
log_likelihoodsdict of dict

Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.
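The segmenting step and the fallback to the default expression can be pictured with plain Python. The sketch below is an assumption-laden illustration, not UrbanSim's code: the chooser records, the 'income_group' column, and the segment_choosers helper are all hypothetical, and lists of dicts stand in for DataFrames.

```python
from collections import defaultdict

# Hypothetical chooser records; 'income_group' plays the role of segmentation_col.
choosers = [
    {'household_id': 1, 'income_group': 'low'},
    {'household_id': 2, 'income_group': 'high'},
    {'household_id': 3, 'income_group': 'low'},
]

default_expr = ['dist_hwy', 'ave_income']   # right-hand side only
segment_exprs = {'high': ['dist_hwy']}      # a segment that was added explicitly


def segment_choosers(rows, segmentation_col):
    """Group chooser rows by the value of the segmentation column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segmentation_col]].append(row)
    return dict(groups)


segments = segment_choosers(choosers, 'income_group')

# Segments present in the data but never added explicitly get the default.
for name in segments:
    segment_exprs.setdefault(name, default_expr)

print(sorted(segments))        # ['high', 'low']
print(segment_exprs['low'])    # ['dist_hwy', 'ave_income']
```

Each segment would then be fitted separately with its own expression, producing one entry per segment in the returned dictionary.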

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)[source]
Parameters
choosersDataFrame

A dataframe of rows of agents that have made choices.

chosen_fnamestring

A string indicating the column in the choosers dataframe which gives which alternative the choosers have chosen.

alternativesDataFrame

A dataframe of alternatives. It should include the current choices from the choosers dataframe as well as some other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.

cfgnamestring

The name of the yaml config file from which to read the discrete choice model.

outcfgnamestring, optional (default cfgname)

The name of the output yaml config file where estimation results are written into.

Returns
lcmSegmentedMNLDiscreteChoiceModel which was used to fit
property fitted

Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]

Create a SegmentedMNLDiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters
yaml_strstr, optional

A YAML string from which to load model.

str_or_bufferstr or file like, optional

File name or buffer from which to load YAML.

Returns
SegmentedMNLDiscreteChoiceModel
interaction_columns_used(self)[source]

Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(self, choosers, alternatives, debug=False)[source]

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

debugbool

If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.

Returns
choicespandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)[source]

Simulate the discrete choices for the specified choosers.

Parameters
choosersDataFrame

A dataframe of agents doing the choosing.

alternativesDataFrame

A dataframe of alternatives which the choosers are locating in and which have a supply.

cfgnamestring

The name of the yaml config file from which to read the discrete choice model.

cfg: string

An ordered YAML string of the discrete choice model configuration. Used to read the config from memory in lieu of loading cfgname from disk.

alternative_ratiofloat

When the ratio of alternatives to choosers exceeds this value (default of 2.0), the alternatives will be sampled down to meet this ratio (for performance reasons).

Returns
choicespandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

lcmSegmentedMNLDiscreteChoiceModel which was used to predict
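To make the effect of alternative_ratio concrete, here is a minimal sketch of capping the number of alternatives at a multiple of the number of choosers. The function sample_alternatives and its seed argument are hypothetical, and uniform random sampling is an assumption; the source does not say how UrbanSim draws the sample.

```python
import random


def sample_alternatives(alternative_ids, n_choosers, alternative_ratio=2.0, seed=None):
    """Sample alternatives down to n_choosers * alternative_ratio if there are more.

    Hypothetical helper: uniform sampling without replacement is an assumption.
    """
    alternative_ids = list(alternative_ids)
    target = int(n_choosers * alternative_ratio)
    if len(alternative_ids) <= target:
        # Ratio is not exceeded, so nothing is dropped.
        return alternative_ids
    rng = random.Random(seed)
    return rng.sample(alternative_ids, target)


sampled = sample_alternatives(range(100), n_choosers=10, seed=0)
print(len(sampled))                                   # 20
print(sample_alternatives([1, 2, 3], n_choosers=10))  # [1, 2, 3]
```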
probabilities(self, choosers, alternatives)[source]

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

Returns
probabilitiesdict of pandas.Series
summed_probabilities(self, choosers, alternatives)[source]

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters
chooserspandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternativespandas.DataFrame

Table describing the things from which agents are choosing.

Returns
probspandas.Series

Summed probabilities from each segment added together.
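The relationship between per-segment probabilities and their sum can be sketched as follows. Plain dicts keyed by alternative ID stand in for the pandas.Series values, and the segment names, building IDs, and sum_probabilities helper are hypothetical.

```python
from collections import Counter

# Hypothetical per-segment probabilities, keyed by segment name and then
# by alternative ID (plain dicts stand in for pandas.Series).
probabilities = {
    'low': {'bldg_a': 0.5, 'bldg_b': 0.5},
    'high': {'bldg_a': 0.25, 'bldg_b': 0.75},
}


def sum_probabilities(by_segment):
    """Add up the probability of each alternative across all segments."""
    total = Counter()
    for probs in by_segment.values():
        total.update(probs)  # Counter.update adds values for matching keys
    return dict(total)


summed = sum_probabilities(probabilities)
print(summed)  # {'bldg_a': 0.75, 'bldg_b': 1.25}
```

Note that the summed values are not themselves a probability distribution: with two segments, the totals add up to 2.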

to_dict(self)[source]

Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(self, str_or_buffer=None)[source]

Save a model representation to YAML.

Parameters
str_or_bufferstr or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns
jstr

The YAML string, returned only if str_or_buffer is not given.
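The three behaviors of str_or_buffer (return a string, write to a named file, write to an object with a .write method) follow a common dispatch pattern. The sketch below illustrates that pattern only; write_or_return is a hypothetical helper, and the YAML text is a made-up stand-in rather than real model output.

```python
import io


def write_or_return(text, str_or_buffer=None):
    """Return text, or write it to a file name or a file-like object.

    Hypothetical helper illustrating the documented behavior.
    """
    if str_or_buffer is None:
        return text
    if isinstance(str_or_buffer, str):
        # A string is treated as a file name.
        with open(str_or_buffer, 'w') as f:
            f.write(text)
    else:
        # Anything else is assumed to have a .write method.
        str_or_buffer.write(text)
    return None


yaml_text = 'name: example_model\n'      # stand-in for a model's YAML

returned = write_or_return(yaml_text)    # no target: the string comes back
buffer = io.StringIO()
write_or_return(yaml_text, buffer)       # file-like target: written to it

print(returned == yaml_text)             # True
print(buffer.getvalue() == yaml_text)    # True
```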

urbansim.models.dcm.unit_choice(chooser_ids, alternative_ids, probabilities)[source]

Have a set of choosers choose from among alternatives according to a probability distribution. Choice is binary: each alternative can only be chosen once.

Parameters
chooser_ids1d array_like

Array of IDs of the agents that are making choices.

alternative_ids1d array_like

Array of IDs of alternatives among which agents are making choices.

probabilities1d array_like

The probability that an agent will choose an alternative. Must be the same shape as alternative_ids. Unavailable alternatives should have a probability of 0.

Returns
choicespandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
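The behavior described here, where each alternative is chosen at most once and surplus choosers map to nan, can be sketched in pure Python. This is not UrbanSim's implementation: unit_choice_sketch and its seed argument are hypothetical, choosers are served in the order given, and a plain dict stands in for the returned pandas.Series.

```python
import math
import random


def unit_choice_sketch(chooser_ids, alternative_ids, probabilities, seed=None):
    """Weighted choice without replacement; leftover choosers get nan."""
    rng = random.Random(seed)
    # Alternatives with a probability of 0 are treated as unavailable.
    available = {a: p for a, p in zip(alternative_ids, probabilities) if p > 0}
    choices = {}
    for chooser in chooser_ids:
        if not available:
            choices[chooser] = math.nan
            continue
        alts = list(available)
        weights = [available[a] for a in alts]
        picked = rng.choices(alts, weights=weights, k=1)[0]
        choices[chooser] = picked
        del available[picked]  # each alternative can only be chosen once
    return choices


choices = unit_choice_sketch([1, 2, 3], ['a', 'b', 'c'], [0.5, 0.5, 0.0], seed=0)
print(choices)
```

With three choosers and only two available alternatives, the last chooser processed ends up with nan, while 'c' is never picked because its probability is 0.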