Statistical Models

Introduction

UrbanSim has two sets of statistical models: regressions and discrete choice models. Each has a three-stage usage pattern:

1. Create a configured model instance. This is where you supply most of the information to the model, such as the model definition itself and any filters that restrict the data used during fitting and prediction.
2. Fit the model by supplying base year data.
3. Make predictions based on new data.
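The three stages can be illustrated with a deliberately tiny stand-in. The class below is hypothetical and is not UrbanSim's RegressionModel: it takes a list of right-hand-side column names instead of a patsy expression and fits ordinary least squares with NumPy. It only mirrors the configure, fit, predict sequence and the refusal to predict before fitting.

```python
import numpy as np
import pandas as pd


class ToyRegression:
    """Hypothetical stand-in illustrating configure -> fit -> predict."""

    def __init__(self, y_col, x_cols):
        # Stage 1: configuration only; no data is touched yet.
        self.y_col = y_col
        self.x_cols = list(x_cols)
        self.params = None

    @property
    def fitted(self):
        return self.params is not None

    def assert_fitted(self):
        if not self.fitted:
            raise RuntimeError('Model has not been fit.')

    def _design(self, data):
        # Prepend an intercept column to the explanatory variables.
        x = data[self.x_cols].to_numpy(dtype=float)
        return np.column_stack([np.ones(len(data)), x])

    def fit(self, data):
        # Stage 2: estimate coefficients from base year data.
        y = data[self.y_col].to_numpy(dtype=float)
        self.params, *_ = np.linalg.lstsq(self._design(data), y, rcond=None)
        return self.params

    def predict(self, data):
        # Stage 3: apply the stored coefficients to new data.
        self.assert_fitted()
        return pd.Series(self._design(data) @ self.params, index=data.index)


base = pd.DataFrame({'x': [0.0, 1.0, 2.0, 3.0], 'y': [1.0, 3.0, 5.0, 7.0]})
model = ToyRegression('y', ['x'])  # stage 1: configure
model.fit(base)                    # stage 2: fit (here y = 1 + 2x exactly)
new_values = model.predict(pd.DataFrame({'x': [10.0]}, index=[42]))  # stage 3
```

Calling predict on a freshly configured ToyRegression raises RuntimeError, which is the same contract the assert_fitted methods in the API below describe.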

Model Expressions

Statistical models require specification of a “model expression” that describes the model as a mathematical formula. UrbanSim uses patsy to interpret model expressions but gives you some flexibility in how you define them.

patsy works with string formulas like this simplified regression example (names refer to columns in the DataFrames used during fitting and prediction):

expr = 'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'

In UrbanSim, the same formula can also be expressed as a dictionary:

expr = {
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

Formulae used with location choice models have only a right-hand side, since these models do not predict new numeric values. Right-hand-side formulae can be written as lists or dictionaries:

expr = {
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}

expr = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']

Expressing the formula as a string is always an option. The ability to use lists or dictionaries is especially useful for writing attractively formatted formulae in YAML config files.
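The way the three forms collapse into one patsy string can be sketched as follows. `to_patsy_string` is a hypothetical helper written for illustration; UrbanSim performs an equivalent conversion internally (its result is exposed as the str_model_expression attribute), but its actual code may differ, for example by appending an explicit intercept term.

```python
from collections.abc import Mapping


def to_patsy_string(expr):
    """Hypothetical helper: normalize a str, list, or dict expression to one string."""
    if isinstance(expr, str):
        # Strings are already patsy formulas.
        return expr
    if isinstance(expr, Mapping):
        right = to_patsy_string(expr['right_side'])
        left = expr.get('left_side')
        # Join the two sides with '~' only when a left side exists.
        return f'{left} ~ {right}' if left else right
    # Any other iterable is treated as a list of right-hand-side terms.
    return ' + '.join(expr)
```

Applied to the dictionary above, this yields the same string as the first example, and the right-hand-side-only list yields `'I(year_built < 1940) + dist_hwy + ave_income'`.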

YAML Persistence

UrbanSim’s regression and location choice models can be saved as YAML files and loaded again at another time. This feature is especially useful for estimating models in one location, saving the fit parameters to disk, and then using the fitted model for prediction somewhere else.

Use the .to_yaml and .from_yaml methods to save files to disk and load them back as configured models. Here’s an example of loading a regression model, performing fitting, and saving the model back to YAML:

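from urbansim.models.regression import RegressionModel

# from_yaml's first positional argument is a YAML string, so pass the file name by keyword
model = RegressionModel.from_yaml(str_or_buffer='my_model.yaml')

# data is a pandas DataFrame holding every column the model references
model.fit(data)

model.to_yaml('my_model.yaml')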

You can, if you like, write your model configurations entirely in YAML and load them into Python only for fitting and prediction.

API

Regression API

`RegressionModel`(fit_filters, ...[, ...])	A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.
`SegmentedRegressionModel`(segmentation_col[, ...])	A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.
`RegressionModelGroup`(segmentation_col[, name])	Manages a group of regression models that refer to different segments within a single table.

Discrete Choice API

`MNLDiscreteChoiceModel`(model_expression, ...)	A discrete choice model with the ability to store an estimated model and predict new data based on the model.
`SegmentedMNLDiscreteChoiceModel`(...[, ...])	An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.
`MNLDiscreteChoiceModelGroup`(segmentation_col)	Manages a group of discrete choice models that refer to different segments of choosers.

Regression API Docs

Use the RegressionModel class to fit a model using statsmodels’ OLS capability and then do subsequent prediction.

class urbansim.models.regression.RegressionModel(fit_filters, predict_filters, model_expression, ytransform=None, name=None)

A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.

statsmodels’ OLS implementation is used.

Parameters:

fit_filters : list of str

Filters applied before fitting the model.

predict_filters : list of str

Filters applied before calculating new data points.

model_expression : str or dict

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

name : optional

Optional descriptive name for this model that may be used in output.
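One detail worth checking is that ytransform exactly inverts the left-hand side. The expression examples earlier use np.log1p(sqft_price), whose inverse is np.expm1 rather than np.exp; np.exp would overstate every prediction by 1. A quick NumPy check, using illustrative prices:

```python
import numpy as np

prices = np.array([50.0, 125.0, 300.0])  # illustrative values only

# Suppose the model's left-hand side is np.log1p(price).
log_scale = np.log1p(prices)

# np.expm1 undoes np.log1p exactly; np.exp leaves an offset of +1.
recovered = np.expm1(log_scale)
off_by_one = np.exp(log_scale)
```

So a model whose left side is np.log1p(...) would pair with ytransform=np.expm1, while np.exp is the matching choice for a plain np.log(...) left side.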

assert_fitted(): Raises a RuntimeError if the model is not ready for prediction.

columns_used(): Returns all the columns used in this model for filtering and in the model expression.

fit(data, debug=False)

Fit the model to data and store/return the results.

Parameters:

data : pandas.DataFrame

Data to use for fitting the model. Must contain all the columns referenced by the model_expression.

debug : bool

If debug is set to true, this sets the attribute “est_data” to a dataframe with the actual data used for estimation of this model.

Returns:

fit : statsmodels.regression.linear_model.OLSResults

This is returned for inspection, but also stored on the class instance for use during prediction.

classmethod fit_from_cfg(df, cfgname, debug=False, outcfgname=None)

Parameters:

df : DataFrame

The dataframe which contains the columns to use for the estimation.

cfgname : string

The name of the yaml config file which describes the hedonic model.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file where estimation results are written into.

Returns:

RegressionModel which was used to fit

fitted: True if the model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a RegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters:

yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns:

RegressionModel

predict(data)

Predict a new data set based on an estimated model.

Parameters:

data : pandas.DataFrame

Data to use for prediction. Must contain all the columns referenced by the right-hand side of the model_expression.

Returns:

result : pandas.Series

Predicted values as a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname)

Parameters:

df : DataFrame

The dataframe which contains the columns to use for prediction.

cfgname : string

The name of the yaml config file which describes the hedonic model.

Returns:

predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.

hm : RegressionModel which was used to predict

report_fit(): Print a report of the fit results.

str_model_expression: Model expression as a string suitable for use with patsy/statsmodels.

to_dict(): Returns a dictionary representation of a RegressionModel instance.

to_yaml(str_or_buffer=None)

Save a model representation to YAML.

Parameters:

str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns:

j : str

YAML string if str_or_buffer is not given.
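The three-way behavior of str_or_buffer can be summarized with a small sketch. `write_text` is a hypothetical function that mirrors the documented dispatch for text that has already been serialized; it is not UrbanSim's implementation.

```python
import io


def write_text(text, str_or_buffer=None):
    """Hypothetical sketch of the str_or_buffer convention used by to_yaml."""
    if str_or_buffer is None:
        # No destination given: hand the text back to the caller.
        return text
    if isinstance(str_or_buffer, str):
        # A string is treated as a file name.
        with open(str_or_buffer, 'w') as f:
            f.write(text)
    else:
        # Anything else is assumed to expose a .write method.
        str_or_buffer.write(text)
    return None
```

An in-memory buffer such as io.StringIO is convenient for inspecting the output without touching the file system.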

class urbansim.models.regression.RegressionModelGroup(segmentation_col, name=None)

Manages a group of regression models that refer to different segments within a single table.

Model names must match the segment names after doing a pandas groupby.

Parameters:

segmentation_col

Name of the column on which to segment.

name

Optional name used to identify the model in places.

add_model(model)

Add a RegressionModel instance.

Parameters:

model : RegressionModel

Should have a .name attribute matching one of the groupby segments.

add_model_from_params(name, fit_filters, predict_filters, model_expression, ytransform=None)

Add a model by passing arguments through to RegressionModel.

Parameters:

name : any

Must match a groupby segment name.

fit_filters : list of str

Filters applied before fitting the model.

predict_filters : list of str

Filters applied before calculating new data points.

model_expression : str

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

columns_used(): Returns all the columns used across all models in the group for filtering and in the model expression.

fit(data, debug=False)

Fit each of the models in the group.

Parameters:

data : pandas.DataFrame

Must have a column with the same name as segmentation_col.

debug : bool

If set to true (default false) will pass the debug parameter to model estimation.

Returns:

fits : dict of statsmodels.regression.linear_model.OLSResults

Keys are the segment names.

fitted: Whether all models in the group have been fitted.

predict(data)

Predict new data for each group in the segmentation.

Parameters:

data : pandas.DataFrame

Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns:

predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
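Segment-wise prediction, including the way rows from segments without a model drop out of the result, can be sketched with a pandas groupby. The `models` dict of plain callables is a hypothetical simplification of the RegressionModel instances the group actually holds, and `predict_by_segment` is not an UrbanSim function.

```python
import pandas as pd


def predict_by_segment(data, segmentation_col, models):
    """Hypothetical sketch: predict each segment with its own model."""
    results = []
    for name, segment in data.groupby(segmentation_col):
        if name not in models:
            # Segments without a model are skipped, so their rows vanish.
            continue
        results.append(pd.Series(models[name](segment), index=segment.index))
    return pd.concat(results).sort_index() if results else pd.Series(dtype=float)
```

With models only for segments 'a' and 'b', any rows in segment 'c' are simply absent from the returned Series.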

class urbansim.models.regression.SegmentedRegressionModel(segmentation_col, fit_filters=None, predict_filters=None, default_model_expr=None, default_ytransform=None, min_segment_size=0, name=None)

A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.

Parameters:

segmentation_col

Name of column in the data table on which to segment. Will be used with a pandas groupby on the data table.

fit_filters : list of str, optional

Filters applied before fitting the model.

predict_filters : list of str, optional

Filters applied before calculating new data points.

default_model_expr : str or dict, optional

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

default_ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

min_segment_size : int, optional

Segments with fewer than this many observations are skipped; every segment with at least this many is added to the model. A very small number of observations (e.g. 1) will cause an error during estimation.

name : str, optional

A name used in places to identify the model.
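The min_segment_size rule amounts to counting rows per segment and keeping those at or above the threshold. A minimal pandas sketch; the helper and the building_type column are hypothetical:

```python
import pandas as pd


def segments_to_fit(data, segmentation_col, min_segment_size=0):
    """Hypothetical sketch: names of segments large enough to estimate."""
    counts = data.groupby(segmentation_col).size()
    # Keep segments with at least min_segment_size observations.
    return sorted(counts[counts >= min_segment_size].index)
```

With the default of 0 every segment qualifies, which is why raising the threshold is the usual guard against single-observation segments.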

add_segment(name, model_expression=None, ytransform='default')

Add a new segment with its own model expression and ytransform.

Parameters:

name

Segment name. Must match a segment in the groupby of the data.

model_expression : str or dict, optional

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides. If not given the default model will be used, which must not be None.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

If not given the default ytransform will be used.

columns_used(): Returns all the columns used across all models in the group for filtering and in the model expression.

fit(data, debug=False)

Fit each segment. Segments that have not already been explicitly added will be automatically added with default model and ytransform.

Parameters:

data : pandas.DataFrame

Must have a column with the same name as segmentation_col.

debug : bool

If set to true will pass debug to the fit method of each model.

Returns:

fits : dict of statsmodels.regression.linear_model.OLSResults

Keys are the segment names.

classmethod fit_from_cfg(df, cfgname, debug=False, min_segment_size=None, outcfgname=None)

Parameters:

df : DataFrame

The dataframe which contains the columns to use for the estimation.

cfgname : string

The name of the yaml config file which describes the hedonic model.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

min_segment_size : int, optional

Sets the min_segment_size attribute on the model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file where estimation results are written into.

Returns:

hm : SegmentedRegressionModel which was used to fit

fitted: Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a SegmentedRegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters:

yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns:

SegmentedRegressionModel

predict(data)

Predict new data for each group in the segmentation.

Parameters:

data : pandas.DataFrame

Data to use for prediction. Must have a column with the same name as segmentation_col.

Returns:

predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters.

classmethod predict_from_cfg(df, cfgname, min_segment_size=None)

Parameters:

df : DataFrame

The dataframe which contains the columns to use for prediction.

cfgname : string

The name of the yaml config file which describes the hedonic model.

min_segment_size : int, optional

Sets the min_segment_size attribute on the model.

Returns:

predicted : pandas.Series

Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.

hm : SegmentedRegressionModel which was used to predict

to_dict(): Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(str_or_buffer=None)

Save a model representation to YAML.

Parameters:

str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns:

j : str

YAML string if str_or_buffer is not given.

urbansim.models.regression.fit_model(df, filters, model_expression)

Use statsmodels OLS to construct a model relation.

Parameters:

df : pandas.DataFrame

Data to use for fit. Should contain all the columns referenced in the model_expression.

filters : list of str

Any filters to apply before doing the model fit.

model_expression : str

A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.

Returns:

fit : statsmodels.regression.linear_model.OLSResults

urbansim.models.regression.predict(df, filters, model_fit, ytransform=None)

Apply model to new data to predict new dependent values.

Parameters:

df : pandas.DataFrame

filters : list of str

Any filters to apply before doing prediction.

model_fit : statsmodels.regression.linear_model.OLSResults

Result of model estimation.

ytransform : callable, optional

A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price.

By default no transformation is applied.

Returns:

result : pandas.Series

Predicted values as a pandas Series. Will have the index of df after applying filters.
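The two module-level functions compose as filter, fit, then filter, predict, and transform. The sketch below assumes that each filter string is a pandas query expression and that a row must satisfy all of them; it also swaps statsmodels' formula-based OLS for NumPy least squares on explicitly named columns so that it runs without patsy. The names `apply_filters`, `fit_ols`, and `predict_ols` are hypothetical.

```python
import numpy as np
import pandas as pd


def apply_filters(df, filters):
    """Hypothetical: keep rows satisfying every filter (pandas query syntax)."""
    if not filters:
        return df
    return df.query(' and '.join(f'({f})' for f in filters))


def fit_ols(df, filters, y_col, x_cols):
    """Hypothetical stand-in for fit_model: OLS with an intercept."""
    df = apply_filters(df, filters)
    x = np.column_stack([np.ones(len(df)), df[x_cols].to_numpy(dtype=float)])
    params, *_ = np.linalg.lstsq(x, df[y_col].to_numpy(dtype=float), rcond=None)
    return params


def predict_ols(df, filters, params, x_cols, ytransform=None):
    """Hypothetical stand-in for predict: the result keeps the filtered index."""
    df = apply_filters(df, filters)
    x = np.column_stack([np.ones(len(df)), df[x_cols].to_numpy(dtype=float)])
    y = x @ params
    if ytransform is not None:
        # Map predictions back to the original scale, e.g. np.expm1 after np.log1p.
        y = ytransform(y)
    return pd.Series(y, index=df.index)
```

Because prediction re-applies the filters, rows excluded by predict_filters are absent from the result rather than set to NaN, matching the "index of df after applying filters" wording above.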

Discrete Choice API Docs

Use the MNLDiscreteChoiceModel class to train a choice model using multinomial logit and make subsequent choice predictions.

class urbansim.models.dcm.DiscreteChoiceModel: Abstract base class for discrete choice models.

class urbansim.models.dcm.MNLDiscreteChoiceModel(model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, name=None)

A discrete choice model with the ability to store an estimated model and predict new data based on the model. Based on multinomial logit.

Parameters:

model_expression : str, iterable, or dict

A patsy model expression. Should contain only a right-hand side.

sample_size : int

Number of choices to sample for estimating the model.

probability_mode : str, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.

choice_mode : str, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.

choosers_fit_filters : list of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filters : list of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filters : list of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filters : list of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filters : list of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_size : int, optional

If given, the number of choosers to sample during estimation. Sampling is applied after choosers_fit_filters.

prediction_sample_size : int, optional

If given, how many alternatives to sample during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_column : optional

Name of the column in the alternatives table that choosers should choose. e.g. the ‘building_id’ column. If not provided the alternatives index is used.

name : optional

Optional descriptive name for this model that may be used in output.
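The contrast between the two choice modes can be sketched with NumPy's random Generator. The function names are hypothetical, and the sketch takes probabilities as given; it only contrasts the sampling step. In individual mode each chooser draws from its own probabilities, so two choosers may pick the same alternative. In aggregate mode one shared probability vector is used and alternatives are drawn without replacement, so once the alternatives run out the remaining choosers get NaN.

```python
import numpy as np
import pandas as pd


def choose_individual(probs, rng):
    """probs: DataFrame, one row per chooser ID, one column per alternative ID."""
    picks = [rng.choice(probs.columns.to_numpy(), p=row) for row in probs.to_numpy()]
    return pd.Series(picks, index=probs.index)


def choose_aggregate(chooser_ids, alt_probs, rng):
    """alt_probs: Series of shared probabilities indexed by alternative ID."""
    n = min(len(chooser_ids), len(alt_probs))
    # Drawing without replacement makes each alternative unavailable once chosen.
    picks = rng.choice(alt_probs.index.to_numpy(), size=n, replace=False,
                       p=alt_probs.to_numpy())
    choices = pd.Series(np.nan, index=chooser_ids)
    # In this sketch the trailing choosers are the ones left unmatched (NaN).
    choices.iloc[:n] = picks
    return choices
```

This mirrors the pairing noted above: individual mode goes with full_product probabilities (one row per chooser), and aggregate mode with single_chooser probabilities (one shared vector).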

alts_columns_used(): Columns from the alternatives table that are used for filtering.

apply_fit_filters(choosers, alternatives)

Filter choosers and alternatives for fitting.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

apply_predict_filters(choosers, alternatives)

Filter choosers and alternatives for prediction.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

assert_fitted(): Raises RuntimeError if the model is not ready for prediction.

choosers_columns_used(): Columns from the choosers table that are used for filtering.

columns_used(): Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(choosers, alternatives, current_choice)

Fit and save model parameters based on given data.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice : pandas.Series or any

A Series describing the alternatives currently chosen by the choosers. Should have an index matching choosers and values matching the index of alternatives.

If a non-Series is given it should be a column in choosers.

Returns:

log_likelihoods : dict

Dict of log-likelihood values describing the quality of the model fit. Will have keys ‘null’, ‘convergence’, and ‘ratio’.
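For orientation, the 'ratio' entry is most plausibly McFadden's rho-squared, 1 - convergence / null; that reading is an assumption rather than something stated here. Under MNL with sample_size sampled alternatives per chooser, a model with all coefficients at zero gives each alternative probability 1 / sample_size, so the null log-likelihood is n_choosers * ln(1 / sample_size). A small sketch with illustrative numbers:

```python
import math


def null_log_likelihood(n_choosers, sample_size):
    # All sampled alternatives equally likely: ln(1/J) per chooser.
    return n_choosers * math.log(1.0 / sample_size)


def rho_squared(null_ll, convergence_ll):
    # Share of the null log-likelihood recovered by the fitted model (assumed 'ratio').
    return 1.0 - convergence_ll / null_ll


ll_null = null_log_likelihood(1000, 50)  # illustrative sizes only
ll_conv = 0.8 * ll_null                  # illustrative fitted value only
```

Both log-likelihoods are negative, so a convergence value closer to zero than the null value produces a ratio between 0 and 1.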

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)

Parameters:

choosers : DataFrame

A dataframe in which rows represent choosers.

chosen_fname : string

The name of the column in the choosers dataframe that identifies the alternative each chooser has chosen.

alternatives : DataFrame

A table of alternatives. It should include the choices from the choosers table as well as other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.

cfgname : string

The name of the yaml config file from which to read the discrete choice model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file where estimation results are written into.

Returns:

lcm : MNLDiscreteChoiceModel which was used to fit

fitted: True if model is ready for prediction.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)

Create a DiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters:

yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns:

MNLDiscreteChoiceModel

interaction_columns_used(): Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(choosers, alternatives, debug=False)

Choose from among alternatives for a group of agents.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

debug : bool

If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)

Simulate choices for the specified choosers.

Parameters:

choosers : DataFrame

A dataframe of agents doing the choosing.

alternatives : DataFrame

A dataframe of locations which the choosers are locating in and which have a supply.

cfgname : string

The name of the yaml config file from which to read the discrete choice model.

cfg : string

An ordered YAML string of the discrete choice model configuration. Used to read the config from memory instead of loading cfgname from disk.

alternative_ratio : float, optional

If the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives are sampled down to meet this ratio, for performance reasons.

debug : boolean, optional (default False)

Whether to generate debug information on the model.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

lcm : MNLDiscreteChoiceModel which was used to predict
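The alternative_ratio rule can be sketched as a cap on how many alternatives reach prediction. `cap_alternatives` is a hypothetical helper that samples uniformly without replacement; UrbanSim's own sampling may differ in detail.

```python
import numpy as np
import pandas as pd


def cap_alternatives(choosers, alternatives, alternative_ratio=2.0, seed=None):
    """Hypothetical: sample alternatives down to alternative_ratio * len(choosers)."""
    limit = int(alternative_ratio * len(choosers))
    if len(alternatives) <= limit:
        # Ratio already within bounds; keep every alternative.
        return alternatives
    rng = np.random.default_rng(seed)
    keep = rng.choice(alternatives.index.to_numpy(), size=limit, replace=False)
    return alternatives.loc[keep]
```

With 10 choosers and the default ratio, at most 20 alternatives are kept, however large the alternatives table is.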

probabilities(choosers, alternatives, filter_tables=True)

Returns the probabilities for a set of choosers to choose from among a set of alternatives.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

filter_tables : bool, optional

If True, filter choosers and alternatives with prediction filters before calculating probabilities.

Returns:

probabilities : pandas.Series

Probability of selection associated with each chooser and alternative. Index will be a MultiIndex with alternative IDs in the inner index and chooser IDs in the outer index.
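Under multinomial logit, chooser i selects alternative j with probability exp(V_ij) / sum_k exp(V_ik), where V_ij is the utility implied by the model expression and fitted coefficients. The sketch below takes the utilities as a given, hypothetical input and returns a Series shaped like the value described above, with chooser IDs in the outer level and alternative IDs in the inner level; the level names are assumptions.

```python
import numpy as np
import pandas as pd


def mnl_probabilities(utilities):
    """utilities: DataFrame of V_ij, rows = chooser IDs, columns = alternative IDs."""
    v = utilities.to_numpy(dtype=float)
    # Subtract each row's maximum before exponentiating for numerical stability.
    expv = np.exp(v - v.max(axis=1, keepdims=True))
    probs = expv / expv.sum(axis=1, keepdims=True)
    index = pd.MultiIndex.from_product(
        [utilities.index, utilities.columns], names=['chooser_id', 'alternative_id'])
    # ravel() walks row by row, matching the (chooser, alternative) product order.
    return pd.Series(probs.ravel(), index=index)
```

Each chooser's probabilities sum to 1, and shifting all of a chooser's utilities by the same constant leaves them unchanged, which is why the max subtraction is safe.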

report_fit(): Print a report of the fit results.

str_model_expression: Model expression as a string suitable for use with patsy/statsmodels.

summed_probabilities(choosers, alternatives)

Calculate total probability associated with each alternative.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns:

probs : pandas.Series

Total probability associated with each alternative.
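Given probabilities in the MultiIndex form returned by probabilities(), the total for each alternative is a sum over the chooser level. A minimal pandas sketch; the helper and the alternative_id level name are hypothetical:

```python
import pandas as pd


def summed_by_alternative(probabilities):
    """probabilities: Series indexed by (chooser_id, alternative_id)."""
    # Summing over choosers can be read as the expected number of choosers per alternative.
    return probabilities.groupby(level='alternative_id').sum()
```

If every chooser's probabilities sum to 1, the totals across all alternatives add up to the number of choosers.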

to_dict(): Return a dict representation of an MNLDiscreteChoiceModel instance.

to_yaml(str_or_buffer=None)

Save a model representation to YAML.

Parameters:

str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns:

j : str

YAML string if str_or_buffer is not given.

class urbansim.models.dcm.MNLDiscreteChoiceModelGroup(segmentation_col, remove_alts=False, name=None)

Manages a group of discrete choice models that refer to different segments of choosers.

Model names must match the segment names after doing a pandas groupby.

Parameters:

segmentation_col : str

Name of a column in the table of choosers. Will be used to perform a pandas groupby on the choosers table.

remove_alts : bool, optional

Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.

name : str, optional

A name that may be used in places to identify this group.
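The effect of remove_alts=True can be sketched as a loop over chooser segments that drops already chosen alternatives before the next segment predicts. `predict_segments` and the `choose` callable are hypothetical; `choose` stands in for a segment model's predict method and must return a Series of alternative IDs indexed by chooser ID.

```python
import pandas as pd


def predict_segments(choosers, alternatives, segmentation_col, choose, remove_alts=True):
    """Hypothetical sketch of segment-by-segment prediction."""
    results = []
    for _, segment in choosers.groupby(segmentation_col):
        choices = choose(segment, alternatives)
        results.append(choices)
        if remove_alts:
            # Drop taken alternatives so later segments cannot choose them.
            taken = alternatives.index.isin(choices.dropna())
            alternatives = alternatives.loc[~taken]
    return pd.concat(results)
```

With remove_alts=False every segment sees the full alternatives table, so two segments can end up choosing the same alternative.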

add_model(model)

Add an MNLDiscreteChoiceModel instance.

Parameters:

model : MNLDiscreteChoiceModel

Should have a .name attribute matching one of the segments in the choosers table.

add_model_from_params(name, model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None)

Add a model by passing parameters through to MNLDiscreteChoiceModel.

Parameters:

name

Must match a segment in the choosers table.

model_expression : str, iterable, or dict

A patsy model expression. Should contain only a right-hand side.

sample_size : int

Number of choices to sample for estimating the model.

probability_mode : str, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives.

choice_mode : str or callable, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers.

choosers_fit_filters : list of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filters : list of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filters : list of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filters : list of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filters : list of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_size : int, optional

If given, the number of choosers to sample during estimation. Sampling is applied after choosers_fit_filters.

prediction_sample_size : int, optional

If given, how many alternatives to sample during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_column : optional

Name of the column in the alternatives table that choosers should choose. e.g. the ‘building_id’ column. If not provided the alternatives index is used.

alts_columns_used(): Columns from the alternatives table that are used for filtering.

apply_fit_filters(choosers, alternatives)

Filter choosers and alternatives for fitting. This is done by filtering each submodel and concatenating the results.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

apply_predict_filters(choosers, alternatives)

Filter choosers and alternatives for prediction. This is done by filtering each submodel and concatenating the results.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

choosers_columns_used(): Columns from the choosers table that are used for filtering.

columns_used(): Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(choosers, alternatives, current_choice)

Fit and save models based on given data after segmenting the choosers table.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice

Name of column in choosers that indicates which alternative they have currently chosen.

Returns:

log_likelihoods : dict of dict

Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.

fitted: Whether all models in the group have been fitted.

interaction_columns_used()[source]¶: Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(choosers, alternatives, debug=False)[source]¶

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

debug : bool

If True, the attribute “sim_pdf” is set on the object to store the probabilities, so the outcome can be mapped.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

probabilities(choosers, alternatives)[source]¶

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns:

probabilities : dict of pandas.Series

summed_probabilities(choosers, alternatives)[source]¶

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns:

probs : pandas.Series

Summed probabilities from each segment added together.
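As a rough illustration of what "summed across segments" means, the sketch below represents each segment's probabilities as a plain dict mapping alternative ID to probability (standing in for the pandas.Series values that probabilities returns) and adds them per alternative. The function name `sum_segment_probabilities` and the example data are hypothetical; this is not UrbanSim's implementation.

```python
from collections import defaultdict


def sum_segment_probabilities(segment_probs):
    """Add per-segment alternative probabilities into one total per alternative.

    segment_probs maps segment name -> {alternative_id: probability}.
    An alternative missing from a segment contributes 0 for that segment.
    """
    totals = defaultdict(float)
    for probs in segment_probs.values():
        for alt_id, p in probs.items():
            totals[alt_id] += p
    return dict(totals)


# Hypothetical probabilities for two chooser segments over three buildings.
segment_probs = {
    "renters": {101: 0.5, 102: 0.3, 103: 0.2},
    "owners": {101: 0.1, 102: 0.6, 103: 0.3},
}
summed = sum_segment_probabilities(segment_probs)
print(summed)
```

With pandas Series the same result could be obtained by adding the Series together with a fill value of 0 so that alternatives absent from one segment are not turned into NaN.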

class urbansim.models.dcm.SegmentedMNLDiscreteChoiceModel(segmentation_col, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, default_model_expr=None, remove_alts=False, name=None)[source]¶

An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.

Parameters:

segmentation_col

Name of column in the choosers table that will be used for groupby.

sample_size : int

Number of choices to sample for estimating the model.

probability_mode : str, optional

Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.

choice_mode : str, optional

Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.

choosers_fit_filters : list of str, optional

Filters applied to choosers table before fitting the model.

choosers_predict_filters : list of str, optional

Filters applied to the choosers table before calculating new data points.

alts_fit_filters : list of str, optional

Filters applied to the alternatives table before fitting the model.

alts_predict_filters : list of str, optional

Filters applied to the alternatives table before calculating new data points.

interaction_predict_filters : list of str, optional

Filters applied to the merged choosers/alternatives table before predicting agent choices.

estimation_sample_size : int, optional

Whether (and how many) choosers to sample during estimation. Sampling is applied after choosers_fit_filters.

prediction_sample_size : int, optional

Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.

choice_column : optional

Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided, the alternatives index is used.

default_model_expr : str, iterable, or dict, optional

A patsy model expression. Should contain only a right-hand side.

remove_alts : bool, optional

Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.

name : str, optional

An optional string used to identify the model in places.
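The probability_mode and choice_mode descriptions above allow only two combinations: ‘full_product’ with ‘individual’, and ‘single_chooser’ with ‘aggregate’. A minimal validation sketch, assuming the constraint is exactly as documented; `check_modes` is a hypothetical helper, not part of UrbanSim.

```python
# Allowed (probability_mode, choice_mode) pairs, as stated in the parameter docs.
VALID_MODE_PAIRS = {
    ("full_product", "individual"),
    ("single_chooser", "aggregate"),
}


def check_modes(probability_mode="full_product", choice_mode="individual"):
    """Raise ValueError if the two modes are not a documented pairing."""
    if (probability_mode, choice_mode) not in VALID_MODE_PAIRS:
        raise ValueError(
            f"probability_mode={probability_mode!r} cannot be combined "
            f"with choice_mode={choice_mode!r}"
        )


check_modes()  # the defaults form a valid pair
check_modes("single_chooser", "aggregate")
```

Checking the pair as a unit, rather than each argument separately, keeps the rule in one place if further combinations become supported.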

add_segment(name, model_expression=None)[source]¶

Add a new segment with its own model expression.

Parameters:

name

Segment name. Must match a segment in the groupby of the data.

model_expression : str or dict, optional

A patsy model expression. Should contain only a right-hand side. If not given, the default model expression is used, which must not be None.

alts_columns_used()[source]¶: Columns from the alternatives table that are used for filtering.

apply_fit_filters(choosers, alternatives)[source]¶

Filter choosers and alternatives for fitting.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

apply_predict_filters(choosers, alternatives)[source]¶

Filter choosers and alternatives for prediction.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

Returns:

filtered_choosers, filtered_alts : pandas.DataFrame

choosers_columns_used()[source]¶: Columns from the choosers table that are used for filtering.

columns_used()[source]¶: Columns from any table used in the model. May come from either the choosers or alternatives tables.

fit(choosers, alternatives, current_choice)[source]¶

Fit and save models based on given data after segmenting the choosers table. Segments that have not already been explicitly added will be automatically added with the default model expression.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing, e.g. buildings.

current_choice

Name of column in choosers that indicates which alternative they have currently chosen.

Returns:

log_likelihoods : dict of dict

Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.
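The documented segment bookkeeping is: add_segment falls back to the default model expression (which must not be None), and fit adds any segment found in the data that was not added explicitly. The sketch below mimics that behavior with a plain dict of segment name to expression. `SegmentRegistry` and its `ensure_segments` method are hypothetical and only approximate what the class does internally.

```python
class SegmentRegistry:
    """Track per-segment model expressions with a shared default (hypothetical)."""

    def __init__(self, default_model_expr=None):
        self.default_model_expr = default_model_expr
        self.expressions = {}

    def add_segment(self, name, model_expression=None):
        # Fall back to the default expression; the default must exist.
        if model_expression is None:
            if self.default_model_expr is None:
                raise ValueError(f"no expression for segment {name!r} and no default")
            model_expression = self.default_model_expr
        self.expressions[name] = model_expression

    def ensure_segments(self, segment_values):
        # Add any segment seen in the data that was not added explicitly.
        for name in sorted(set(segment_values)):
            if name not in self.expressions:
                self.add_segment(name)


registry = SegmentRegistry(default_model_expr="dist_hwy + ave_income")
registry.add_segment("owners", "year_built + sqft")
# Segment values as they might appear in choosers[segmentation_col].
registry.ensure_segments(["owners", "renters", "renters"])
print(registry.expressions)
```

Explicitly added segments keep their own right-hand-side expressions; only unseen segments pick up the default.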

classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)[source]¶

Parameters:

choosers : DataFrame

A dataframe of rows of agents that have made choices.

chosen_fname : string

Name of the column in the choosers dataframe that identifies the alternative each chooser has chosen.

alternatives : DataFrame

A dataframe of alternatives. It should include the current choices from the choosers dataframe as well as some other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.

cfgname : string

The name of the YAML config file from which to read the discrete choice model.

outcfgname : string, optional (default cfgname)

The name of the output yaml config file where estimation results are written into.

Returns:

lcm : SegmentedMNLDiscreteChoiceModel

The model instance that was fit.
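fit_from_cfg requires that the values in choosers[chosen_fname] index into the alternatives dataframe. A pre-flight check of that requirement is sketched below with plain Python lists in place of DataFrames; `missing_choices` is a hypothetical helper. With pandas the equivalent test would be `~choosers[chosen_fname].isin(alternatives.index)`.

```python
def missing_choices(chosen_ids, alternative_ids):
    """Return chosen IDs that do not appear among the alternative IDs, in order."""
    available = set(alternative_ids)
    return [c for c in chosen_ids if c not in available]


# Hypothetical building IDs chosen by four households.
chosen = [101, 102, 205, 101]
alts = [101, 102, 103, 104]
print(missing_choices(chosen, alts))
```

Any ID reported here would have no matching row in the alternatives table, so it is worth resolving before estimation.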

fitted¶: Whether models for all segments have been fit.

classmethod from_yaml(yaml_str=None, str_or_buffer=None)[source]¶

Create a SegmentedMNLDiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.

Parameters:

yaml_str : str, optional

A YAML string from which to load model.

str_or_buffer : str or file like, optional

File name or buffer from which to load YAML.

Returns:

SegmentedMNLDiscreteChoiceModel

interaction_columns_used()[source]¶: Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.

predict(choosers, alternatives, debug=False)[source]¶

Choose from among alternatives for a group of agents after segmenting the choosers table.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

debug : bool

If True, the attribute “sim_pdf” is set on the object to store the probabilities, so the outcome can be mapped.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
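To make the remove_alts option concrete: segments are predicted one after another, and when remove_alts is True the alternatives chosen by one segment are removed before the next segment chooses. This stdlib sketch uses a uniform random pick in place of MNL probabilities and None in place of NaN; `predict_segments` is hypothetical, not UrbanSim's code.

```python
import random


def predict_segments(choosers_by_segment, alternatives, remove_alts=False, seed=0):
    """Assign each chooser an alternative, segment by segment.

    choosers_by_segment maps segment name -> list of chooser IDs.
    A uniform random pick stands in for MNL probabilities.
    Choosers left without an alternative map to None (NaN in UrbanSim).
    """
    rng = random.Random(seed)
    available = list(alternatives)
    choices = {}
    for segment in sorted(choosers_by_segment):
        # Within a segment, each alternative is taken at most once.
        pool = list(available)
        rng.shuffle(pool)
        chosen_here = []
        for chooser in choosers_by_segment[segment]:
            alt = pool.pop() if pool else None
            choices[chooser] = alt
            if alt is not None:
                chosen_here.append(alt)
        if remove_alts:
            # Drop alternatives claimed by this segment before the next one.
            available = [a for a in available if a not in chosen_here]
    return choices


segments = {"a": [1, 2], "b": [3, 4]}
alts = [10, 20, 30]
with_removal = predict_segments(segments, alts, remove_alts=True)
without_removal = predict_segments(segments, alts, remove_alts=False)
```

With removal, the second segment only sees the one alternative the first segment left, so one chooser ends up unassigned; without removal, both segments draw from all three alternatives.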

classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)[source]¶

Simulate the discrete choices for the specified choosers.

Parameters:

choosers : DataFrame

A dataframe of agents doing the choosing.

alternatives : DataFrame

A dataframe of alternatives that have available supply and in which the choosers are locating.

cfgname : string

The name of the YAML config file from which to read the discrete choice model.

cfg : string

An ordered YAML string of the discrete choice model configuration. Used to read the config from memory instead of loading cfgname from disk.

alternative_ratio : float

If the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives are sampled down to meet this ratio, for performance reasons.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.

lcm : SegmentedMNLDiscreteChoiceModel

The model instance that was used for prediction.
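The alternative_ratio rule works out as simple arithmetic: with 1,000 choosers and the default ratio of 2.0, at most 2,000 alternatives are kept. A minimal sketch of that cap using sampling without replacement; `cap_alternatives` is hypothetical, and the exact sampling UrbanSim performs may differ.

```python
import random


def cap_alternatives(alternative_ids, n_choosers, alternative_ratio=2.0, seed=0):
    """Sample alternatives down to int(alternative_ratio * n_choosers) if there are more."""
    limit = int(alternative_ratio * n_choosers)
    if len(alternative_ids) <= limit:
        # Below the ratio: keep every alternative.
        return list(alternative_ids)
    rng = random.Random(seed)
    # Sample without replacement so no alternative appears twice.
    return rng.sample(list(alternative_ids), limit)


alts = list(range(10_000))
kept = cap_alternatives(alts, n_choosers=1_000)
print(len(kept))
```

Tables that are already small relative to the number of choosers pass through unchanged.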

probabilities(choosers, alternatives)[source]¶

Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns:

probabilities : dict of pandas.Series

summed_probabilities(choosers, alternatives)[source]¶

Returns the sum of probabilities for alternatives across all chooser segments.

Parameters:

choosers : pandas.DataFrame

Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.

alternatives : pandas.DataFrame

Table describing the things from which agents are choosing.

Returns:

probs : pandas.Series

Summed probabilities from each segment added together.

to_dict()[source]¶: Returns a dict representation of this instance suitable for conversion to YAML.

to_yaml(str_or_buffer=None)[source]¶

Save a model representation to YAML.

Parameters:

str_or_buffer : str or file like, optional

By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.

Returns:

j : str

The YAML as a string, returned only if str_or_buffer is not given.
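The three behaviors of str_or_buffer (return a string, write to a named file, or call .write on an object) follow a common dispatch pattern. In UrbanSim this corresponds to calls such as `lcm.to_yaml()` or `lcm.to_yaml('lcm.yaml')`. The sketch below shows only the dispatch, applied to an already-rendered YAML string so it needs nothing beyond the standard library; `write_or_return` and the YAML content are hypothetical.

```python
import io
import os
import tempfile


def write_or_return(text, str_or_buffer=None):
    """Return text, write it to a file path, or write it to a file-like object."""
    if str_or_buffer is None:
        return text
    if isinstance(str_or_buffer, str):
        # A string is treated as a file name.
        with open(str_or_buffer, "w") as f:
            f.write(text)
    else:
        # Anything else is assumed to have a .write method.
        str_or_buffer.write(text)
    return None


yaml_text = "name: hlcm\nsample_size: 100\n"
returned = write_or_return(yaml_text)

buf = io.StringIO()
write_or_return(yaml_text, buf)

path = os.path.join(tempfile.mkdtemp(), "hlcm.yaml")
write_or_return(yaml_text, path)
with open(path) as f:
    from_file = f.read()
```

Checking for a string first matters, because a plain path string has no .write method of its own.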

urbansim.models.dcm.unit_choice(chooser_ids, alternative_ids, probabilities)[source]¶

Have a set of choosers choose from among alternatives according to a probability distribution. Choice is binary: each alternative can only be chosen once.

Parameters:

chooser_ids : 1d array_like

Array of IDs of the agents that are making choices.

alternative_ids : 1d array_like

Array of IDs of alternatives among which agents are making choices.

probabilities : 1d array_like

The probability that an agent will choose an alternative. Must be the same shape as alternative_ids. Unavailable alternatives should have a probability of 0.

Returns:

choices : pandas.Series

Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
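Binary choice amounts to sampling without replacement: once an alternative is taken, it leaves the pool. The stdlib sketch below draws one alternative per chooser in order, weighted by probability, skips zero-probability (unavailable) alternatives, and maps leftover choosers to NaN as described above. It is an illustration under those assumptions with the hypothetical name `unit_choice_sketch`; UrbanSim's own unit_choice works on arrays and may batch or order the draws differently.

```python
import math
import random


def unit_choice_sketch(chooser_ids, alternative_ids, probabilities, seed=0):
    """Weighted choice where every alternative can be taken at most once.

    Alternatives with probability 0 are never chosen. Choosers left over when
    positive-probability alternatives run out map to NaN.
    """
    if len(alternative_ids) != len(probabilities):
        raise ValueError("alternative_ids and probabilities must have the same length")
    rng = random.Random(seed)
    # Keep only alternatives that can actually be chosen.
    pool = [(a, p) for a, p in zip(alternative_ids, probabilities) if p > 0]
    choices = {}
    for chooser in chooser_ids:
        if not pool:
            choices[chooser] = math.nan
            continue
        # random.choices normalizes the weights, so they need not sum to 1.
        idx = rng.choices(range(len(pool)), weights=[p for _, p in pool])[0]
        alt, _ = pool.pop(idx)  # remove it so no one else can choose it
        choices[chooser] = alt
    return choices


result = unit_choice_sketch([1, 2, 3, 4], [10, 20, 30], [0.5, 0.5, 0.0])
print(result)
```

Here alternative 30 is unavailable, so only two of the four choosers receive an alternative and the other two map to NaN.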