Statistical Models
Introduction
UrbanSim has two sets of statistical models: regressions and discrete choice models. Each follows a three-stage usage pattern:
1. Create a configured model instance. This is where you supply most of the information to the model, such as the actual definition of the model and any filters that restrict the data used during fitting and prediction.
2. Fit the model by supplying base-year data.
3. Make predictions based on new data.
Model Expressions
Statistical models require specification of a “model expression” that describes the model as a mathematical formula. UrbanSim uses patsy to interpret model expressions, but UrbanSim gives you some flexibility as to how you define them.
patsy works with string formulas like this simplified regression example (names refer to columns in the DataFrames used during fitting and prediction):
expr = 'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'
In UrbanSim that same formula could be expressed in a dictionary:
expr = {
'left_side': 'np.log1p(sqft_price)',
'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}
Formulae used with location choice models have only a right-hand side, since these models do not predict new numeric values. Right-hand-side formulae can be written as lists or dictionaries:
expr = {
'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
}
expr = ['I(year_built < 1940)', 'dist_hwy', 'ave_income']
Expressing the formula as a string is always an option. Lists and dictionaries are especially useful for writing attractively formatted formulae in YAML config files.
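To make the equivalence between the three forms concrete, here is a minimal sketch that joins a list or dictionary expression back into the string form. The function name expression_to_str is hypothetical and this is not UrbanSim's own code; UrbanSim does its own conversion (models expose the result through the str_model_expression property), and its output may differ in details such as intercept handling.

```python
def expression_to_str(expr):
    """Convert a str, list, or dict model expression into a patsy-style string.

    Hypothetical helper for illustration only; not part of UrbanSim.
    """
    if isinstance(expr, str):
        # A string is already a patsy formula.
        return expr
    if isinstance(expr, dict):
        left = expr.get('left_side')
        right = expr.get('right_side', [])
    else:
        # A bare list (or other iterable) is a right-hand side only.
        left, right = None, expr
    if not isinstance(right, str):
        right = ' + '.join(right)
    # Only add "~" when there is a left-hand side to put in front of it.
    return '{} ~ {}'.format(left, right) if left else right


print(expression_to_str({
    'left_side': 'np.log1p(sqft_price)',
    'right_side': ['I(year_built < 1940)', 'dist_hwy', 'ave_income'],
}))
```

Applied to the dictionary from the regression example, this reproduces the original string formula; applied to a list, it yields a right-hand side suitable for a location choice model.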
YAML Persistence
UrbanSim’s regression and location choice models can be saved as YAML files and loaded again at another time. This feature is especially useful for estimating models in one location, saving the fit parameters to disk, and then using the fitted model for prediction somewhere else.
Use the .to_yaml and .from_yaml methods to save files to disk and load them back as configured models.
Here’s an example of loading a regression model, performing fitting, and saving the model back to YAML:
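from urbansim.models.regression import RegressionModel
# data: a DataFrame with all columns the model uses; pass the file name as str_or_buffer
model = RegressionModel.from_yaml(str_or_buffer='my_model.yaml')
model.fit(data)
model.to_yaml('my_model.yaml')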
You can, if you like, write your model configurations entirely in YAML and load them into Python only for fitting and prediction.
API
Regression API
- RegressionModel: A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.
- SegmentedRegressionModel: A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.
- RegressionModelGroup: Manages a group of regression models that refer to different segments within a single table.
Discrete Choice API
- MNLDiscreteChoiceModel: A discrete choice model with the ability to store an estimated model and predict new data based on the model.
- SegmentedMNLDiscreteChoiceModel: An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.
- MNLDiscreteChoiceModelGroup: Manages a group of discrete choice models that refer to different segments of choosers.
Regression API Docs
Use the RegressionModel class to fit a model using statsmodels’ OLS capability and then do subsequent prediction.
- class urbansim.models.regression.RegressionModel(fit_filters, predict_filters, model_expression, ytransform=None, name=None)
A hedonic (regression) model with the ability to store an estimated model and predict new data based on the model.
statsmodels’ OLS implementation is used.
- Parameters
- fit_filters : list of str
Filters applied before fitting the model.
- predict_filters : list of str
Filters applied before calculating new data points.
- model_expression : str or dict
A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.
- ytransform : callable, optional
A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price. By default no transformation is applied.
- name : optional
Optional descriptive name for this model that may be used in output.
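The ytransform should be the exact inverse of whatever transform the left-hand side applies. The example expression earlier on this page uses np.log1p(sqft_price), whose inverse is np.expm1 rather than np.exp. The sketch below uses the standard-library math equivalents to show the round trip on made-up prices; in an actual model you would pass the NumPy function (for example ytransform=np.expm1).

```python
import math

# Hypothetical prices, for illustration only.
prices = [0.0, 125.5, 250.0, 1000.0]

# The values a model with left side log1p(sqft_price) would be fit against.
logged = [math.log1p(p) for p in prices]

# expm1 exactly inverts log1p; exp alone returns 1 + p, overshooting by 1.
restored_expm1 = [math.expm1(v) for v in logged]
restored_exp = [math.exp(v) for v in logged]

for p, a, b in zip(prices, restored_expm1, restored_exp):
    print(f"{p:8.2f} -> expm1: {a:8.2f}, exp: {b:8.2f}")
```

The constant offset from np.exp matters most for low prices, where an extra 1 is a large relative error.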
- columns_used(self)
Returns all the columns used in this model for filtering and in the model expression.
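As a rough illustration of what "columns used" means, the sketch below collects variable names from filter strings and a string model expression using Python's ast module. It is an independent approximation, not UrbanSim's implementation: the function names are hypothetical, it skips function names such as I and module prefixes such as np, and it does not understand patsy-only syntax such as the ":" interaction operator.

```python
import ast


def _names(source):
    """Return variable names referenced in a Python-style expression."""
    tree = ast.parse(source.strip(), mode='eval')
    skip = set()
    for node in ast.walk(tree):
        # Skip called names such as I(...) and module prefixes such as np.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            skip.add(id(node.func))
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            skip.add(id(node.value))
    return {n.id for n in ast.walk(tree)
            if isinstance(n, ast.Name) and id(n) not in skip}


def columns_used_sketch(filters, model_expression):
    """Collect column names from filter strings and a patsy string formula."""
    cols = set()
    for f in filters or []:
        cols |= _names(f)
    # "a ~ b" is not valid Python, so parse each side of the formula separately.
    for side in model_expression.split('~'):
        if side.strip():
            cols |= _names(side)
    return sorted(cols)


print(columns_used_sketch(
    ['year_built > 1900', 'sqft_price > 0'],
    'np.log1p(sqft_price) ~ I(year_built < 1940) + dist_hwy + ave_income'))
```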
- fit(self, data, debug=False)
Fit the model to data and store/return the results.
- Parameters
- data : pandas.DataFrame
Data to use for fitting the model. Must contain all the columns referenced by the model_expression.
- debug : bool
If debug is set to true, this sets the attribute “est_data” to a dataframe with the actual data used for estimation of this model.
- Returns
- fit : statsmodels.regression.linear_model.OLSResults
This is returned for inspection, but also stored on the class instance for use during prediction.
- classmethod fit_from_cfg(df, cfgname, debug=False, outcfgname=None)
- Parameters
- df : DataFrame
The dataframe which contains the columns to use for the estimation.
- cfgname : string
The name of the YAML config file which describes the hedonic model.
- debug : boolean, optional (default False)
Whether to generate debug information on the model.
- outcfgname : string, optional (default cfgname)
The name of the output YAML config file to which estimation results are written.
- Returns
- RegressionModel which was used to fit
- property fitted
True if the model is ready for prediction.
- classmethod from_yaml(yaml_str=None, str_or_buffer=None)
Create a RegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.
- Parameters
- yaml_str : str, optional
A YAML string from which to load model.
- str_or_buffer : str or file like, optional
File name or buffer from which to load YAML.
- Returns
- RegressionModel
- predict(self, data)
Predict a new data set based on an estimated model.
- Parameters
- data : pandas.DataFrame
Data to use for prediction. Must contain all the columns referenced by the right-hand side of the model_expression.
- Returns
- result : pandas.Series
Predicted values as a pandas Series. Will have the index of data after applying filters.
- classmethod predict_from_cfg(df, cfgname)
- Parameters
- df : DataFrame
The dataframe which contains the columns to use for the prediction.
- cfgname : string
The name of the YAML config file which describes the hedonic model.
- Returns
- predicted : pandas.Series
Predicted data in a pandas Series. Will have the index of data after applying filters.
- hm : RegressionModel
The model which was used to predict.
- property str_model_expression
Model expression as a string suitable for use with patsy/statsmodels.
- to_yaml(self, str_or_buffer=None)
Save a model representation to YAML.
- Parameters
- str_or_buffer : str or file like, optional
By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.
- Returns
- j : str
YAML string if str_or_buffer is not given.
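The str_or_buffer convention shared by the to_yaml methods on this page amounts to a three-way dispatch. The sketch below is a hypothetical helper written against that description, not UrbanSim's implementation; it handles plain text only and leaves YAML serialization out.

```python
import io


def write_or_return(text, str_or_buffer=None):
    """Mimic the documented str_or_buffer behavior (hypothetical helper).

    None      -> return the text
    str       -> treat it as a file name and write the text to that file
    otherwise -> assume a file-like object with a .write method
    """
    if str_or_buffer is None:
        return text
    if isinstance(str_or_buffer, str):
        with open(str_or_buffer, 'w') as f:
            f.write(text)
    else:
        str_or_buffer.write(text)
    return None


# Writing into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_or_return('name: my_model\n', buffer)
print(buffer.getvalue())
```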
- class urbansim.models.regression.RegressionModelGroup(segmentation_col, name=None)
Manages a group of regression models that refer to different segments within a single table.
Model names must match the segment names after doing a pandas groupby.
- Parameters
- segmentation_col
Name of the column on which to segment.
- name
Optional name used to identify the model in places.
- add_model(self, model)
Add a RegressionModel instance.
- Parameters
- model : RegressionModel
Should have a .name attribute matching one of the groupby segments.
- add_model_from_params(self, name, fit_filters, predict_filters, model_expression, ytransform=None)
Add a model by passing arguments through to RegressionModel.
- Parameters
- name : any
Must match a groupby segment name.
- fit_filters : list of str
Filters applied before fitting the model.
- predict_filters : list of str
Filters applied before calculating new data points.
- model_expression : str
A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.
- ytransform : callable, optional
A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price. By default no transformation is applied.
- columns_used(self)
Returns all the columns used across all models in the group for filtering and in the model expression.
- fit(self, data, debug=False)
Fit each of the models in the group.
- Parameters
- data : pandas.DataFrame
Must have a column with the same name as segmentation_col.
- debug : bool
If set to true (default false) will pass the debug parameter to model estimation.
- Returns
- fits : dict of statsmodels.regression.linear_model.OLSResults
Keys are the segment names.
- property fitted
Whether all models in the group have been fitted.
- predict(self, data)
Predict new data for each group in the segmentation.
- Parameters
- data : pandas.DataFrame
Data to use for prediction. Must have a column with the same name as segmentation_col.
- Returns
- predicted : pandas.Series
Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
- class urbansim.models.regression.SegmentedRegressionModel(segmentation_col, fit_filters=None, predict_filters=None, default_model_expr=None, default_ytransform=None, min_segment_size=0, name=None)
A regression model group that allows segments to have different model expressions and ytransforms but all have the same filters.
- Parameters
- segmentation_col
Name of column in the data table on which to segment. Will be used with a pandas groupby on the data table.
- fit_filters : list of str, optional
Filters applied before fitting the model.
- predict_filters : list of str, optional
Filters applied before calculating new data points.
- default_model_expr : str or dict, optional
A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.
- default_ytransform : callable, optional
A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price. By default no transformation is applied.
- min_segment_size : int, optional
This model will add all segments that have at least this number of observations; segments with fewer members are skipped. A very small number of observations (e.g. 1) will cause an error with estimation.
- name : str, optional
A name used in places to identify the model.
- add_segment(self, name, model_expression=None, ytransform='default')
Add a new segment with its own model expression and ytransform.
- Parameters
- name
Segment name. Must match a segment in the groupby of the data.
- model_expression : str or dict, optional
A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides. If not given the default model will be used, which must not be None.
- ytransform : callable, optional
A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price. If not given the default ytransform will be used.
- columns_used(self)
Returns all the columns used across all models in the group for filtering and in the model expression.
- fit(self, data, debug=False)
Fit each segment. Segments that have not already been explicitly added will be automatically added with default model and ytransform.
- Parameters
- data : pandas.DataFrame
Must have a column with the same name as segmentation_col.
- debug : bool
If set to true will pass debug to the fit method of each model.
- Returns
- fits : dict of statsmodels.regression.linear_model.OLSResults
Keys are the segment names.
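Segmented fitting amounts to a groupby, a size check against min_segment_size, and one fit per surviving segment, with results keyed by segment name. The plain-Python sketch below shows that flow. It is not UrbanSim's code: a one-variable closed-form OLS stands in for statsmodels with full patsy formulae, and the function names, column names, and rows are hypothetical.

```python
from collections import defaultdict


def ols_1d(xs, ys):
    """Closed-form ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # a single observation gives sxx == 0 and fails here
    return my - slope * mx, slope


def fit_segments(rows, segmentation_col, x_col, y_col, min_segment_size=0):
    """Group rows by segment, skip small segments, fit one model per segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segmentation_col]].append(row)
    fits = {}
    for name, members in groups.items():
        if len(members) < min_segment_size:
            continue  # too few observations to estimate
        fits[name] = ols_1d([r[x_col] for r in members],
                            [r[y_col] for r in members])
    return fits


rows = [
    {'building_type': 'res', 'sqft': 1.0, 'price': 3.0},
    {'building_type': 'res', 'sqft': 2.0, 'price': 5.0},
    {'building_type': 'res', 'sqft': 3.0, 'price': 7.0},
    {'building_type': 'office', 'sqft': 1.0, 'price': 10.0},
]
fits = fit_segments(rows, 'building_type', 'sqft', 'price', min_segment_size=2)
print(fits)
```

The one-row 'office' segment is skipped; fitting it would divide by zero, which mirrors the warning above that a very small segment causes an estimation error.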
- classmethod fit_from_cfg(df, cfgname, debug=False, min_segment_size=None, outcfgname=None)
- Parameters
- df : DataFrame
The dataframe which contains the columns to use for the estimation.
- cfgname : string
The name of the YAML config file which describes the hedonic model.
- debug : boolean, optional (default False)
Whether to generate debug information on the model.
- min_segment_size : int, optional
Set attribute on the model.
- outcfgname : string, optional (default cfgname)
The name of the output YAML config file to which estimation results are written.
- Returns
- hm : SegmentedRegressionModel
The model which was used to fit.
- property fitted
Whether models for all segments have been fit.
- classmethod from_yaml(yaml_str=None, str_or_buffer=None)
Create a SegmentedRegressionModel instance from a saved YAML configuration. Arguments are mutually exclusive.
- Parameters
- yaml_str : str, optional
A YAML string from which to load model.
- str_or_buffer : str or file like, optional
File name or buffer from which to load YAML.
- Returns
- SegmentedRegressionModel
- predict(self, data)
Predict new data for each group in the segmentation.
- Parameters
- data : pandas.DataFrame
Data to use for prediction. Must have a column with the same name as segmentation_col.
- Returns
- predicted : pandas.Series
Predicted data in a pandas Series. Will have the index of data after applying filters.
- classmethod predict_from_cfg(df, cfgname, min_segment_size=None)
- Parameters
- df : DataFrame
The dataframe which contains the columns to use for the prediction.
- cfgname : string
The name of the YAML config file which describes the hedonic model.
- min_segment_size : int, optional
Set attribute on the model.
- Returns
- predicted : pandas.Series
Predicted data in a pandas Series. Will have the index of data after applying filters and minus any groups that do not have models.
- hm : SegmentedRegressionModel
The model which was used to predict.
- to_dict(self)
Returns a dict representation of this instance suitable for conversion to YAML.
- to_yaml(self, str_or_buffer=None)
Save a model representation to YAML.
- Parameters
- str_or_buffer : str or file like, optional
By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.
- Returns
- j : str
YAML string if str_or_buffer is not given.
- urbansim.models.regression.fit_model(df, filters, model_expression)
Use statsmodels OLS to construct a model relation.
- Parameters
- df : pandas.DataFrame
Data to use for fit. Should contain all the columns referenced in the model_expression.
- filters : list of str
Any filters to apply before doing the model fit.
- model_expression : str
A patsy model expression that can be used with statsmodels. Should contain both the left- and right-hand sides.
- Returns
- fit : statsmodels.regression.linear_model.OLSResults
- urbansim.models.regression.predict(df, filters, model_fit, ytransform=None)
Apply model to new data to predict new dependent values.
- Parameters
- df : pandas.DataFrame
- filters : list of str
Any filters to apply before doing prediction.
- model_fit : statsmodels.regression.linear_model.OLSResults
Result of model estimation.
- ytransform : callable, optional
A function to call on the array of predicted output. For example, if the model relation is predicting the log of price, you might pass ytransform=np.exp so that the results reflect actual price. By default no transformation is applied.
- Returns
- result : pandas.Series
Predicted values as a pandas Series. Will have the index of df after applying filters.
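Taken together, fit_model and predict describe a pipeline of filtering, OLS fitting, prediction, and back-transformation. The plain-Python sketch below mirrors that flow for a one-variable model log1p(price) ~ dist_hwy. It illustrates the idea under stated assumptions and is not UrbanSim's code: the function names are hypothetical, filters are Python callables rather than query strings, the OLS is a closed-form single-regressor fit instead of statsmodels, and the data are made up.

```python
import math


def apply_filters_sketch(table, filters):
    """Keep rows of {row_id: row_dict} that pass every filter predicate."""
    return {i: r for i, r in table.items() if all(f(r) for f in filters)}


def fit_log_model_sketch(table, filters, x_col, y_col):
    """OLS fit of log1p(y) = a + b * x on the filtered rows; returns (a, b)."""
    rows = list(apply_filters_sketch(table, filters).values())
    xs = [r[x_col] for r in rows]
    ys = [math.log1p(r[y_col]) for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def predict_sketch(table, filters, model_fit, x_col, ytransform=None):
    """Predict for filtered rows; result keys are the ids that survived filtering."""
    a, b = model_fit
    out = {}
    for i, r in apply_filters_sketch(table, filters).items():
        value = a + b * r[x_col]
        out[i] = ytransform(value) if ytransform else value
    return out


# Made-up data generated so that log1p(price) = 1 + 0.5 * dist_hwy.
table = {i: {'dist_hwy': float(d), 'price': math.expm1(1.0 + 0.5 * d)}
         for i, d in zip([10, 11, 12, 13], [0, 1, 2, 3])}
table[99] = {'dist_hwy': 1.0, 'price': -5.0}  # invalid row; log1p(-5) would fail
filters = [lambda r: r['price'] > 0]

fit = fit_log_model_sketch(table, filters, 'dist_hwy', 'price')
preds = predict_sketch(table, filters, fit, 'dist_hwy', ytransform=math.expm1)
print(fit, sorted(preds))
```

As in the documented predict, the result is keyed only by rows that pass the filters, so row 99 is absent from both the fit and the predictions.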
Discrete Choice API Docs
Use the MNLDiscreteChoiceModel class to train a choice model using multinomial logit and make subsequent choice predictions.
- class urbansim.models.dcm.DiscreteChoiceModel
Abstract base class for discrete choice models.
- class urbansim.models.dcm.MNLDiscreteChoiceModel(model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, name=None)
A discrete choice model with the ability to store an estimated model and predict new data based on the model. Based on multinomial logit.
- Parameters
- model_expression : str, iterable, or dict
A patsy model expression. Should contain only a right-hand side.
- sample_size : int
Number of choices to sample for estimating the model.
- probability_mode : str, optional
Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.
- choice_mode : str, optional
Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.
- choosers_fit_filters : list of str, optional
Filters applied to the choosers table before fitting the model.
- choosers_predict_filters : list of str, optional
Filters applied to the choosers table before calculating new data points.
- alts_fit_filters : list of str, optional
Filters applied to the alternatives table before fitting the model.
- alts_predict_filters : list of str, optional
Filters applied to the alternatives table before calculating new data points.
- interaction_predict_filters : list of str, optional
Filters applied to the merged choosers/alternatives table before predicting agent choices.
- estimation_sample_size : int, optional
Whether (and how many) choosers to sample during estimation (needs to be applied after choosers_fit_filters).
- prediction_sample_size : int, optional
Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.
- choice_column : optional
Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided, the alternatives index is used.
- name : optional
Optional descriptive name for this model that may be used in output.
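The difference between the two choice modes can be seen in a small simulation. This is a plain-Python sketch of the behavior described above, not UrbanSim's implementation: probabilities are supplied directly in a hypothetical {chooser: {alternative: probability}} layout, the IDs are made up, and None stands in for the NaN that UrbanSim returns when alternatives run out.

```python
import random


def simulate_choices(probs_by_chooser, choice_mode, seed=0):
    """Draw one alternative per chooser (illustrative sketch, not UrbanSim code).

    'individual': every chooser draws independently, so alternatives may repeat.
    'aggregate':  an alternative, once chosen, is unavailable to later choosers.
    """
    rng = random.Random(seed)
    taken = set()
    choices = {}
    for chooser, probs in probs_by_chooser.items():
        if choice_mode == 'aggregate':
            probs = {a: p for a, p in probs.items() if a not in taken}
        if not probs:
            choices[chooser] = None  # stands in for UrbanSim's NaN
            continue
        alts = list(probs)
        pick = rng.choices(alts, weights=[probs[a] for a in alts])[0]
        choices[chooser] = pick
        if choice_mode == 'aggregate':
            taken.add(pick)
    return choices


# Aggregate mode: all choosers share one probability table, as with single_chooser.
shared = {'bldg_1': 0.7, 'bldg_2': 0.3}
print(simulate_choices({'hh_a': shared, 'hh_b': shared, 'hh_c': shared}, 'aggregate'))
```

With three choosers and two alternatives, aggregate mode leaves the last chooser without a choice, while individual mode would let several choosers land on the same alternative.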
- apply_fit_filters(self, choosers, alternatives)
Filter choosers and alternatives for fitting.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
- apply_predict_filters(self, choosers, alternatives)
Filter choosers and alternatives for prediction.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
- columns_used(self)
Columns from any table used in the model. May come from either the choosers or alternatives tables.
- fit(self, choosers, alternatives, current_choice)
Fit and save model parameters based on given data.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- current_choice : pandas.Series or any
A Series describing the alternatives currently chosen by the choosers. Should have an index matching choosers and values matching the index of alternatives.
If a non-Series is given it should be a column in choosers.
- Returns
- log_likelihoods : dict
Dict of log-likelihood values describing the quality of the model fit. Will have keys ‘null’, ‘convergence’, and ‘ratio’.
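The three keys are most naturally read as the log-likelihood of a null model that gives every sampled alternative equal probability, the log-likelihood at convergence, and McFadden's pseudo R-squared, 1 - LL(convergence) / LL(null). That reading of 'ratio' is an assumption here; check UrbanSim's source before relying on it. The sketch computes all three from the probability a fitted model assigns to each chooser's observed choice; the function name and inputs are hypothetical.

```python
import math


def log_likelihoods_sketch(chosen_probs, n_alternatives):
    """McFadden-style fit summary for an MNL model (assumed definitions).

    chosen_probs: model probability of the alternative each chooser actually chose.
    n_alternatives: alternatives per choice set, for the equal-shares null model.
    """
    null = len(chosen_probs) * math.log(1.0 / n_alternatives)
    convergence = sum(math.log(p) for p in chosen_probs)
    return {'null': null, 'convergence': convergence,
            'ratio': 1.0 - convergence / null}


# Two choosers, four alternatives each; the model gives each observed choice p = 0.5.
print(log_likelihoods_sketch([0.5, 0.5], 4))
```

A ratio of 0 means the model does no better than equal shares; values closer to 1 indicate a better fit.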
- classmethod fit_from_cfg(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)
- Parameters
- choosers : DataFrame
A dataframe in which rows represent choosers.
- chosen_fname : string
A string indicating the column in the choosers dataframe which gives which alternatives the choosers have chosen.
- alternatives : DataFrame
A table of alternatives. It should include the choices from the choosers table as well as other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.
- cfgname : string
The name of the YAML config file from which to read the discrete choice model.
- outcfgname : string, optional (default cfgname)
The name of the output YAML config file to which estimation results are written.
- Returns
- lcm : MNLDiscreteChoiceModel
The model which was used to fit.
- property fitted
True if model is ready for prediction.
- classmethod from_yaml(yaml_str=None, str_or_buffer=None)
Create a DiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.
- Parameters
- yaml_str : str, optional
A YAML string from which to load model.
- str_or_buffer : str or file like, optional
File name or buffer from which to load YAML.
- Returns
- MNLDiscreteChoiceModel
- interaction_columns_used(self)
Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.
- predict(self, choosers, alternatives, debug=False)
Choose from among alternatives for a group of agents.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- debug : bool
If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
- classmethod predict_from_cfg(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)
Simulate choices for the specified choosers.
- Parameters
- choosers : DataFrame
A dataframe of agents doing the choosing.
- alternatives : DataFrame
A dataframe of locations which the choosers are locating in and which have a supply.
- cfgname : string
The name of the YAML config file from which to read the discrete choice model.
- cfg : string
An ordered YAML string of the discrete choice model configuration. Used to read config from memory in lieu of loading cfgname from disk.
- alternative_ratio : float, optional
If the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives will be sampled to meet this ratio, for performance reasons.
- debug : boolean, optional (default False)
Whether to generate debug information on the model.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
- lcm : MNLDiscreteChoiceModel
The model which was used to predict.
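A minimal sketch of the alternative_ratio rule, assuming alternatives are drawn uniformly at random without replacement when there are too many of them; UrbanSim's exact sampling may differ. The function name and IDs are hypothetical.

```python
import random


def sample_alternatives(alt_ids, n_choosers, alternative_ratio=2.0, seed=0):
    """Down-sample alternatives when they exceed alternative_ratio * n_choosers."""
    limit = int(alternative_ratio * n_choosers)
    if len(alt_ids) <= limit:
        return list(alt_ids)  # few enough alternatives: keep them all
    rng = random.Random(seed)
    return rng.sample(list(alt_ids), limit)  # uniform, without replacement


# 1,000 candidate buildings but only 10 choosers: keep 20 alternatives.
print(len(sample_alternatives(range(1000), 10)))
```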
- probabilities(self, choosers, alternatives, filter_tables=True)
Returns the probabilities for a set of choosers to choose from among a set of alternatives.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- filter_tables : bool, optional
If True, filter choosers and alternatives with prediction filters before calculating probabilities.
- Returns
- probabilities : pandas.Series
Probability of selection associated with each chooser and alternative. Index will be a MultiIndex with alternative IDs in the inner index and chooser IDs in the outer index.
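Under multinomial logit, each chooser's probability of picking alternative j is exp(V_j) / sum_k exp(V_k), where V is the utility the fitted coefficients assign to that chooser/alternative pair. The plain-Python sketch below computes these probabilities keyed as (chooser_id, alternative_id), matching the outer/inner index order described above. The utility function, its coefficient, and the data are hypothetical stand-ins for a fitted model, not UrbanSim's API.

```python
import math


def mnl_probabilities(choosers, alternatives, utility):
    """Return {(chooser_id, alt_id): probability} under multinomial logit.

    utility(chooser_row, alt_row) -> float is a hypothetical stand-in for
    fitted coefficients applied to the merged chooser/alternative data.
    """
    probs = {}
    for cid, c in choosers.items():
        utils = {aid: utility(c, a) for aid, a in alternatives.items()}
        top = max(utils.values())  # subtract the max for numerical stability
        exps = {aid: math.exp(u - top) for aid, u in utils.items()}
        total = sum(exps.values())
        for aid, e in exps.items():
            probs[(cid, aid)] = e / total
    return probs


choosers = {1: {'income': 1.0}, 2: {'income': 3.0}}
alternatives = {'A': {'price': 1.0}, 'B': {'price': 2.0}}
# Hypothetical utility: price matters less to higher-income choosers.
probs = mnl_probabilities(choosers, alternatives,
                          lambda c, a: -0.5 * a['price'] / c['income'])
print(probs)
```

Each chooser's probabilities sum to one, and subtracting the maximum utility before exponentiating leaves the probabilities unchanged while avoiding overflow.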
- property str_model_expression
Model expression as a string suitable for use with patsy/statsmodels.
- summed_probabilities(self, choosers, alternatives)
Calculate total probability associated with each alternative.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- Returns
- probs : pandas.Series
Total probability associated with each alternative.
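Given chooser-by-alternative probabilities keyed as (chooser_id, alternative_id), the total for each alternative is a sum over choosers. Because each chooser's probabilities sum to one, the totals add up to the number of choosers, so each total can be read as the expected number of choosers selecting that alternative. This plain-Python sketch uses a dictionary as a stand-in for the pandas Series; the function name and numbers are hypothetical.

```python
from collections import defaultdict


def summed_probabilities_sketch(probs):
    """Sum {(chooser_id, alt_id): p} over choosers for each alternative."""
    totals = defaultdict(float)
    for (_, alt_id), p in probs.items():
        totals[alt_id] += p
    return dict(totals)


probs = {(1, 'A'): 0.6, (1, 'B'): 0.4, (2, 'A'): 0.3, (2, 'B'): 0.7}
totals = summed_probabilities_sketch(probs)
print(totals)  # totals add up to 2, the number of choosers
```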
- to_yaml(self, str_or_buffer=None)
Save a model representation to YAML.
- Parameters
- str_or_buffer : str or file like, optional
By default a YAML string is returned. If a string is given here the YAML will be written to that file. If an object with a .write method is given the YAML will be written to that object.
- Returns
- j : str
YAML string if str_or_buffer is not given.
- class urbansim.models.dcm.MNLDiscreteChoiceModelGroup(segmentation_col, remove_alts=False, name=None)
Manages a group of discrete choice models that refer to different segments of choosers.
Model names must match the segment names after doing a pandas groupby.
- Parameters
- segmentation_col : str
Name of a column in the table of choosers. Will be used to perform a pandas groupby on the choosers table.
- remove_alts : bool, optional
Specify how to handle alternatives between prediction for different models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.
- name : str, optional
A name that may be used in places to identify this group.
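The effect of remove_alts is easiest to see with two segments that compete for the same alternatives. In this sketch each segment's model is a hypothetical callable that maps the currently available alternatives to {chooser_id: alternative_id}; it illustrates the documented behavior only and does not reproduce UrbanSim's group implementation.

```python
def predict_segments(segment_models, alternatives, remove_alts=False):
    """Run each segment's (hypothetical) choice function in turn.

    segment_models: {segment_name: fn(available_alts) -> {chooser_id: alt_id}}
    """
    available = list(alternatives)
    results = {}
    for name, choose_fn in segment_models.items():
        choices = choose_fn(available)
        results.update(choices)
        if remove_alts:
            # Alternatives taken by this segment are unavailable to later ones.
            chosen = set(choices.values())
            available = [a for a in available if a not in chosen]
    return results


# Toy models: each segment's single chooser takes the first available alternative.
models = {'low_income': lambda alts: {1: alts[0]},
          'high_income': lambda alts: {2: alts[0]}}
print(predict_segments(models, ['A', 'B']))                    # both pick 'A'
print(predict_segments(models, ['A', 'B'], remove_alts=True))  # second gets 'B'
```

Because segments are processed in sequence, the order of the segments affects which alternatives remain for later segments when remove_alts is True.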
- add_model(self, model)
Add an MNLDiscreteChoiceModel instance.
- Parameters
- model : MNLDiscreteChoiceModel
Should have a .name attribute matching one of the segments in the choosers table.
- add_model_from_params(self, name, model_expression, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None)
Add a model by passing parameters through to MNLDiscreteChoiceModel.
- Parameters
- name
Must match a segment in the choosers table.
- model_expression : str, iterable, or dict
A patsy model expression. Should contain only a right-hand side.
- sample_size : int
Number of choices to sample for estimating the model.
- probability_mode : str, optional
Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives.
- choice_mode : str or callable, optional
Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers.
- choosers_fit_filters : list of str, optional
Filters applied to the choosers table before fitting the model.
- choosers_predict_filters : list of str, optional
Filters applied to the choosers table before calculating new data points.
- alts_fit_filters : list of str, optional
Filters applied to the alternatives table before fitting the model.
- alts_predict_filters : list of str, optional
Filters applied to the alternatives table before calculating new data points.
- interaction_predict_filters : list of str, optional
Filters applied to the merged choosers/alternatives table before predicting agent choices.
- estimation_sample_size : int, optional
Whether (and how many) choosers to sample during estimation (needs to be applied after choosers_fit_filters).
- prediction_sample_size : int, optional
Whether (and how much) to sample alternatives during prediction. Note that this can lead to multiple choosers picking the same alternative.
- choice_column : optional
Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided, the alternatives index is used.
- apply_fit_filters(self, choosers, alternatives)
Filter choosers and alternatives for fitting. This is done by filtering each submodel and concatenating the results.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
- apply_predict_filters(self, choosers, alternatives)
Filter choosers and alternatives for prediction. This is done by filtering each submodel and concatenating the results.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
- columns_used(self)
Columns from any table used in the model. May come from either the choosers or alternatives tables.
- fit(self, choosers, alternatives, current_choice)
Fit and save models based on given data after segmenting the choosers table.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- current_choice
Name of column in choosers that indicates which alternative they have currently chosen.
- Returns
- log_likelihoods : dict of dict
Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.
- property fitted
Whether all models in the group have been fitted.
- interaction_columns_used(self)
Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.
- predict(self, choosers, alternatives, debug=False)
Choose from among alternatives for a group of agents after segmenting the choosers table.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- debug : bool
If debug is set to true, will set the variable “sim_pdf” on the object to store the probabilities for mapping of the outcome.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
- probabilities(self, choosers, alternatives)
Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- Returns
- probabilities : dict of pandas.Series
-
summed_probabilities
(self, choosers, alternatives)
Returns the sum of probabilities for alternatives across all chooser segments.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- Returns
- probs : pandas.Series
Summed probabilities from each segment added together.
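Summing across segments means adding Series that share an alternative-ID index. A minimal pandas sketch, assuming each segment's probabilities are a Series indexed by alternative ID (the dict-of-Series shape documented for probabilities); the numbers are made up:

```python
from functools import reduce

import pandas as pd

# Hypothetical per-segment probabilities, each indexed by alternative ID.
probs_by_segment = {
    "low": pd.Series([0.5, 0.3, 0.2], index=[10, 11, 12]),
    "high": pd.Series([0.1, 0.6, 0.3], index=[10, 11, 12]),
}

# Add the segment Series together, aligning on alternative ID. fill_value=0
# treats an alternative absent from one segment as having probability 0.
summed = reduce(lambda a, b: a.add(b, fill_value=0), probs_by_segment.values())
print(summed)
```

Using `add(..., fill_value=0)` instead of `+` avoids NaN results when segments see different sets of alternatives.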
-
class
urbansim.models.dcm.
SegmentedMNLDiscreteChoiceModel
(segmentation_col, sample_size, probability_mode='full_product', choice_mode='individual', choosers_fit_filters=None, choosers_predict_filters=None, alts_fit_filters=None, alts_predict_filters=None, interaction_predict_filters=None, estimation_sample_size=None, prediction_sample_size=None, choice_column=None, default_model_expr=None, remove_alts=False, name=None)
An MNL LCM group that allows segments to have different model expressions but otherwise share configurations.
- Parameters
- segmentation_col
Name of column in the choosers table that will be used for groupby.
- sample_size : int
Number of choices to sample for estimating the model.
- probability_mode : str, optional
Specify the method to use for calculating probabilities during prediction. Available string options are ‘single_chooser’ and ‘full_product’. In “single chooser” mode one agent is chosen for calculating probabilities across all alternatives. In “full product” mode probabilities are calculated for every chooser across all alternatives. Currently “single chooser” mode must be used with a choice_mode of ‘aggregate’ and “full product” mode must be used with a choice_mode of ‘individual’.
- choice_mode : str, optional
Specify the method to use for making choices among alternatives. Available string options are ‘individual’ and ‘aggregate’. In “individual” mode choices will be made separately for each chooser. In “aggregate” mode choices are made for all choosers at once. Aggregate mode implies that an alternative chosen by one agent is unavailable to other agents and that the same probabilities can be used for all choosers. Currently “individual” mode must be used with a probability_mode of ‘full_product’ and “aggregate” mode must be used with a probability_mode of ‘single_chooser’.
- choosers_fit_filters : list of str, optional
Filters applied to the choosers table before fitting the model.
- choosers_predict_filters : list of str, optional
Filters applied to the choosers table before calculating new data points.
- alts_fit_filters : list of str, optional
Filters applied to the alternatives table before fitting the model.
- alts_predict_filters : list of str, optional
Filters applied to the alternatives table before calculating new data points.
- interaction_predict_filters : list of str, optional
Filters applied to the merged choosers/alternatives table before predicting agent choices.
- estimation_sample_size : int, optional
Whether (and how many) choosers to sample during estimation. Sampling is applied after choosers_fit_filters.
- prediction_sample_size : int, optional
Whether (and how many) alternatives to sample during prediction. Note that this can lead to multiple choosers picking the same alternative.
- choice_column : optional
Name of the column in the alternatives table that choosers should choose, e.g. the ‘building_id’ column. If not provided, the alternatives index is used.
- default_model_expr : str, iterable, or dict, optional
A patsy model expression. Should contain only a right-hand side.
- remove_alts : bool, optional
Specify how to handle alternatives between predictions for different segment models. If False, the alternatives table is not modified between predictions. If True, alternatives that have been chosen are removed from the alternatives table before doing another round of prediction.
- name : str, optional
An optional string used to identify the model in places.
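The probability_mode/choice_mode pairing rule is easy to violate in a config file. The following is a hypothetical pre-flight check (`check_modes` is not part of UrbanSim) that encodes the two supported pairings stated in the parameter descriptions above:

```python
# Supported pairings of probability_mode -> choice_mode, per the docs above.
VALID_MODE_PAIRS = {
    "single_chooser": "aggregate",
    "full_product": "individual",
}


def check_modes(probability_mode="full_product", choice_mode="individual"):
    # Hypothetical helper: raise ValueError unless the two modes form one of
    # the supported pairings.
    if probability_mode not in VALID_MODE_PAIRS:
        raise ValueError(f"unknown probability_mode: {probability_mode!r}")
    expected = VALID_MODE_PAIRS[probability_mode]
    if choice_mode != expected:
        raise ValueError(
            f"probability_mode={probability_mode!r} requires "
            f"choice_mode={expected!r}, got {choice_mode!r}"
        )


check_modes()  # the constructor defaults form a valid pair
check_modes("single_chooser", "aggregate")  # the other valid pair
```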
-
add_segment
(self, name, model_expression=None)
Add a new segment with its own model expression.
- Parameters
- name
Segment name. Must match a segment in the groupby of the data.
- model_expression : str or dict, optional
A patsy model expression. Should contain only a right-hand side. If not given, the default model will be used, which must not be None.
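The fallback rule for a segment's expression can be sketched as below. `resolve_expression` is a hypothetical helper that mirrors the documented behavior, not UrbanSim's code. The expressions reuse the right-hand-side list form shown in the Model Expressions section.

```python
def resolve_expression(model_expression, default_model_expr):
    # Hypothetical helper: use the segment's own expression if given,
    # otherwise fall back to the default, which must not be None.
    if model_expression is not None:
        return model_expression
    if default_model_expr is None:
        raise ValueError("model_expression not given and default_model_expr is None")
    return default_model_expr


default_expr = ["I(year_built < 1940)", "dist_hwy", "ave_income"]
low_expr = resolve_expression(None, default_expr)  # falls back to the default
high_expr = resolve_expression(["dist_hwy"], default_expr)  # segment-specific
```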
-
apply_fit_filters
(self, choosers, alternatives)
Filter choosers and alternatives for fitting.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
-
apply_predict_filters
(self, choosers, alternatives)
Filter choosers and alternatives for prediction.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- Returns
- filtered_choosers, filtered_alts : pandas.DataFrame
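The filter parameters are lists of strings. A minimal sketch of applying one such list, assuming each string is a pandas `DataFrame.query` expression and that all of them must hold (logical AND). `apply_filters` is a hypothetical helper, and the building columns are invented for the example.

```python
import pandas as pd


def apply_filters(df, filters):
    # Combine the filter strings with "and" and evaluate them with
    # DataFrame.query; an empty or missing list leaves the table unchanged.
    if not filters:
        return df
    return df.query(" and ".join(f"({f})" for f in filters))


buildings = pd.DataFrame(
    {"residential_units": [0, 5, 12], "year_built": [1930, 1985, 2005]},
    index=[10, 11, 12],
)
filtered = apply_filters(buildings, ["residential_units > 0", "year_built >= 1950"])
print(filtered.index.tolist())  # [11, 12]
```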
-
columns_used
(self)
Columns from any table used in the model. May come from either the choosers or alternatives tables.
-
fit
(self, choosers, alternatives, current_choice)
Fit and save models based on given data after segmenting the choosers table. Segments that have not already been explicitly added will be automatically added with the default model.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column with the same name as the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing, e.g. buildings.
- current_choice
Name of the column in choosers that indicates which alternative each chooser has currently chosen.
- Returns
- log_likelihoods : dict of dict
Keys will be model names and values will be dictionaries of log-likelihood values as returned by MNLDiscreteChoiceModel.fit.
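To predict which segments fit will add automatically, compare the segment values present in the data with those already added. The snippet below is an illustrative pandas sketch with made-up segment names and expressions. It only models the documented "missing segments get the default model" rule, not UrbanSim's internals.

```python
import pandas as pd

choosers = pd.DataFrame({"income_bin": ["low", "mid", "high", "mid"]})
configured = {"high": ["dist_hwy", "ave_income"]}  # hypothetical: only "high" added explicitly
default_model_expr = ["dist_hwy"]

# Segments present in the data but not yet added are the ones that would
# be added automatically with the default expression.
found = choosers["income_bin"].unique()
auto_added = sorted(set(found) - set(configured))
expressions = {**{name: default_model_expr for name in auto_added}, **configured}
print(auto_added)  # ['low', 'mid']
```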
-
classmethod
fit_from_cfg
(choosers, chosen_fname, alternatives, cfgname, outcfgname=None)
Fit a SegmentedMNLDiscreteChoiceModel from a YAML config file and write the estimation results to YAML.
- Parameters
- choosers : DataFrame
A dataframe of rows of agents that have made choices.
- chosen_fname : string
Name of the column in the choosers dataframe that indicates which alternative each chooser has chosen.
- alternatives : DataFrame
A dataframe of alternatives. It should include the current choices from the choosers dataframe as well as some other alternatives from which to sample. Values in choosers[chosen_fname] should index into the alternatives dataframe.
- cfgname : string
The name of the yaml config file from which to read the discrete choice model.
- outcfgname : string, optional (default cfgname)
The name of the output yaml config file where estimation results are written.
- Returns
- lcm : SegmentedMNLDiscreteChoiceModel
The model which was used to fit.
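Since values in choosers[chosen_fname] must index into the alternatives dataframe, a quick check with plain pandas before estimation can catch mismatched IDs. The tables and the "building_id" column name are illustrative.

```python
import pandas as pd

choosers = pd.DataFrame({"building_id": [10, 12, 99]}, index=[1, 2, 3])
alternatives = pd.DataFrame({"sqft": [900, 1200, 1500]}, index=[10, 11, 12])
chosen_fname = "building_id"

# Every chosen ID should be a label in alternatives.index; list the ones that are not.
missing = choosers.loc[~choosers[chosen_fname].isin(alternatives.index), chosen_fname]
print(missing.tolist())  # [99]
```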
-
property
fitted
Whether models for all segments have been fit.
-
classmethod
from_yaml
(yaml_str=None, str_or_buffer=None)
Create a SegmentedMNLDiscreteChoiceModel instance from a saved YAML configuration. Arguments are mutually exclusive.
- Parameters
- yaml_str : str, optional
A YAML string from which to load the model.
- str_or_buffer : str or file like, optional
File name or buffer from which to load YAML.
- Returns
- SegmentedMNLDiscreteChoiceModel
-
interaction_columns_used
(self)
Columns from the interaction dataset used for filtering and in the model. These may come originally from either the choosers or alternatives tables.
-
predict
(self, choosers, alternatives, debug=False)
Choose from among alternatives for a group of agents after segmenting the choosers table.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- debug : bool
If True, sets the attribute “sim_pdf” on the object to store the probabilities used for mapping the outcome.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.
-
classmethod
predict_from_cfg
(choosers, alternatives, cfgname=None, cfg=None, alternative_ratio=2.0, debug=False)
Simulate the discrete choices for the specified choosers.
- Parameters
- choosers : DataFrame
A dataframe of agents doing the choosing.
- alternatives : DataFrame
A dataframe of alternatives which the choosers are locating in and which have a supply.
- cfgname : string
The name of the yaml config file from which to read the discrete choice model.
- cfg : string
An ordered yaml string of the discrete choice model configuration. Used to read the config from memory in lieu of loading cfgname from disk.
- alternative_ratio : float
If the ratio of alternatives to choosers exceeds this value (default 2.0), the alternatives are sampled down to this ratio for performance reasons.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a NaN value when there are not enough alternatives for all the choosers.
- lcm : SegmentedMNLDiscreteChoiceModel
The model which was used to predict.
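The alternative_ratio behavior can be sketched as follows. This is an illustration under stated assumptions, not UrbanSim's code: `sample_alternatives` and its `seed` argument are hypothetical, and the target size `int(alternative_ratio * len(choosers))` is one reasonable reading of "sampled to meet this ratio".

```python
import numpy as np
import pandas as pd


def sample_alternatives(choosers, alternatives, alternative_ratio=2.0, seed=None):
    # If there are more than alternative_ratio alternatives per chooser, keep a
    # random subset of int(alternative_ratio * len(choosers)) alternatives,
    # drawn without replacement; otherwise return the table unchanged.
    target = int(alternative_ratio * len(choosers))
    if len(alternatives) <= target:
        return alternatives
    rng = np.random.default_rng(seed)
    keep = rng.choice(alternatives.index.to_numpy(), size=target, replace=False)
    return alternatives.loc[keep]


choosers = pd.DataFrame(index=range(5))
alternatives = pd.DataFrame({"sqft": np.arange(100)}, index=np.arange(1000, 1100))
sampled = sample_alternatives(choosers, alternatives, seed=0)
print(len(sampled))  # 10
```

Sampling without replacement keeps each retained alternative unique, so the sampled table is still a valid subset of the supply.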
-
probabilities
(self, choosers, alternatives)
Returns alternative probabilities for each chooser segment as a dictionary keyed by segment name.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- Returns
- probabilities : dict of pandas.Series
-
summed_probabilities
(self, choosers, alternatives)
Returns the sum of probabilities for alternatives across all chooser segments.
- Parameters
- choosers : pandas.DataFrame
Table describing the agents making choices, e.g. households. Must have a column matching the .segmentation_col attribute.
- alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
- Returns
- probs : pandas.Series
Summed probabilities from each segment added together.
-
to_dict
(self)
Returns a dict representation of this instance suitable for conversion to YAML.
-
to_yaml
(self, str_or_buffer=None)
Save a model representation to YAML.
- Parameters
- str_or_buffer : str or file like, optional
By default a YAML string is returned. If a string is given here, the YAML will be written to that file. If an object with a .write method is given, the YAML will be written to that object.
- Returns
- j : str
YAML string, if str_or_buffer is not given.
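The three str_or_buffer cases can be sketched in plain Python. `write_or_return` is a hypothetical helper showing the dispatch, not the UrbanSim code. It takes already-serialized YAML text, so no YAML library is needed.

```python
import io


def write_or_return(text, str_or_buffer=None):
    # None -> return the text; str -> treat it as a file path and write there;
    # anything else -> assume it has a .write method and write to it.
    if str_or_buffer is None:
        return text
    if isinstance(str_or_buffer, str):
        with open(str_or_buffer, "w") as f:
            f.write(text)
    else:
        str_or_buffer.write(text)
    return None


buf = io.StringIO()
write_or_return("name: example\n", buf)
print(buf.getvalue())
```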
-
urbansim.models.dcm.
unit_choice
(chooser_ids, alternative_ids, probabilities)
Have a set of choosers choose from among alternatives according to a probability distribution. Choice is binary: each alternative can only be chosen once.
- Parameters
- chooser_ids : 1d array_like
Array of IDs of the agents that are making choices.
- alternative_ids : 1d array_like
Array of IDs of alternatives among which agents are making choices.
- probabilities : 1d array_like
The probability that an agent will choose an alternative. Must be the same shape as alternative_ids. Unavailable alternatives should have a probability of 0.
- Returns
- choices : pandas.Series
Mapping of chooser ID to alternative ID. Some choosers will map to a nan value when there are not enough alternatives for all the choosers.
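A minimal sketch of the behavior described above, using NumPy's weighted sampling without replacement. `unit_choice_sketch` and its `seed` argument are hypothetical, and UrbanSim's own implementation may differ in detail. Drawing without replacement is what makes each alternative choosable only once. Choosers left over when there are fewer nonzero-probability alternatives than choosers stay NaN.

```python
import numpy as np
import pandas as pd


def unit_choice_sketch(chooser_ids, alternative_ids, probabilities, seed=None):
    rng = np.random.default_rng(seed)
    chooser_ids = np.asarray(chooser_ids)
    alternative_ids = np.asarray(alternative_ids)
    p = np.asarray(probabilities, dtype=float)

    # Start with every chooser unplaced (NaN).
    choices = pd.Series(np.nan, index=chooser_ids)
    n_available = int(np.count_nonzero(p))
    if n_available == 0:
        return choices

    # Only as many choosers as there are available alternatives can be placed;
    # pick which choosers those are at random.
    n = min(len(chooser_ids), n_available)
    placed = rng.choice(chooser_ids, size=n, replace=False)
    # Weighted draw without replacement: zero-probability alternatives are
    # never drawn, and no alternative is drawn twice.
    chosen = rng.choice(alternative_ids, size=n, replace=False, p=p / p.sum())
    choices.loc[placed] = chosen
    return choices


result = unit_choice_sketch([1, 2, 3], [10, 11, 12, 13], [0.5, 0.0, 0.3, 0.2], seed=42)
print(result)
```

With three choosers and three nonzero-probability alternatives, every chooser is placed and alternative 11 is never chosen.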