Simulation utilities API

ChoiceModels provides general-purpose tools for Monte Carlo simulation of choices among alternatives, given probability distributions generated from fitted models.

monte_carlo_choices() is equivalent to applying np.random.choice() in parallel for many independent choice scenarios, but it’s implemented as a single-pass matrix calculation that is much faster.

iterative_lottery_choices() is for cases where the alternatives have limited capacities, requiring multiple passes to match choosers and alternatives. Effectively, choices are simulated sequentially, each time removing the chosen alternative or reducing its available capacity. (It’s actually done in batches for better performance.)

parallel_lottery_choices() is functionally equivalent to iterative_lottery_choices(), but the batches run in parallel rather than sequentially.

Independent choices

choicemodels.tools.monte_carlo_choices(probabilities)

Monte Carlo simulation of choices for a set of K scenarios, each having different probability distributions (and potentially different alternatives).

Choices are independent and unconstrained, meaning that the same alternative can be chosen in multiple scenarios.

This function is equivalent to applying np.random.choice() to each of the K scenarios, but it’s implemented as a single-pass matrix calculation. When the number of scenarios is large, this is about 50x faster than using df.apply() or a loop.
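The idea of a single-pass matrix calculation can be illustrated with a minimal NumPy sketch. This is not the library's implementation; the function name vectorized_choices and the (K, J) array layout are assumptions made for illustration. It draws one uniform random number per scenario and compares it against the row-wise cumulative probabilities.

```python
import numpy as np

def vectorized_choices(probs_2d, rng):
    """Draw one column position per row of a (K, J) probability matrix."""
    # Cumulative probabilities along each row; the last column is ~1.0
    cum = np.cumsum(probs_2d, axis=1)
    # One uniform draw per scenario, shaped (K, 1) so it broadcasts across columns
    draws = rng.random((probs_2d.shape[0], 1))
    # The chosen position is the number of cumulative values below the draw
    idx = (cum < draws).sum(axis=1)
    # Guard against floating-point rounding pushing a position past the last column
    return np.minimum(idx, probs_2d.shape[1] - 1)

rng = np.random.default_rng(0)
probs = np.array([[0.2, 0.5, 0.3],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
picks = vectorized_choices(probs, rng)  # one position (0..2) per scenario
```

Because every scenario is handled by the same array operations, no Python-level loop over scenarios is needed, which is where the speedup over df.apply() comes from.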

If all the choice scenarios have the same probability distribution among alternatives, you don’t need this function. You can use np.random.choice() with size=K, which will be more efficient. (For example, that would work for a choice model whose expression includes only attributes of the alternatives.)
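For the shared-distribution case, a minimal sketch looks like this. The alternative ids and probabilities are made-up values for illustration.

```python
import numpy as np

np.random.seed(42)  # only to make the illustration reproducible

alt_ids = np.array([101, 102, 103])   # hypothetical alternative ids
shared_probs = [0.5, 0.3, 0.2]        # same distribution for every scenario
K = 1000                              # number of choice scenarios

# One call draws all K choices from the shared distribution
choices = np.random.choice(alt_ids, size=K, p=shared_probs)
```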

NOTE ABOUT THE INPUT FORMATS: It’s important for the probabilities to be structured correctly. This is computationally expensive to verify, so you will not get a warning if the structure is wrong! (TO DO: we should provide an option to perform these checks, though)

  1. Probabilities (pd.Series) must include a two-level MultiIndex, the first level representing the scenario (observation) id and the second the alternative id.

  2. Probabilities must be sorted so that each scenario’s alternatives are consecutive.

  3. Each scenario must have the same number of alternatives. You can pad a scenario with zero-probability alternatives if needed.

  4. Each scenario’s alternative probabilities must sum to 1.
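As an illustration of the requirements above, the following sketch builds a correctly structured pd.Series from scenarios with unequal numbers of alternatives by padding with zero-probability entries. The index names obs_id and alt_id and all values are hypothetical.

```python
import pandas as pd

# Hypothetical raw probabilities: scenario 1 has three alternatives, scenario 2 only two
raw = pd.Series(
    [0.2, 0.5, 0.3, 0.6, 0.4],
    index=pd.MultiIndex.from_tuples(
        [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'c')],
        names=['obs_id', 'alt_id']),
)

# Full scenario-by-alternative grid, grouped by scenario so rows are consecutive
full_index = pd.MultiIndex.from_product(
    [raw.index.get_level_values(0).unique(),
     raw.index.get_level_values(1).unique()],
    names=raw.index.names)

# Missing (scenario, alternative) pairs become zero-probability padding
probabilities = raw.reindex(full_index, fill_value=0.0)
```

After padding, every scenario has three rows, the rows for each scenario are adjacent, and each scenario still sums to 1.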

Parameters

probabilities (pd.Series) – List of probabilities for each observation (choice scenario) and alternative. Please verify that the formatting matches the four requirements described above.
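Since the function does not validate its input, a small check of your own can help during development. The helper below is a hypothetical sketch, not part of ChoiceModels; the name check_probabilities and the tolerance default are assumptions.

```python
import numpy as np
import pandas as pd

def check_probabilities(probabilities, tol=1e-8):
    """Return a list of problems found (an empty list means none were found)."""
    idx = probabilities.index
    # Requirement 1: two-level MultiIndex
    if not isinstance(idx, pd.MultiIndex) or idx.nlevels != 2:
        return ['index is not a two-level MultiIndex']
    problems = []
    scenario = idx.get_level_values(0)
    # Requirement 2: rows of each scenario are consecutive, so the number of
    # runs of equal scenario ids equals the number of distinct scenario ids
    codes = pd.factorize(scenario)[0]
    n_runs = 1 + int((np.diff(codes) != 0).sum()) if len(codes) else 0
    if n_runs != scenario.nunique():
        problems.append('alternatives of a scenario are not consecutive')
    # Requirement 3: same number of alternatives in every scenario
    counts = probabilities.groupby(level=0, sort=False).size()
    if counts.nunique() > 1:
        problems.append('scenarios have differing numbers of alternatives')
    # Requirement 4: probabilities sum to 1 within each scenario
    sums = probabilities.groupby(level=0, sort=False).sum()
    if not np.allclose(sums.to_numpy(), 1.0, atol=tol):
        problems.append('probabilities do not sum to 1 in every scenario')
    return problems

good = pd.Series(
    [0.2, 0.8, 0.5, 0.5],
    index=pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]))
issues = check_probabilities(good)  # empty list for well-formed input
```

The check costs a few passes over the data, so it is probably best run on a sample or in tests rather than on every call.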

Returns

List of chosen alternative ids, indexed with the observation id.

Return type

pd.Series

Capacity-constrained choices

choicemodels.tools.iterative_lottery_choices(choosers, alternatives, mct_callable, probs_callable, alt_capacity=None, chooser_size=None, max_iter=None, chooser_batch_size=None)

Monte Carlo simulation of choices for a set of choice scenarios where (a) the alternatives have limited capacity and (b) the choosers have varying probability distributions over the alternatives.

Effectively, we simulate the choices sequentially, each time removing the chosen alternative or reducing its available capacity. (It’s actually done in batches for better performance, but the outcome is equivalent.) This requires sampling alternatives and calculating choice probabilities multiple times, which is why callables for those actions are required inputs.

Chooser priority is randomized. Capacities can be specified as counts (number of choosers that can be accommodated) or as amounts (e.g. square footage) with corresponding chooser sizes. If total capacity is insufficient to accommodate all the choosers, as many choices will be simulated as possible.
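The sequential behavior described above can be sketched in plain NumPy. This is a conceptual illustration only, not the library's batched algorithm: it assumes a dense matrix of probabilities for every chooser and alternative (no sampling of alternatives), and the name sequential_lottery is hypothetical.

```python
import numpy as np

def sequential_lottery(probs, capacity, sizes, rng):
    """Simulate capacity-constrained choices one chooser at a time.

    probs: (n_choosers, n_alts) array; capacity: (n_alts,); sizes: (n_choosers,).
    Returns a dict mapping chooser position to chosen alternative position.
    """
    remaining = np.asarray(capacity, dtype=float).copy()
    choices = {}
    # Randomized chooser priority
    for i in rng.permutation(len(sizes)):
        # Only alternatives with enough remaining capacity stay available
        weights = np.where(remaining >= sizes[i], probs[i], 0.0)
        total = weights.sum()
        if total == 0:
            continue  # no alternative can accommodate this chooser
        # Renormalize over the available alternatives and draw one
        j = rng.choice(len(remaining), p=weights / total)
        choices[int(i)] = int(j)
        # Reduce the chosen alternative's available capacity
        remaining[j] -= sizes[i]
    return choices

rng = np.random.default_rng(0)
probs = np.full((3, 2), 0.5)       # three choosers, two alternatives
capacity = np.array([1, 1])        # counts: one chooser per alternative
sizes = np.array([1, 1, 1])
result = sequential_lottery(probs, capacity, sizes, rng)
```

With three choosers and total capacity of two, only two choosers are matched, mirroring the statement that as many choices are simulated as capacity allows.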

Note that if all the choosers are the same size and have the same probability distribution over alternatives, you don’t need this function. You can use np.random.choice() with size=K, where K is the number of choosers, to draw chosen alternatives, which will be more efficient. (This function also works, though.)

Parameters
  • choosers (pd.DataFrame) – Table with one row for each chooser or choice scenario, with unique IDs in the index field. Additional columns can contain fixed attributes of the choosers. (Reserved column names: ‘_size’.)

  • alternatives (pd.DataFrame) – Table with one row for each alternative, with unique IDs in the index field. Additional columns can contain fixed attributes of the alternatives. (Reserved column names: ‘_capacity’.)

  • mct_callable (callable) – Callable that samples alternatives to generate a table of choice scenarios. It should accept subsets of the choosers and alternatives tables and return a choicemodels.tools.MergedChoiceTable.

  • probs_callable (callable) – Callable that generates predicted probabilities for a table of choice scenarios. It should accept a choicemodels.tools.MergedChoiceTable and return a pd.Series with indexes matching the input.

  • alt_capacity (str, optional) – Name of a column in the alternatives table that expresses the capacity of alternatives. If not provided, each alternative is interpreted as accommodating a single chooser.

  • chooser_size (str, optional) – Name of a column in the choosers table that expresses the size of choosers. Choosers might have varying sizes if the alternative capacities are amounts rather than counts – e.g. square footage or employment capacity. Chooser sizes must be in the same units as alternative capacities. If not provided, each chooser has a size of 1.

  • max_iter (int or None, optional) – Maximum number of iterations. If None (default), the algorithm will iterate until all choosers are matched or no alternatives remain.

  • chooser_batch_size (int or None, optional) – Size of the batches for processing smaller groups of choosers one at a time. Useful when the anticipated size of the merged choice tables (choosers × alternatives × covariates) will be too large for Python/pandas to handle.
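The requirement that probs_callable return a pd.Series whose index matches its input can be illustrated with the kind of calculation such a callable might wrap. This is a sketch under assumptions: it presumes the choice table has already been converted to a DataFrame indexed by (chooser id, alternative id) and that a hypothetical utility column exists. The names softmax_by_chooser, utility, chooser_id, and alt_id are not part of ChoiceModels.

```python
import numpy as np
import pandas as pd

def softmax_by_chooser(frame, utility_col):
    """Per-chooser choice probabilities from a utility column.

    `frame` is indexed by (chooser id, alternative id); the result keeps that index.
    """
    u = frame[utility_col]
    # Subtract each chooser's maximum utility for numerical stability
    shifted = u - u.groupby(level=0).transform('max')
    expu = np.exp(shifted)
    # Normalize within each chooser so probabilities sum to 1
    return expu / expu.groupby(level=0).transform('sum')

frame = pd.DataFrame(
    {'utility': [1.0, 1.0, 0.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')],
        names=['chooser_id', 'alt_id']),
)
probs = softmax_by_chooser(frame, 'utility')
```

A callable passed to the function would apply a calculation like this to the table it receives and return the resulting Series unchanged in index and order.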

Returns

List of chosen alternative ids, indexed with the chooser (observation) id.

Return type

pd.Series
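When total capacity is insufficient, some choosers end up without a choice. The sketch below shows one way to identify them from the returned Series. How unmatched choosers appear in the output (omitted or as missing values) is not stated here, so the sketch reindexes against the choosers table, which handles both cases. The tables and values are made up, and the choices Series is a stand-in for the function's output.

```python
import pandas as pd

choosers = pd.DataFrame(
    {'income': [30, 50, 70, 90]},
    index=pd.Index([1, 2, 3, 4], name='chooser_id'))

# Stand-in for the function's output: only two choosers were matched
choices = pd.Series([12, 10], index=pd.Index([3, 1], name='chooser_id'))

# Align the choices to the full chooser list; unmatched choosers become NaN
aligned = choices.reindex(choosers.index)
unmatched_ids = aligned.index[aligned.isna()]
```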

Parallelized capacity-constrained choices

choicemodels.tools.parallel_lottery_choices(choosers, alternatives, mct_callable, probs_callable, alt_capacity=None, chooser_size=None, chooser_batch_size=None)

A parallelized version of iterative_lottery_choices(). Chooser batches are processed in parallel rather than sequentially.

NOTE: In its current form, this function is only supported for simulating choices where every alternative has a capacity of 1.
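Given this restriction, it may be worth checking the alternatives table before choosing between the parallel and iterative functions. The helper below is a hypothetical sketch, not part of ChoiceModels; the name has_unit_capacity and the column name units are assumptions.

```python
import pandas as pd

def has_unit_capacity(alternatives, alt_capacity=None):
    """True if every alternative accommodates exactly one chooser."""
    if alt_capacity is None:
        # Documented default: each alternative accommodates a single chooser
        return True
    return bool((alternatives[alt_capacity] == 1).all())

alternatives = pd.DataFrame({'units': [1, 1, 1]}, index=[10, 11, 12])
use_parallel = has_unit_capacity(alternatives, alt_capacity='units')
```

If the check fails, the sequential iterative_lottery_choices() is the documented option for capacities other than 1.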

Parameters
  • choosers (pd.DataFrame) – Table with one row for each chooser or choice scenario, with unique IDs in the index field. Additional columns can contain fixed attributes of the choosers. (Reserved column names: ‘_size’.)

  • alternatives (pd.DataFrame) – Table with one row for each alternative, with unique IDs in the index field. Additional columns can contain fixed attributes of the alternatives. (Reserved column names: ‘_capacity’.)

  • mct_callable (callable) – Callable that samples alternatives to generate a table of choice scenarios. It should accept subsets of the choosers and alternatives tables and return a choicemodels.tools.MergedChoiceTable.

  • probs_callable (callable) – Callable that generates predicted probabilities for a table of choice scenarios. It should accept a choicemodels.tools.MergedChoiceTable and return a pd.Series with indexes matching the input.

  • alt_capacity (str, optional) – Name of a column in the alternatives table that expresses the capacity of alternatives. If not provided, each alternative is interpreted as accommodating a single chooser.

  • chooser_size (str, optional) – Name of a column in the choosers table that expresses the size of choosers. Choosers might have varying sizes if the alternative capacities are amounts rather than counts – e.g. square footage or employment capacity. Chooser sizes must be in the same units as alternative capacities. If not provided, each chooser has a size of 1.

  • max_iter (int or None, optional) – Maximum number of iterations. If None (default), the algorithm will iterate until all choosers are matched or no alternatives remain.

  • chooser_batch_size (int or None, optional) – Size of the batches for processing smaller groups of choosers one at a time. Useful when the anticipated size of the merged choice tables (choosers × alternatives × covariates) will be too large for Python/pandas to handle.

Returns

List of chosen alternative ids, indexed with the chooser (observation) id.

Return type

pd.Series