Shared utilities

The utilities are mainly helper functions for templates.

General template tools API

class urbansim_templates.shared.CoreTemplateSettings(name=None, tags=[], notes=None, autorun=False, template=None, template_version=None)[source]

Stores standard parameters and logic used by all templates. Parameters can be passed to the constructor or set as attributes.

Parameters
  • name (str, optional) – Name of the configured template instance.

  • tags (list of str, optional) – Tags associated with the configured template instance.

  • notes (str, optional) – Notes associated with the configured template instance.

  • autorun (bool, optional) – Whether to run the configured template instance automatically when it’s registered or loaded by ModelManager. The overall default is False, but the default can be overridden at the template level.

  • template (str) – Name of the template class associated with a configured instance.

  • template_version (str) – Version of the template class package.

classmethod from_dict(d)[source]

Create a class instance from a saved dictionary representation.

Parameters

d (dict) –

Returns

obj

Return type

CoreTemplateSettings

to_dict()[source]

Create a dictionary representation of the object.

Returns

d

Return type

dict
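The to_dict() / from_dict() pair is meant to round-trip a settings object through a plain dictionary. The following is a minimal sketch of that pattern using a hypothetical stand-in class, SettingsSketch, which is not part of urbansim_templates and covers only a few of the parameters listed above. It illustrates the expected behavior rather than the library's actual implementation.

```python
class SettingsSketch:
    """Hypothetical stand-in for a settings class; not the library's code."""

    def __init__(self, name=None, tags=None, notes=None, autorun=False):
        self.name = name
        # Use a fresh list per instance rather than sharing one default list.
        self.tags = [] if tags is None else list(tags)
        self.notes = notes
        self.autorun = autorun

    @classmethod
    def from_dict(cls, d):
        """Create an instance from a saved dictionary representation."""
        return cls(
            name=d.get("name"),
            tags=d.get("tags"),
            notes=d.get("notes"),
            autorun=d.get("autorun", False),
        )

    def to_dict(self):
        """Create a dictionary representation of the object."""
        return {
            "name": self.name,
            "tags": self.tags,
            "notes": self.notes,
            "autorun": self.autorun,
        }


# Parameters can be passed to the constructor or set as attributes.
original = SettingsSketch(name="price-model", tags=["draft"])
original.autorun = True

# Saving and reloading should produce an equivalent object.
restored = SettingsSketch.from_dict(original.to_dict())
```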

Column output tools API

class urbansim_templates.shared.OutputColumnSettings(column_name=None, table=None, data_type=None, missing_values=None, cache=False, cache_scope='forever')[source]

Stores standard parameters used by templates that generate or modify columns. Parameters can be passed to the constructor or set as attributes.

Parameters
  • column_name (str, optional) – Name of the Orca column to be created or modified. Generally required before running a configured template.

  • table (str, optional) – Name of Orca table the column will be associated with. Generally required before running the configured template.

  • data_type (str, optional) – Python type or numpy.dtype to cast the column’s values to.

  • missing_values (str or numeric, optional) – Value to use for rows that would otherwise be missing.

  • cache (bool, default False) – Whether to cache column values after they are calculated.

  • cache_scope ('step', 'iteration', or 'forever', default 'forever') – How long to cache column values for (ignored if cache is False).

classmethod from_dict(d)[source]

Create a class instance from a saved dictionary representation.

Parameters

d (dict) –

Returns

obj

Return type

OutputColumnSettings

to_dict()[source]

Create a dictionary representation of the object.

Returns

d

Return type

dict

urbansim_templates.shared.register_column(build_column, settings)[source]

Register a callable as an Orca column.

Parameters
  • build_column (callable) – Callable should return a pd.Series.

  • settings (OutputColumnSettings) –

Table schemas and merging API

urbansim_templates.utils.merge_tables(tables, columns=None)[source]

Merge two or more tables into a single DataFrame.

All the data will eventually be merged onto the first table in the list. In each merge stage, we’ll refer to the right-hand table as the “source” and the left-hand one as the “target”.

Tables are merged using ModelManager schema rules: The source table must have a unique index, and the target table must have a column with a matching name, which will be used as the join key. Multi-indexes are fine, but all of the index columns need to be present in the target table.

The last table in the list is the initial source. The algorithm searches backward through the list for a table that qualifies as a target. The source table is left-joined onto the target, and then the algorithm continues with the second-to-last table as the new source.

Example 1: Tables A and B share join keys. Tables B and C share join keys. Merging [A, B, C] will left-join C onto B, and then left-join the result onto A.

Example 2: Tables A and B share join keys. Tables A and C also share join keys, but tables B and C don’t. Merging [A, B, C] will left-join C onto A, and then left-join B onto the result of the first join.

If you provide a list of columns, the output table will be limited to columns in this list. The index(es) of the left-most table will always be retained, but it’s a good practice to list them anyway. Column names not found will be ignored.

If two tables contain columns with identical names (other than join keys), they can’t be automatically merged. If the columns are just incidental and not needed in the final output, you can perform the merge by providing a columns list that excludes them.

A note about data types: They will be retained, but if NaN values need to be added (e.g. if some identifiers from the target table aren’t found in the source table), data may need to be cast to a type that allows missing values. For better control over this, see urbansim_templates.data.ColumnFromBroadcast().

Parameters
  • tables (list of str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) – Two or more tables to merge. Types can be mixed and matched.

  • columns (list of str, optional) – Names of columns to retain in the final output.

Returns

Return type

pd.DataFrame
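The backward search described for merge_tables() can be sketched without any real data by tracking only index and column names. The function plan_merges() below is hypothetical, as are the table and column names. It treats a table as a valid target when its columns include every index name of the source, and it raises a ValueError when no target is found, which is an assumption rather than documented library behavior.

```python
def plan_merges(tables):
    """Return (source, target) pairs in the order the joins would happen.

    `tables` is a list of (name, index_names, column_names) tuples.
    Hypothetical helper for illustration; not part of urbansim_templates.
    """
    names = [name for name, _, _ in tables]
    index = {name: list(idx) for name, idx, _ in tables}
    columns = {name: set(cols) for name, _, cols in tables}

    plan = []
    # Walk from the last table toward the second; the first is the final target.
    for i in range(len(names) - 1, 0, -1):
        source = names[i]
        # Search backward for the nearest earlier table holding every join key.
        target = next(
            (names[j] for j in range(i - 1, -1, -1)
             if all(key in columns[names[j]] for key in index[source])),
            None,
        )
        if target is None:
            raise ValueError("No target found for table " + source)
        # After a left join, the target also carries the source's columns.
        columns[target] |= columns[source]
        plan.append((source, target))
    return plan


# Example 1: A and B share join keys, and B and C share join keys.
example_1 = [
    ("A", ["a_id"], ["b_id"]),
    ("B", ["b_id"], ["c_id"]),
    ("C", ["c_id"], ["value"]),
]

# Example 2: A shares join keys with both B and C, but B and C share none.
example_2 = [
    ("A", ["a_id"], ["b_id", "c_id"]),
    ("B", ["b_id"], ["x"]),
    ("C", ["c_id"], ["y"]),
]

plan_1 = plan_merges(example_1)
plan_2 = plan_merges(example_2)
```

In the first plan C is joined onto B and then B onto A. In the second, C and B are each joined onto A in turn, matching the two examples in the description.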

urbansim_templates.utils.validate_all_tables()[source]

Validate all tables registered with Orca. See validate_table() below.

Returns

Return type

bool

urbansim_templates.utils.validate_table(table, reciprocal=True)[source]

Check some basic expectations about an Orca table:

  • Confirm that it includes a unique, named index column (a.k.a. primary key) or set of columns (multi-index, a.k.a. composite key). If not, raise a ValueError.

  • Confirm that none of the other columns in the table share names with the index(es). If they do, raise a ValueError.

  • If the table contains columns whose names match the index columns of other tables registered with Orca, check whether they make sense as join keys. This prints a status message with the number of presumptive foreign-key values that are found in the primary/composite key, for evaluation by the user.

  • Perform the same check for columns in _other_ tables whose names match the index column(s) of _this_ table.

  • It doesn’t currently compare indexes to indexes. (Maybe it should?)

Running this will trigger loading all registered Orca tables, which may take a while. Stand-alone columns will not be loaded unless their names match an index column.

Doesn’t currently incorporate orca_test validation, but it might be added.

Parameters
  • table (str) – Name of Orca table to validate.

  • reciprocal (bool, default True) – Whether to also check how columns of other tables align with this one’s index. If False, only check this table’s columns against other tables’ indexes.

Returns

Return type

bool
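The two kinds of check described for validate_table() are a uniqueness test on the index and a count of how many presumptive foreign-key values appear in another table's primary key. They can be sketched with plain lists. Both helper functions and the sample identifiers below are hypothetical; the library works on Orca tables and prints its own status messages, whose format is not reproduced here.

```python
def check_unique_key(index_values):
    """Raise ValueError if the primary-key values are not unique."""
    if len(set(index_values)) != len(index_values):
        raise ValueError("Index values are not unique")


def count_key_matches(foreign_values, primary_keys):
    """Count foreign-key values that are found in the primary key."""
    keys = set(primary_keys)
    return sum(1 for value in foreign_values if value in keys)


# Hypothetical data: a buildings index, and a households column named after it.
building_ids = [1, 2, 3]
household_building_ids = [1, 1, 3, 4]

check_unique_key(building_ids)  # passes silently
found = count_key_matches(household_building_ids, building_ids)
total = len(household_building_ids)
```

Here three of the four household values are found in the buildings index, which is the kind of figure a user would evaluate when judging whether the column works as a join key.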

Other helper functions API

urbansim_templates.utils.all_cols(table)[source]

Returns a list of all column names in a table, including index(es). Input can be an Orca table name, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame.

Parameters

table (str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) –

Returns

Return type

list of str

urbansim_templates.utils.cols_in_expression(expression)[source]

Extract all possible column names from a df.eval()-style expression.

This is achieved using regex to identify tokens in the expression that begin with a letter and contain any number of alphanumerics or underscores, but do not end with an opening parenthesis. This excludes function names, but would not exclude constants (e.g. “pi”), which are semantically indistinguishable from column names.

Parameters

expression (str) –

Returns

cols

Return type

list of str
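The token rule described for cols_in_expression() can be approximated with a single regular expression. This is a sketch under stated assumptions: the pattern below is one reading of the description, not necessarily the pattern the library uses, and the function name cols_in_expression_sketch and the sample expression are made up for illustration.

```python
import re

# A token starts with a letter, continues with letters, digits, or underscores,
# and must not be immediately followed by an opening parenthesis.
TOKEN_PATTERN = re.compile(r"\b[a-zA-Z]\w*\b(?!\()")


def cols_in_expression_sketch(expression):
    """Return possible column names, in order of appearance."""
    return TOKEN_PATTERN.findall(expression)


cols = cols_in_expression_sketch("log(income) + age_of_head * pi > 2")
```

The function name log is skipped because it is followed by a parenthesis, while pi is kept because nothing in the syntax distinguishes a constant from a column name.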

urbansim_templates.utils.get_data(tables, fallback_tables=None, filters=None, model_expression=None, extra_columns=None)[source]

Generate a pd.DataFrame for model estimation or simulation. Automatically loads tables from Orca, merges them, and removes columns not referenced in a model expression or data filter. Additional columns can be requested.

If filters are provided, the output will include only rows that match the filter criteria.

See urbansim_templates.utils.merge_tables() for a detailed description of how the merges are performed.

Parameters
  • tables (str or list of str) – Orca table(s) to draw data from.

  • fallback_tables (str or list of str, optional) – Table(s) to use if first parameter evaluates to None. (This option will be removed shortly when estimation and simulation settings are separated.)

  • filters (str or list of str, optional) – Filter(s) to apply to the merged data, using pd.DataFrame.query().

  • model_expression (str, optional) – Model expression that will be evaluated using the output data. Only used to drop non-relevant columns. PyLogit format is not yet supported.

  • extra_columns (str or list of str, optional) – Columns to include, in addition to any in the model expression and filters. (If this and the model_expression are both None, all columns will be included.)

Returns

Return type

pd.DataFrame

urbansim_templates.utils.get_df(table, columns=None)[source]

Returns a table as a pd.DataFrame. Input can be an Orca table name, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame.

Optionally, columns can be limited to those that appear in a list of names. The list may contain duplicates or columns not in the table. Index(es) will always be retained, but it’s a good practice to list them anyway.

Parameters
  • table (str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) –

  • columns (list of str, optional) –

Returns

Return type

pd.DataFrame

urbansim_templates.utils.to_list(items)[source]

In many places we accept either a single string or a list of strings. This function normalizes None -> [None], str -> [str], and leaves lists unchanged.

Parameters

items (str, list, or None) –

Returns

Return type

list
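The normalization that to_list() performs is small enough to sketch in full. The function name to_list_sketch is hypothetical, and the body follows the three cases stated above rather than the library source.

```python
def to_list_sketch(items):
    """Wrap None or a single string in a list; return lists unchanged."""
    if items is None or isinstance(items, str):
        return [items]
    return items


single = to_list_sketch("households")
nothing = to_list_sketch(None)
several = to_list_sketch(["households", "buildings"])
```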

urbansim_templates.utils.trim_cols(df, columns=None)[source]

Limit a DataFrame to columns that appear in a list of names. List may contain duplicates or names not in the DataFrame. Index(es) of the DataFrame will always be retained, but it’s a good practice to list them anyway. If columns is None, all columns are retained. Returns the original DataFrame, not a copy.

Parameters
  • df (pd.DataFrame) –

  • columns (list of str, optional) –

Returns

Return type

pd.DataFrame

urbansim_templates.utils.update_column(table, column, data, fallback_table=None, fallback_column=None)[source]

Update an Orca column. If it doesn’t exist yet, add it to the wrapped DataFrame. Values will be aligned using the indexes if possible.

Data types: If the column already exists, new values will be cast to match the existing data type. If the column is new, it will retain the data type of the pd.Series that’s passed to this function – unless it doesn’t fully align with the table’s index, in which case it may be cast to allow missing values (e.g. from int to float).

Parameters
  • table (str or list of str) – Name of Orca table to update. If list, the first element will be used.

  • column (str) – Name of existing column to update, or new column to create. Cannot be an index.

  • data (pd.Series) – Column of data to update or add.

  • fallback_table (str or list of str) – Name of Orca table to use if table evaluates to None.

  • fallback_column (str) – Name of Orca column to use if column evaluates to None.

Returns

Return type

None
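The data type note for update_column() reflects general pandas behavior: when a Series is aligned to an index that contains labels it lacks, the gaps are filled with NaN and integer data is cast to float. The sketch below shows this using pandas directly and does not call update_column(). The index name building_id and the values are made up for illustration.

```python
import pandas as pd

# Index of the table being updated (hypothetical identifiers).
table_index = pd.Index([1, 2, 3], name="building_id")

# New integer values that cover only part of the table's index.
new_values = pd.Series([10, 20], index=pd.Index([1, 2], name="building_id"))

# Aligning to the full index inserts NaN for the missing row,
# which forces the integer data to be cast to float.
aligned = new_values.reindex(table_index)
```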

urbansim_templates.utils.update_name(template, name=None)[source]

Generate a name for a configured model step, based on its template class and the current timestamp. But if a custom name has already been provided, return that instead. (A name is judged to be custom if it does not contain the class type.)

Parameters
  • template (str) – Template class name.

  • name (str, optional) – Existing name for the configured model step.

Returns

Return type

str

Spec validation API

urbansim_templates.utils.validate_template(cls)[source]

Checks whether a template class meets the basic expectations for working with ModelManager, to aid in development and testing.

Looks for ‘to_dict’, ‘from_dict’, and ‘run’ methods, and ‘name’, ‘tags’, ‘template’, and ‘template_version’ attributes. Checks that an object can be instantiated without arguments, plus some additional behaviors. See documentation for a full description of ModelManager specs and guidelines.

There are many behaviors this does NOT check, because we don’t know what particular parameters are expected and valid for a given template. For example, saving a configured model step and reloading it should produce an equivalent object, but this needs to be checked in template-specific unit tests.

Parameters

cls (class) – Template class.

Returns

Return type

bool

Version management API

urbansim_templates.utils.parse_version(v)[source]

Parses a version string into its component parts. String is expected to follow the pattern “0.1.1.dev0”, which would be parsed into (0, 1, 1, 0). The first two components are required. The third is set to 0 if missing, and the fourth to None.

Parameters

v (str) – Version string using syntax described above.

Returns

Return type

tuple with format (int, int, int, int or None)
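The parsing rules above can be sketched as follows. The function name parse_version_sketch is hypothetical, and the sketch assumes that a fourth component, when present, always has the form "dev" followed by an integer. Other version formats are outside what the description covers and are not handled.

```python
def parse_version_sketch(v):
    """Parse a string like '0.1.1.dev0' into (int, int, int, int or None)."""
    parts = v.split(".")
    if len(parts) < 2:
        raise ValueError("Version needs at least two components: " + v)

    major = int(parts[0])
    minor = int(parts[1])
    # The third component defaults to 0 when missing.
    patch = int(parts[2]) if len(parts) > 2 else 0
    # The fourth component defaults to None when missing.
    dev = int(parts[3].replace("dev", "")) if len(parts) > 3 else None
    return (major, minor, patch, dev)


full = parse_version_sketch("0.1.1.dev0")
short = parse_version_sketch("0.2")
```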

urbansim_templates.utils.version_greater_or_equal(a, b)[source]

Tests whether version string ‘a’ is greater than or equal to version string ‘b’. Version syntax should follow the pattern described for parse_version().

Note that ‘dev’ versions are pre-releases, so ‘0.2’ < ‘0.2.1.dev5’ < ‘0.2.1’.

Parameters
  • a (str) – First version string, formatted as described in parse_version().

  • b (str) – Second version string, formatted as described in parse_version().

Returns

Return type

bool
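One way to get the pre-release ordering described above is to compare parsed tuples with a sort key that ranks a missing 'dev' component above any 'dev' number. The sketch below works on tuples of the form (int, int, int, int or None). The function names are hypothetical, and this is not necessarily how the library implements the comparison.

```python
def version_sort_key(parsed):
    """Build a comparable key in which a release outranks its dev builds."""
    major, minor, patch, dev = parsed
    # (dev is None) is True for a full release, and True sorts after False.
    return (major, minor, patch, dev is None, dev or 0)


def greater_or_equal_sketch(parsed_a, parsed_b):
    """Test whether parsed version a is greater than or equal to b."""
    return version_sort_key(parsed_a) >= version_sort_key(parsed_b)


v_0_2 = (0, 2, 0, None)      # '0.2'
v_0_2_1_dev5 = (0, 2, 1, 5)  # '0.2.1.dev5'
v_0_2_1 = (0, 2, 1, None)    # '0.2.1'

ordered = sorted([v_0_2_1, v_0_2, v_0_2_1_dev5], key=version_sort_key)
```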