Shared utilities
The utilities are mainly helper functions for templates.
General template tools API
class urbansim_templates.shared.CoreTemplateSettings(name=None, tags=[], notes=None, autorun=False, template=None, template_version=None)
Stores standard parameters and logic used by all templates. Parameters can be passed to the constructor or set as attributes.
- Parameters
name (str, optional) – Name of the configured template instance.
tags (list of str, optional) – Tags associated with the configured template instance.
notes (str, optional) – Notes associated with the configured template instance.
autorun (bool, optional) – Whether to run the configured template instance automatically when it’s registered or loaded by ModelManager. The overall default is False, but the default can be overridden at the template level.
template (str) – Name of the template class associated with a configured instance.
template_version (str) – Version of the template class package.
Column output tools API
class urbansim_templates.shared.OutputColumnSettings(column_name=None, table=None, data_type=None, missing_values=None, cache=False, cache_scope='forever')
Stores standard parameters used by templates that generate or modify columns. Parameters can be passed to the constructor or set as attributes.
- Parameters
column_name (str, optional) – Name of the Orca column to be created or modified. Generally required before running a configured template.
table (str, optional) – Name of Orca table the column will be associated with. Generally required before running the configured template.
data_type (str, optional) – Python type or numpy.dtype to cast the column’s values to.
missing_values (str or numeric, optional) – Value to use for rows that would otherwise be missing.
cache (bool, default False) – Whether to cache column values after they are calculated.
cache_scope ('step', 'iteration', or 'forever', default 'forever') – How long to cache column values for (ignored if cache is False).
Table schemas and merging API
urbansim_templates.utils.merge_tables(tables, columns=None)
Merge two or more tables into a single DataFrame.
All the data will eventually be merged onto the first table in the list. In each merge stage, we’ll refer to the right-hand table as the “source” and the left-hand one as the “target”.
Tables are merged using ModelManager schema rules: The source table must have a unique index, and the target table must have a column with a matching name, which will be used as the join key. Multi-indexes are fine, but all of the index columns need to be present in the target table.
The last table in the list is the initial source. The algorithm searches backward through the list for a table that qualifies as a target. The source table is left-joined onto the target, and then the algorithm continues with the second-to-last table as the new source.
Example 1: Tables A and B share join keys. Tables B and C share join keys. Merging [A, B, C] will left-join C onto B, and then left-join the result onto A.
Example 2: Tables A and B share join keys. Tables A and C also share join keys, but tables B and C don’t. Merging [A, B, C] will left-join C onto A, and then left-join B onto the result of the first join.
If you provide a list of columns, the output table will be limited to columns in this list. The index(es) of the left-most table will always be retained, but it’s a good practice to list them anyway. Column names not found will be ignored.
If two tables contain columns with identical names (other than join keys), they can’t be automatically merged. If the columns are just incidental and not needed in the final output, you can perform the merge by providing a columns list that excludes them.
A note about data types: They will be retained, but if NaN values need to be added (e.g. if some identifiers from the target table aren’t found in the source table), data may need to be cast to a type that allows missing values. For better control over this, see urbansim_templates.data.ColumnFromBroadcast().
- Parameters
tables (list of str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) – Two or more tables to merge. Types can be mixed and matched.
columns (list of str, optional) – Names of columns to retain in the final output.
- Returns
- Return type
pd.DataFrame
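The backward search described for merge_tables() can be traced without any real data. The following is a minimal sketch, not the library's implementation: plan_merges and its (name, index_names, column_names) tuples are hypothetical stand-ins for tables, and it assumes that a target's own index names also count as available join keys. It skips the uniqueness and name-collision checks and only reports which table would be joined onto which.

```python
def plan_merges(tables):
    """Return the (source, target) join order for a list of table specs.

    Each spec is a (name, index_names, column_names) tuple. These tuples are
    a hypothetical stand-in for real tables, not an urbansim_templates type.
    """
    names = [name for name, _, _ in tables]
    index = {name: list(idx) for name, idx, _ in tables}
    # Assumption: a target's index names count as available join keys too.
    available = {name: set(idx) | set(cols) for name, idx, cols in tables}

    plan = []
    # Walk from the last table toward the second; the first is never a source.
    for s in range(len(names) - 1, 0, -1):
        source = names[s]
        # Search backward for the nearest table holding all of the
        # source's index names.
        for t in range(s - 1, -1, -1):
            target = names[t]
            if all(key in available[target] for key in index[source]):
                plan.append((source, target))
                # After the join, the target also carries the source's columns.
                available[target] |= available[source]
                break
        else:
            raise ValueError("No target found for table '%s'" % source)
    return plan


# Example 1: C joins onto B, then B (now carrying C's data) joins onto A.
example_1 = plan_merges([
    ("A", ["a_id"], ["b_id"]),
    ("B", ["b_id"], ["c_id"]),
    ("C", ["c_id"], ["value"]),
])

# Example 2: B lacks C's key, so C joins onto A first, then B joins onto A.
example_2 = plan_merges([
    ("A", ["a_id"], ["b_id", "c_id"]),
    ("B", ["b_id"], ["b_value"]),
    ("C", ["c_id"], ["c_value"]),
])
```

The two plans correspond to Example 1 and Example 2 in the description of merge_tables().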
urbansim_templates.utils.validate_all_tables()
Validate all tables registered with Orca. See validate_table() below.
- Returns
- Return type
bool
urbansim_templates.utils.validate_table(table, reciprocal=True)
Check some basic expectations about an Orca table:
Confirm that it includes a unique, named index column (a.k.a. primary key) or set of columns (multi-index, a.k.a. composite key). If not, raise a ValueError.
Confirm that none of the other columns in the table share names with the index(es). If they do, raise a ValueError.
If the table contains columns whose names match the index columns of other tables registered with Orca, check whether they make sense as join keys. This prints a status message with the number of presumptive foreign-key values that are found in the primary/composite key, for evaluation by the user.
Perform the same check for columns in _other_ tables whose names match the index column(s) of _this_ table.
It doesn’t currently compare indexes to indexes. (Maybe it should?)
Running this will trigger loading all registered Orca tables, which may take a while. Stand-alone columns will not be loaded unless their names match an index column.
Doesn’t currently incorporate orca_test validation, but it might be added.
- Parameters
table (str) – Name of Orca table to validate.
reciprocal (bool, default True) – Whether to also check how columns of other tables align with this one’s index. If False, only check this table’s columns against other tables’ indexes.
- Returns
- Return type
bool
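Two of the checks listed for validate_table() reduce to simple set logic. The sketch below is illustrative only: index_is_unique and count_key_matches are hypothetical helpers working on plain Python sequences, and the status message that the real function prints is not reproduced here.

```python
def index_is_unique(index_values):
    """True if no key value repeats (the primary-key expectation)."""
    values = list(index_values)
    return len(values) == len(set(values))


def count_key_matches(foreign_values, primary_keys):
    """Count how many presumptive foreign-key values exist in the key.

    Returns (matched, total) so the caller can judge whether the column
    plausibly works as a join key.
    """
    keys = set(primary_keys)
    values = list(foreign_values)
    matched = sum(1 for v in values if v in keys)
    return matched, len(values)


# A column with four values, one of which (9) is absent from the key.
matched, total = count_key_matches([1, 2, 2, 9], [1, 2, 3])
```

A low match count suggests that two columns share a name by coincidence and are not a real join key, which is the judgment the function leaves to the user.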
Other helper functions API
urbansim_templates.utils.all_cols(table)
Returns a list of all column names in a table, including index(es). Input can be an Orca table name, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame.
- Parameters
table (str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) –
- Returns
- Return type
list of str
urbansim_templates.utils.cols_in_expression(expression)
Extract all possible column names from a df.eval()-style expression.
This is achieved using regex to identify tokens in the expression that begin with a letter and contain any number of alphanumerics or underscores, but do not end with an opening parenthesis. This excludes function names, but would not exclude constants (e.g. “pi”), which are semantically indistinguishable from column names.
- Parameters
expression (str) –
- Returns
cols
- Return type
list of str
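The token rule described for cols_in_expression() can be written as a single regular expression. This is a sketch under that description, not necessarily the pattern the library uses; cols_in_expression_sketch is a hypothetical name.

```python
import re


def cols_in_expression_sketch(expression):
    """Return tokens that look like column names in an eval-style string."""
    # \b[a-zA-Z]\w*\b : a whole token that starts with a letter
    # (?!\()          : not immediately followed by an opening parenthesis
    return re.findall(r"\b[a-zA-Z]\w*\b(?!\()", expression)


names = cols_in_expression_sketch("log(income) + pi * age_2 > 5")
```

Here names holds income, pi, and age_2: the function name log is dropped, while the constant pi is kept, as the description warns.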
urbansim_templates.utils.get_data(tables, fallback_tables=None, filters=None, model_expression=None, extra_columns=None)
Generate a pd.DataFrame for model estimation or simulation. Automatically loads tables from Orca, merges them, and removes columns not referenced in a model expression or data filter. Additional columns can be requested.
If filters are provided, the output will include only rows that match the filter criteria.
See urbansim_templates.utils.merge_tables() for a detailed description of how the merges are performed.
- Parameters
tables (str or list of str) – Orca table(s) to draw data from.
fallback_tables (str or list of str, optional) – Table(s) to use if first parameter evaluates to None. (This option will be removed shortly when estimation and simulation settings are separated.)
filters (str or list of str, optional) – Filter(s) to apply to the merged data, using pd.DataFrame.query().
model_expression (str, optional) – Model expression that will be evaluated using the output data. Only used to drop non-relevant columns. PyLogit format is not yet supported.
extra_columns (str or list of str, optional) – Columns to include, in addition to any in the model expression and filters. (If this and the model_expression are both None, all columns will be included.)
- Returns
- Return type
pd.DataFrame
urbansim_templates.utils.get_df(table, columns=None)
Returns a table as a pd.DataFrame. Input can be an Orca table name, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame.
Optionally, columns can be limited to those that appear in a list of names. The list may contain duplicates or columns not in the table. Index(es) will always be retained, but it’s a good practice to list them anyway.
- Parameters
table (str, orca.DataFrameWrapper, orca.TableFuncWrapper, or pd.DataFrame) –
columns (list of str, optional) –
- Returns
- Return type
pd.DataFrame
urbansim_templates.utils.to_list(items)
In many places we accept either a single string or a list of strings. This function normalizes None -> [None], str -> [str], and leaves lists unchanged.
- Parameters
items (str, list, or None) –
- Returns
- Return type
list
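The normalization performed by to_list() is small enough to show in full. The following is a sketch of the documented behavior for the three documented input types (str, list, None); to_list_sketch is a hypothetical name, not the library function.

```python
def to_list_sketch(items):
    """Wrap a single value in a list; return lists unchanged."""
    if isinstance(items, list):
        return items
    # Covers both a single string and None, per the documented behavior.
    return [items]


single = to_list_sketch("households")
missing = to_list_sketch(None)
several = to_list_sketch(["households", "buildings"])
```

Wrapping None as [None] rather than returning an empty list means callers can always index the first element of the result.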
urbansim_templates.utils.trim_cols(df, columns=None)
Limit a DataFrame to columns that appear in a list of names. List may contain duplicates or names not in the DataFrame. Index(es) of the DataFrame will always be retained, but it’s a good practice to list them anyway. If columns is None, all columns are retained. Returns the original DataFrame, not a copy.
- Parameters
df (pd.DataFrame) –
columns (list of str, optional) –
- Returns
- Return type
pd.DataFrame
urbansim_templates.utils.update_column(table, column, data, fallback_table=None, fallback_column=None)
Update an Orca column. If it doesn’t exist yet, add it to the wrapped DataFrame. Values will be aligned using the indexes if possible.
Data types: If the column already exists, new values will be cast to match the existing data type. If the column is new, it will retain the data type of the pd.Series that’s passed to this function – unless it doesn’t fully align with the table’s index, in which case it may be cast to allow missing values (e.g. from int to float).
- Parameters
table (str or list of str) – Name of Orca table to update. If list, the first element will be used.
column (str) – Name of existing column to update, or new column to create. Cannot be an index.
data (pd.Series) – Column of data to update or add.
fallback_table (str or list of str) – Name of Orca table to use if table evaluates to None.
fallback_column (str) – Name of Orca column to use if column evaluates to None.
- Returns
- Return type
None
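The int-to-float cast mentioned in the data types note for update_column() is ordinary pandas behavior when a Series does not cover every row of the table. The sketch below uses Series.reindex as a stand-in for the alignment step. That choice is an assumption made for illustration, and it says nothing about how update_column() aligns values internally.

```python
import pandas as pd

# The table has three rows, but the new data covers only two of them.
table_index = pd.Index([0, 1, 2], name="id")
data = pd.Series([10, 20], index=pd.Index([0, 1], name="id"))

# Aligning to the full index inserts NaN for the uncovered row, which
# forces the integer values to be stored as floats.
aligned = data.reindex(table_index)
```

If the integer type matters downstream, supplying a value for every row of the table avoids the cast.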
urbansim_templates.utils.update_name(template, name=None)
Generate a name for a configured model step, based on its template class and the current timestamp. But if a custom name has already been provided, return that instead. (A name is judged to be custom if it does not contain the class type.)
- Parameters
template (str) – Template class name.
name (str, optional) – Existing name for the configured model step.
- Returns
- Return type
str
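The rule for update_name() can be sketched as follows. The timestamp format and the hyphen separator are assumptions made for illustration; the source does not specify them, and update_name_sketch is a hypothetical name.

```python
from datetime import datetime


def update_name_sketch(template, name=None):
    """Keep a custom name; otherwise build one from the template class name."""
    # A name counts as custom only if it does not contain the class name.
    if name is not None and template not in name:
        return name
    # Assumed format: ClassName-YYYYMMDD-HHMMSS (not confirmed by the source).
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return template + "-" + stamp


kept = update_name_sketch("OLSRegressionStep", "rent-model")
generated = update_name_sketch("OLSRegressionStep")
refreshed = update_name_sketch("OLSRegressionStep", "OLSRegressionStep-old")
```

Under this rule, a previously generated name is replaced with a fresh one, because it contains the class name and so is not treated as custom.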
Spec validation API
urbansim_templates.utils.validate_template(cls)
Checks whether a template class meets the basic expectations for working with ModelManager, to aid in development and testing.
Looks for ‘to_dict’, ‘from_dict’, and ‘run’ methods, and ‘name’, ‘tags’, ‘template’, and ‘template_version’ attributes. Checks that an object can be instantiated without arguments, plus some additional behaviors. See documentation for a full description of ModelManager specs and guidelines.
There are many behaviors this does NOT check, because we don’t know what particular parameters are expected and valid for a given template. For example, saving a configured model step and reloading it should produce an equivalent object, but this needs to be checked in template-specific unit tests.
- Parameters
cls (class) – Template class.
- Returns
- Return type
bool
Version management API
urbansim_templates.utils.parse_version(v)
Parses a version string into its component parts. String is expected to follow the pattern “0.1.1.dev0”, which would be parsed into (0, 1, 1, 0). The first two components are required. The third is set to 0 if missing, and the fourth to None.
- Parameters
v (str) – Version string using syntax described above.
- Returns
- Return type
tuple with format (int, int, int, int or None)
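The parsing rule for parse_version() can be sketched in a few lines. parse_version_sketch is a hypothetical reimplementation based only on the description, and its error handling is an assumption.

```python
def parse_version_sketch(v):
    """Split a version like '0.1.1.dev0' into (major, minor, patch, dev)."""
    parts = v.split(".")
    if len(parts) < 2:
        # Assumption: a missing required component raises an error.
        raise ValueError("Version needs at least two components: %r" % v)
    major, minor = int(parts[0]), int(parts[1])
    patch = int(parts[2]) if len(parts) > 2 else 0
    # The fourth component looks like 'dev0'; keep only its number.
    dev = int(parts[3].replace("dev", "")) if len(parts) > 3 else None
    return (major, minor, patch, dev)


full = parse_version_sketch("0.1.1.dev0")
short = parse_version_sketch("0.2")
```

The short form fills in 0 for the third component and None for the fourth, as the description states.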
urbansim_templates.utils.version_greater_or_equal(a, b)
Tests whether version string ‘a’ is greater than or equal to version string ‘b’. Version syntax should follow the pattern described for parse_version().
Note that ‘dev’ versions are pre-releases, so ‘0.2’ < ‘0.2.1.dev5’ < ‘0.2.1’.
- Parameters
a (str) – First version string, formatted as described in parse_version().
b (str) – Second version string, formatted as described in parse_version().
- Returns
- Return type
boolean
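The pre-release ordering in the note for version_greater_or_equal() falls out of a tuple comparison if a missing dev component sorts after every dev number. The sketch below takes that approach. It is one possible way to get the documented ordering, not necessarily the library's, and both function names are hypothetical.

```python
import math


def _sort_key(v):
    """Turn a version string into a tuple that compares in release order."""
    parts = v.split(".")
    major, minor = int(parts[0]), int(parts[1])
    patch = int(parts[2]) if len(parts) > 2 else 0
    # A full release has no dev number; treat it as later than any dev build.
    dev = int(parts[3].replace("dev", "")) if len(parts) > 3 else math.inf
    return (major, minor, patch, dev)


def version_greater_or_equal_sketch(a, b):
    """True if version string a is at least as recent as version string b."""
    return _sort_key(a) >= _sort_key(b)


release_vs_dev = version_greater_or_equal_sketch("0.2.1", "0.2.1.dev5")
older_vs_dev = version_greater_or_equal_sketch("0.2", "0.2.1.dev5")
```

Here release_vs_dev is True and older_vs_dev is False, matching the ordering ‘0.2’ < ‘0.2.1.dev5’ < ‘0.2.1’.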