Data management templates

Usage

Data templates help you load tables into Orca, create columns of derived data, or save tables or subsets of tables to disk.

from urbansim_templates.data import LoadTable

t = LoadTable()
t.table = 'buildings'  # a name for the Orca table
t.source_type = 'csv'
t.path = 'buildings.csv'
t.csv_index_cols = 'building_id'
t.name = 'load_buildings'  # a name for the model step that sets up the table

You can run this directly using t.run(), or register the configured template to be part of a larger workflow:

from urbansim_templates import modelmanager

modelmanager.register(t)

Registration does two things: (a) it saves the configured template to disk as a yaml file, and (b) it creates a model step with logic for loading the table. Running the model step is equivalent to running the configured template object:

t.run()

# equivalent:
import orca
orca.run(['load_buildings'])

Strictly speaking, running the model step doesn’t load the data; it just sets up an Orca table with instructions for loading the data when it’s needed. (This is called lazy evaluation.)

orca.run(['load_buildings'])  # now an Orca table named 'buildings' is registered

orca.get_table('buildings').to_frame()  # now the data is read from disk
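The lazy-evaluation pattern described above can be sketched without Orca. This is only an illustration: LazyTable, registry, and load_buildings are hypothetical names, not part of Orca or the templates, and the CSV text stands in for a file on disk.

```python
import csv
import io

# Hypothetical stand-in for a lazily evaluated table; not Orca's API.
class LazyTable:
    def __init__(self, loader):
        self._loader = loader   # function that knows how to read the data
        self._rows = None       # nothing has been read yet
        self.load_count = 0     # counts how often the loader actually ran

    def to_rows(self):
        # Read the data on first access, then reuse the cached result.
        if self._rows is None:
            self._rows = self._loader()
            self.load_count += 1
        return self._rows


CSV_TEXT = "building_id,stories\n1,3\n2,12\n"


def load_buildings():
    # Stands in for reading 'buildings.csv' from disk.
    return list(csv.DictReader(io.StringIO(CSV_TEXT)))


registry = {}
registry['buildings'] = LazyTable(load_buildings)  # like running the step: cheap

table = registry['buildings']
loads_before = table.load_count   # the loader has not run yet
rows = table.to_rows()            # first real use triggers the read
rows_again = table.to_rows()      # served from the cache
loads_after = table.load_count
```

Registering the table only stores the loader function, which is why the setup step costs almost nothing.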

Because “running” the table-loading step is costless, it’s done automatically when you register a configured template. It’s also done automatically when you initialize a ModelManager session and table-loading configs are read from yaml. (If you’d like to disable this for a particular table, you can set t.autorun = False.)

Recommended data schemas

The LoadTable template will work with any data that can be loaded into a Pandas DataFrame. But we highly recommend following stricter data schema rules:

Each table should include a unique, named index column (a.k.a. primary key) or set of columns (multi-index, a.k.a. composite key).
If a column is meant to be a join key for another table, it should have the same name as the index of that table.
Duplication of column names across tables (except for the join keys) is discouraged, for clarity.

If you follow these rules, tables can be automatically merged on the fly, for example to assemble estimation data or calculate indicators.

You can use validate_table() or validate_all_tables() to check whether these expectations are met. When templates merge tables on the fly, they use merge_tables().

These utility functions work with any Orca table that meets the schema expectations, whether or not it was created with a template.
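To see why the schema rules make automatic merging possible, here is a minimal sketch in plain Pandas. It is not the implementation of validate_table() or merge_tables(); check_schema is a hypothetical helper and the table contents are made up for illustration.

```python
import pandas as pd

# Each table has a unique, named index (its primary key).
parcels = pd.DataFrame(
    {'zone_id': [10, 20]},
    index=pd.Index([100, 200], name='parcel_id'))

# The join key column 'parcel_id' has the same name as the parcels index.
buildings = pd.DataFrame(
    {'parcel_id': [100, 100, 200], 'stories': [3, 12, 1]},
    index=pd.Index([1, 2, 3], name='building_id'))


def check_schema(df):
    """Hypothetical check: the index is unique and every level is named."""
    return bool(df.index.is_unique) and all(
        name is not None for name in df.index.names)


# Because the names match, the join key can be inferred from the index name.
join_key = parcels.index.name
merged = buildings.join(parcels, on=join_key)
```

No explicit relationship between the tables had to be declared: the shared name parcel_id carries that information.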

Compatibility with Orca

From Orca’s perspective, tables set up using the LoadTable template are equivalent to tables that are registered using orca.add_table() or the @orca.table decorator. Technically, they are orca.TableFuncWrapper objects.

Unlike the templates, Orca relies on user-specified “broadcast” relationships to perform automatic merging of tables. LoadTable does not register any broadcasts, because they’re not needed if tables follow the schema rules above. So if you use these tables in non-template model steps, you may need to add broadcasts separately.

Data loading API

class urbansim_templates.data.LoadTable(table=None, source_type=None, path=None, csv_index_cols=None, extra_settings={}, cache=True, cache_scope='forever', copy_col=True, name=None, tags=[], autorun=True)

Template for registering data tables from local CSV or HDF files. Parameters can be passed to the constructor or set as attributes.

An instance of this template class stores instructions for loading a data table, packaged into an Orca step. Running the instructions registers the table with Orca.

Parameters

table (str, optional) – Name of the Orca table to be created. Must be provided before running the step.
source_type ('csv' or 'hdf', optional) – Source type. Must be provided before running the step.
path (str, optional) – Local file path to load data from, either absolute or relative to the ModelManager config directory. Please provide a Unix-style path (this will work on any platform, but a Windows-style path won’t, and they’re hard to normalize automatically).
url (str, optional - NOT YET IMPLEMENTED) – Remote url to download file from.
csv_index_cols (str or list of str, optional) – Name(s) of the column(s) to use as the table index. Required for tables loaded from csv.
extra_settings (dict, optional) – Additional arguments to pass to pd.read_csv() or pd.read_hdf(). For example, you could automatically extract csv data from a gzip file using {'compression': 'gzip'}, or specify the table identifier within a multi-object hdf store using {'key': 'table-name'}. See Pandas documentation for additional settings.
orca_test_spec (dict, optional - NOT YET IMPLEMENTED) – Data characteristics to be tested when the table is validated.
cache (bool, default True) – Passed to orca.table(). Note that the default is True, unlike in the underlying general-purpose Orca function, because tables read from disk should not need to be regenerated during the course of a model run.
cache_scope ('step', 'iteration', or 'forever', default 'forever') – Passed to orca.table(). Default is ‘forever’, as in Orca.
copy_col (bool, default True) – Passed to orca.table(). Default is True, as in Orca.
name (str, optional) – Name of the model step.
tags (list of str, optional) – Tags, passed to ModelManager.
autorun (bool, default True) – Automatically run the step whenever it’s registered with ModelManager.
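As an illustration of how csv_index_cols and extra_settings reach Pandas, the sketch below forwards them to pd.read_csv(). The helper read_table is hypothetical and only approximates what the template does; the gzip file is created in a temporary directory so the example is self-contained.

```python
import gzip
import os
import tempfile

import pandas as pd


def read_table(path, csv_index_cols, extra_settings=None):
    """Hypothetical helper: forward the extra settings to pd.read_csv()."""
    settings = dict(extra_settings or {})  # copy, so the caller's dict is untouched
    return pd.read_csv(path, index_col=csv_index_cols, **settings)


# Write a small gzip-compressed CSV file to read back.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'buildings.csv.gz')
with gzip.open(path, 'wt') as f:
    f.write('building_id,stories\n1,3\n2,12\n')

df = read_table(path, 'building_id', {'compression': 'gzip'})
```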

classmethod from_dict(d)

Create an object instance from a saved dictionary representation.

Parameters: d (dict) – Saved dictionary representation, as produced by to_dict().
Returns
Return type: LoadTable

run()

Register a data table with Orca.

Requires values to be set for table, source_type, and path. CSV data also requires csv_index_cols.

Returns
Return type: None

to_dict()

Create a dictionary representation of the object.

Returns
Return type: dict
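The to_dict() and from_dict() pair is what allows a configured template to be saved and restored. A minimal sketch of that round trip, using a hypothetical TableConfig class with only four settings (the real templates store more fields):

```python
class TableConfig:
    """Hypothetical stand-in for a template; it only holds a few settings."""

    def __init__(self, table=None, source_type=None, path=None, name=None):
        self.table = table
        self.source_type = source_type
        self.path = path
        self.name = name

    @classmethod
    def from_dict(cls, d):
        # Rebuild an instance from a saved dictionary representation.
        return cls(table=d['table'], source_type=d['source_type'],
                   path=d['path'], name=d['name'])

    def to_dict(self):
        # Plain dictionary that could be written to a yaml file.
        return {'table': self.table, 'source_type': self.source_type,
                'path': self.path, 'name': self.name}


original = TableConfig(table='buildings', source_type='csv',
                       path='buildings.csv', name='load_buildings')
saved = original.to_dict()
restored = TableConfig.from_dict(saved)
```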

Column creation API

class urbansim_templates.data.ColumnFromExpression(meta=None, data=None, output=None)

Template to register a column of derived data with Orca, based on an expression. Parameters may be passed to the constructor, but they are easier to set as attributes. The expression can refer to any columns in the same table, and will be evaluated using df.eval(). Values will be calculated lazily, only when the column is needed for a specific operation.

Parameters

meta (CoreTemplateSettings, optional) – Standard parameters. This template sets the default value of meta.autorun to True.
data (ExpressionSettings, optional) – Special parameters for this template.
output (OutputColumnSettings, optional) – Parameters for the column that will be generated. This template uses data.table as the default value for output.table.

classmethod from_dict(d): Create a class instance from a saved dictionary.

classmethod from_dict_0_2_dev5(d): Converter to read saved data from 0.2.dev5 or earlier. Automatically invoked by from_dict() as needed.

run(): Run the template, registering a column of derived data with Orca. Requires values to be set for data.table, data.expression, and output.column_name.

to_dict(): Create a dictionary representation of the object.

class urbansim_templates.data.ExpressionSettings(table=None, expression=None)

Stores custom parameters used by the ColumnFromExpression template. Parameters can be passed to the constructor or set as attributes.

Parameters

table (str, optional) – Name of Orca table the expression will be evaluated on. Required before running the template.
expression (str, optional) – String describing operations on existing columns of the table, for example “a/log(b+c)”. Required before running. Supports arithmetic and math functions including sqrt, abs, log, log1p, exp, and expm1 – see Pandas df.eval() documentation for further details.
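For a sense of what such an expression computes, this sketch evaluates the example string directly with df.eval() in Pandas. The column values are made up, and the template itself is not involved.

```python
import math

import pandas as pd

# Made-up table with the columns referenced by the expression.
df = pd.DataFrame({'a': [2.0, 6.0], 'b': [1.0, 3.0], 'c': [1.0, 5.0]})

expression = 'a / log(b + c)'
result = df.eval(expression)  # a Series aligned with the table's index
```

Each row is computed independently, so the first value is 2.0 divided by the natural log of 2.0.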

Data output API

class urbansim_templates.data.SaveTable(table=None, columns=None, filters=None, output_type=None, path=None, extra_settings=None, name=None, tags=[])

Template for saving Orca tables to local CSV or HDF5 files. Parameters can be passed to the constructor or set as attributes.

Parameters

table (str, optional) – Name of the Orca table. Must be provided before running the step.
columns (str or list of str, optional) – Names of columns to include. None will return all columns. Indexes will always be included.
filters (str or list of str, optional) – Filters to apply to the data before saving. Will be passed to pd.DataFrame.query().
output_type ('csv' or 'hdf', optional) – Type of file to be created. Must be provided before running the step.
path (str, optional) – Local file path to save the data to, either absolute or relative to the ModelManager config directory. Please provide a Unix-style path (this will work on any platform, but a Windows-style path won’t, and they’re hard to normalize automatically). For dynamic file names, you can include the characters “%RUN%”, “%ITER%”, or “%TS%”. These will be replaced by the run id, the model iteration value, or a timestamp when the output file is created.
extra_settings (dict, optional) – Additional arguments to pass to pd.DataFrame.to_csv() or pd.DataFrame.to_hdf(). For example, you could automatically compress csv data using {'compression': 'gzip'}, or specify a custom table name for an hdf store using {'key': 'table-name'}. See Pandas documentation for additional settings.
name (str, optional) – Name of the model step.
tags (list of str, optional) – Tags, passed to ModelManager.
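The effect of the columns and filters parameters can be approximated in plain Pandas. The helper below is a hypothetical sketch, not the template's actual code; it assumes that a list of filters is applied one after another, so a row must satisfy all of them.

```python
import pandas as pd


def select_rows_and_columns(df, columns=None, filters=None):
    """Hypothetical helper: filter rows with query(), then pick columns."""
    if isinstance(filters, str):
        filters = [filters]
    for f in filters or []:
        df = df.query(f)          # each filter narrows the previous result
    if columns is not None:
        if isinstance(columns, str):
            columns = [columns]
        df = df[columns]          # the index stays attached to the rows
    return df


buildings = pd.DataFrame(
    {'stories': [3, 12, 1], 'year_built': [1990, 2015, 1950]},
    index=pd.Index([1, 2, 3], name='building_id'))

subset = select_rows_and_columns(
    buildings,
    columns='stories',
    filters=['stories > 1', 'year_built < 2000'])
```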

classmethod from_dict(d)

Create an object instance from a saved dictionary representation.

Parameters: d (dict) – Saved dictionary representation, as produced by to_dict().
Returns
Return type: SaveTable

get_dynamic_filepath()

Substitute run id, model iteration, and/or timestamp into the filename.

For the run id and model iteration, we look for Orca injectables named run_id and iter_var, respectively. If none is found, we use 0.

The timestamp is UTC, formatted as YYYYMMDD-HHMMSS.

Returns
Return type: str
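The substitution rules above can be sketched as a standalone function. This is a hypothetical re-implementation for illustration: dynamic_filepath and its arguments are not part of the library, and the run id and iteration are passed in directly instead of being read from Orca injectables.

```python
from datetime import datetime, timezone


def dynamic_filepath(path, run_id=0, iteration=0, now=None):
    """Hypothetical sketch of the placeholder substitution."""
    now = now or datetime.now(timezone.utc)       # timestamps are in UTC
    timestamp = now.strftime('%Y%m%d-%H%M%S')     # YYYYMMDD-HHMMSS
    return (path.replace('%RUN%', str(run_id))
                .replace('%ITER%', str(iteration))
                .replace('%TS%', timestamp))


# A fixed time makes the result reproducible.
fixed_time = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
example = dynamic_filepath('output/buildings-%RUN%-%ITER%-%TS%.csv',
                           run_id=7, iteration=2010, now=fixed_time)

# With no run id or iteration available, both fall back to 0.
fallback = dynamic_filepath('buildings-%RUN%-%ITER%.csv')
```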

run()

Save a table to disk.

Saving a table to an HDF store requires providing a key that will be used to identify the table in the store. We’ll use the Orca table name, unless you provide a different key in the extra_settings.

Returns
Return type: None
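The key fallback described above amounts to a default that extra_settings can override. A minimal sketch, with hdf_settings as a hypothetical helper name:

```python
def hdf_settings(table, extra_settings=None):
    """Hypothetical helper: use the table name as the HDF key by default."""
    settings = dict(extra_settings or {})  # copy, so the caller's dict is untouched
    settings.setdefault('key', table)      # only applies if no key was given
    return settings


default_key = hdf_settings('buildings')

user_settings = {'key': 'buildings_2010'}
custom_key = hdf_settings('buildings', user_settings)
```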

to_dict()

Create a dictionary representation of the object.

Returns
Return type: dict