Getting Started

Let us know what you are working on, or if you think you have a great use case, by tweeting us at @urbansim, posting on the UrbanSim forum, or contacting us at info@urbansim.com.

Installation

The UrbanSim library is currently tested with Python versions 2.7, 3.5, 3.6, 3.7, and 3.8.

UrbanSim is distributed on the Python Package Index (for Pip) and on Conda Forge. The official source code is hosted on GitHub. (UrbanSim versions before 3.2 are on the UDST Conda channel rather than Conda Forge.)

You can install UrbanSim with either the Pip or Conda package manager:

pip install urbansim

conda install urbansim --channel conda-forge

Dependencies include NumPy, Pandas, and Statsmodels, plus Orca, another UDST library, for task orchestration. These will be installed automatically if needed.

When new releases of UrbanSim come out, you can upgrade like this:

pip install urbansim --upgrade

conda update urbansim --channel conda-forge

Installing from GitHub

You can also install UrbanSim directly from source code on GitHub, for example to use pre-release features or to modify the code yourself. The UrbanSim library is written entirely in Python; no compilation is needed.

git clone -b branch-name https://github.com/udst/urbansim.git
cd urbansim
pip install .

Reporting bugs and contributing to UrbanSim

Please report any bugs you encounter via GitHub Issues.

If you have improvements or new features you would like to see in UrbanSim:

Open a feature request via GitHub Issues.
See our code contribution instructions here.
Contribute your code from a fork or branch by opening a Pull Request and requesting a review, so that it can be considered for addition to the codebase.

Tools of the Trade

This page provides a brief introduction to Pandas and Jupyter Notebooks - two of the key tools that the new implementation of UrbanSim relies on.

Pandas

Pandas is a Python data science library and an outstanding tool for data manipulation and exploration. To get started, we recommend Wes McKinney’s 10-minute video tour.

Pandas offers much of the functionality of a relational database through an API that is much easier to use than SQL, and for in-memory analysis it is often considerably faster. However, it makes no attempt to support multi-user editing of data or transactions the way a database does.

The previous implementation of UrbanSim, known as OPUS, implemented much of this functionality itself in the absence of such robust libraries. In fact, the OPUS implementation of UrbanSim was started around 2005, several years before Pandas became publicly available.

One of the main motivations for the current implementation of UrbanSim is to refactor the code to make it simpler, faster, and smaller, while leveraging terrific new libraries like Pandas that elegantly solve much of what UrbanSim previously had to implement directly.

A Note on Pandas Indexing

One very important note about Pandas - the real genius of the abstraction is that all records in a table are viewed as key-value pairs. Every table has an index or a multi-index which is used to align the table on the key for that table.

This is similar to having a primary key in a database except that now you can do mathematical operations with columns. For instance, you can now take a column from one table and a column from another table and add or multiply them and the operation will automatically align on the key (i.e. it will add elements with the same index value).
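A small illustration of that alignment, using two made-up Series that share id values but list them in different orders. The values and ids are arbitrary. The point is that `+` matches labels, not positions, and that a label present on only one side produces `NaN` unless a `fill_value` is given.

```python
import pandas as pd

# Two Series keyed on ids, listed in different orders (toy values)
a = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30])
b = pd.Series([100.0, 200.0, 400.0], index=[20, 10, 40])

# Arithmetic aligns on index labels: id 10 gets 1 + 200, id 20 gets 2 + 100
total = a + b  # ids 30 and 40 appear on only one side, so they become NaN

# Series.add with fill_value treats a missing label as 0 instead of NaN
total_filled = a.add(b, fill_value=0)

print(total)
print(total_filled)
```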

This is incredibly handy. Almost all of the benefits of using Pandas come down to using these indexes in intelligent and powerful ways. But it’s not always easy to get the functionality exactly right the first time.

Some general advice about using Pandas: if you have a problem with Pandas, check your indexes, re-check your indexes, and do it one more time for good measure.

A surprising share of bugs come down to a Pandas Series that is not indexed the way the subsequent operations assume, so the code does something other than what you intend. You’ve been warned.
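Here is one way that goes wrong, sketched with two tiny hypothetical tables. Assigning a Series to a DataFrame column aligns on the index. Putting a building-level Series (indexed on building_id) into the parcels table therefore matches building ids against parcel ids and silently produces wrong numbers instead of raising an error.

```python
import pandas as pd

parcels = pd.DataFrame({'acres': [1.0, 2.0]},
                       index=pd.Index([1, 2], name='parcel_id'))
buildings = pd.DataFrame({'parcel_id': [2, 1], 'residential_units': [8, 10]},
                         index=pd.Index([1, 5], name='building_id'))

# Bug: the right-hand side is indexed on building_id, so parcel 1 receives the
# units of building 1 (which actually sits on parcel 2), and parcel 2 gets NaN
parcels['units_wrong'] = buildings.residential_units

# Fix: re-key the values on parcel_id before assigning
parcels['units'] = buildings.groupby('parcel_id').residential_units.sum()

print(parcels)
```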

To be concrete, the canonical example is a parcels table indexed on parcel_id and a buildings table indexed on building_id, where the buildings table has a parcel_id column (the foreign key).

The tables can be merged using:

pd.merge(buildings, parcels, left_on="parcel_id", right_index=True, how="left")

You will do this a lot. If you want a comparison of SQL and pandas, check out this series of blog posts.
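To make the merge concrete, here it is on toy data (all values hypothetical). Because `how="left"` keeps every building, a building whose parcel_id has no match in the parcels table (the last building below) gets `NaN` parcel attributes instead of being dropped. To pull in a single column, `Series.map` with another Series performs the same index lookup.

```python
import pandas as pd

parcels = pd.DataFrame({'zone_id': [7, 8], 'acres': [1.0, 2.0]},
                       index=pd.Index([1, 2], name='parcel_id'))
buildings = pd.DataFrame({'parcel_id': [1, 1, 2, 3],
                          'residential_units': [10, 5, 8, 4]},
                         index=pd.Index([101, 102, 103, 104], name='building_id'))

# Left join on the foreign key: one output row per building
merged = pd.merge(buildings, parcels, left_on='parcel_id',
                  right_index=True, how='left')

# Single-column alternative: look up each building's parcel_id in parcels.zone_id
zone_via_map = buildings.parcel_id.map(parcels.zone_id)

print(merged)
```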

Jupyter Notebooks

One of our favorite development tools is Jupyter Notebook, which is perfect for interactively executing small cells of Python code. We use notebooks a LOT, and they are a wonderful way to avoid the command line in a cross-platform way. The notebook is a fantastic tool to develop snippets of code a few lines at a time, and to capture and communicate higher-level workflows.

This also makes the notebook a fantastic pedagogical tool - in other words it’s great for demos and communicating both the input and output of cells of Python code (e.g. nbviewer). Many of the full-size examples of UrbanSim on this site are presented in notebooks.

In many cases, you can write entire UrbanSim models in the notebook, but this is not generally considered the best practice. It’s entirely up to you though, and we are happy to share with you our insights from many hours of developing and using this set of tools.

The Python flavor of Jupyter Notebook uses IPython, an enhanced interactive Python shell that helps with interfacing with the operating system, profiling, parallel computing, and many other technical details.

A Gentle Introduction to UrbanSim

Background

UrbanSim has been an active research project since the late 1990s and has undergone continual rethinking and re-engineering over the ensuing years, as documented in many of the accumulated research papers. Below is a brief, high-level summary of UrbanSim in only a few paragraphs from a modeling/programmer perspective. In pseudocode, UrbanSim can be boiled down to a series of models that are estimated and then simulated in sequence:

for model in models:
    model.estimate(model_configuration_parameters)
for i in range(NUMYEARSINSIMULATION):
    for model in models:
        model.simulate(model_configuration_parameters)
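The same loop as a runnable sketch. `ToyModel`, its `events` log, the `base_year` parameter, and the extra `year` argument to `simulate` are hypothetical stand-ins invented for illustration, not UrbanSim's API. The sketch only demonstrates the ordering: every model is estimated once, then all models run in sequence within each simulated year.

```python
class ToyModel:
    """Hypothetical model with estimate/simulate hooks; records calls in a shared log."""

    def __init__(self, name, events):
        self.name = name
        self.events = events  # shared list, so ordering across models is visible

    def estimate(self, params):
        # A real model would fit coefficients to base-year data here
        self.events.append((self.name, 'estimate'))

    def simulate(self, params, year):
        # A real model would update the simulated tables for this year
        self.events.append((self.name, 'simulate', year))


def run_simulation(models, params, num_years):
    """Estimate every model once, then simulate all models in order, year by year."""
    for model in models:
        model.estimate(params)
    first_year = params['base_year'] + 1
    for year in range(first_year, first_year + num_years):
        for model in models:
            model.simulate(params, year)


events = []
models = [ToyModel('price', events), ToyModel('location', events)]
run_simulation(models, {'base_year': 2020}, num_years=2)
print(events)
```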

The set of models varies among the many UrbanSim applications to different regions, due to data availability and cleanliness, the time and resources that can be devoted to the project, and specific research questions that motivated the projects. The set of models almost always includes at least the following:

Residential Real Estate Models

Hedonic Regression Models estimate and predict real estate prices for different residential building types
Location Choice Models estimate and predict where different types of households will choose to live, and are usually segmented by income and sometimes by other demographics. These models are generally coupled with relocation models to capture the varying rates of relocation by households of different demographics.
Transition models generate new households/persons to match control totals that specify the growth of households by demographic makeup.
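A minimal sketch of the transition idea, assuming a single unsegmented control total and an integer household index. If the target exceeds the current count, copies of randomly sampled existing households are added with fresh ids; if it is lower, households are randomly removed. The function name and the `income` column are hypothetical. Real transition models typically apply this logic per demographic segment.

```python
import pandas as pd


def simple_transition(households, control_total, seed=0):
    """Resample a household table (integer index) so its length equals control_total."""
    current = len(households)
    if control_total > current:
        # Copy randomly chosen existing households to fill the gap
        added = households.sample(n=control_total - current, replace=True,
                                  random_state=seed)
        # Give the copies new ids that follow the current maximum id
        start = int(households.index.max()) + 1
        added.index = pd.RangeIndex(start, start + len(added),
                                    name=households.index.name)
        return pd.concat([households, added])
    if control_total < current:
        # Keep a random subset of the required size
        return households.sample(n=control_total, random_state=seed)
    return households


households = pd.DataFrame({'income': [30_000, 60_000, 90_000]},
                          index=pd.Index([1, 2, 3], name='household_id'))
grown = simple_transition(households, control_total=5)
shrunk = simple_transition(households, control_total=2)
print(grown)
```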

Non-residential Real Estate Models

Hedonic Regression Models are analogous to the above except for modeling the rent received on non-residential building types.
Location Choice Models are analogous to the above except for modeling the location choices of jobs/establishments, and are usually segmented by employment sector (and also include relocation rate models).
Transition models generate new jobs/firms to match control totals that specify the growth of businesses by sector.

Real Estate Development Models

Some representation of real estate development must be modeled to accurately represent regional real estate markets. In UrbanSim there are several options for modeling the development process, but most users are now moving to the Pro Forma based modeling approach.

Development Project Location Choice Models are the easiest way to represent development: they sample from all recent development projects, estimate a model of where development is currently being located, and find an appropriate location for a copied development.
Pro Forma Developer Models take the perspective of the developer and measure the profitability of a proposed development by estimating the cash flows from the predicted rent or sales price in a given submarket and comparing these inflows to the anticipated development costs of the project.

Development will only happen where the predicted rent is high enough to cover costs of construction and a moderate profit, and will occur roughly to meet demand based on the location choice models and control totals.
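The feasibility test in that sentence reduces to simple arithmetic. The sketch below uses made-up per-square-foot prices and costs and a hypothetical 10% minimum profit margin. A real pro forma, as described here, can also account for parking requirements, subsidies, and other policies. Those are left out.

```python
def pro_forma(sqft, price_per_sqft, cost_per_sqft, land_cost, min_margin=0.10):
    """Return (profit, feasible) for a single proposed building (toy model)."""
    revenue = sqft * price_per_sqft           # expected sale value of the building
    cost = sqft * cost_per_sqft + land_cost   # construction plus land acquisition
    profit = revenue - cost
    # Feasible only if revenue covers all costs plus the required profit margin
    feasible = revenue >= cost * (1 + min_margin)
    return profit, feasible


# 10,000 sq ft: revenue 3.0M vs cost 2.5M, which clears the 10% margin (2.75M)
print(pro_forma(10_000, 300, 200, 500_000))
# Lower prices: revenue 2.6M does not clear 2.75M, so the project is not built
print(pro_forma(10_000, 260, 200, 500_000))
```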

This type of developer model is highly flexible and can account for various planning policies including affordable housing, parking requirements, subsidies of various kinds, density bonuses, and other similar policies.

Development regulations such as comprehensive plans and zoning provide regulatory constraints on what types of developments and what densities can be considered by the model.

It should be noted that many other kinds of models can be included in the simulation loop as well. For instance, including scheduled development events is key to representing known future development projects.

In general, any Python script that reads and writes data can be included to help answer a specific research question or to model a certain real-world behavior - models can even be parameterized in JSON or YAML and included in the standard model set, and an ever-increasing set of functionality will be added over time.

Specifying Scenario Inputs

Although UrbanSim is designed to model real estate markets, the raison d’être of UrbanSim is as a scenario planning tool. Regional or city planners want to understand how their cities will develop in the presence or absence of different policies, or under different assumptions they have little or no control over, like economic growth or migration of households.

In a sense, this style of regional modeling is kind of like retirement planning, but for cities - will there be enough room for all the households and jobs if the city grows by 3% every year? What if it grows by 5%? 10%? If current zoning policies don’t appropriately accommodate that growth, it’s likely that prices will rise, but by how much? If growth is pushed to different parts of the region, will there be environmental impacts or an inefficient transportation network that increases traffic, travel times, and infrastructure costs? What will the resulting urban form look like? Sprawl, Manhattan, or something in between?

UrbanSim is designed to investigate these questions, and other questions like them, and to allow outcomes to be analyzed as assumptions are changed. These assumptions can include, but are not limited to, the following:

Control Totals specify, in a simple Excel-based format, the basic assumptions about demographic shifts in households and sector shifts in employment. These files drive the transition models and determine which new households and jobs are added to the simulation.
Zoning Changes in the form of scenario-specific density limits such as max_far and max_dua are passed to the pro formas when testing for feasibility. Simple utility functions are also commonly used to upzone only the parcels affected by a given policy.
Fees and Subsidies may also come into play by adjusting the feasibility of buildings that are infeasible at market rates. Fees can also be collected on profitable buildings and transferred to less profitable buildings, as with affordable housing policies.
Developer Assumptions can also be tested, such as interest rates, the impact on feasibility of mixed-use buildings, of density bonuses for neighborhood amenities, and of lowering or raising parking requirements.
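As a sketch of how zoning limits such as max_far and max_dua translate into development capacity, and how a scenario might upzone only some parcels, here is a pandas version. The `near_transit` flag, the 1.5 multiplier, and the function names are hypothetical. Only max_far and max_dua come from the list above.

```python
import pandas as pd

SQFT_PER_ACRE = 43_560


def zoned_capacity(parcels):
    """Maximum building floor area and dwelling units allowed on each parcel."""
    parcel_sqft = parcels.acres * SQFT_PER_ACRE
    return pd.DataFrame({
        'max_building_sqft': parcels.max_far * parcel_sqft,  # FAR = floor area / lot area
        'max_units': parcels.max_dua * parcels.acres,        # DUA = dwelling units per acre
    })


def upzone(parcels, mask, far_multiplier):
    """Return a scenario copy with max_far scaled only where mask is True."""
    scenario = parcels.copy()
    scenario.loc[mask, 'max_far'] *= far_multiplier
    return scenario


parcels = pd.DataFrame({'acres': [0.5, 1.0], 'max_far': [1.0, 2.0],
                        'max_dua': [20.0, 40.0], 'near_transit': [False, True]},
                       index=pd.Index([1, 2], name='parcel_id'))

base = zoned_capacity(parcels)
scenario = zoned_capacity(upzone(parcels, parcels.near_transit, 1.5))
print(base, scenario, sep='\n\n')
```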

Using Orca as a simulation framework

Before moving on, it’s useful to describe at a high level how Orca, the pipeline orchestration framework built for UrbanSim, helps solve the problems described thus far in this getting started document.

Over many years of implementing UrbanSim models, we realized that we wanted a flexible framework that had the following features:

Tables can be registered from a wide variety of sources including databases, text files, and shapefiles.
Relationships can be defined between tables and data from different sources can be easily merged and used as a new entity.
Calculated columns can be specified so that when underlying data is changed, calculated columns are kept in sync automatically.
Data processing models can be defined so that updates can be performed with user-specified breakpoints, capturing semantic steps that can be mixed and matched by the user.

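To this end, Orca implements these features as tables, broadcasts, columns, and model steps, respectively. We decided to implement these concepts with Python functions and decorators, which is what is happening whenever you see the @orca.DECORATOR_NAME syntax, e.g.:

import orca
import pandas as pd

# The tables below request `store` by name, so it must be registered as an injectable
# ('data.h5' is a placeholder path to an HDF5 file holding the base-year tables)
orca.add_injectable('store', pd.HDFStore('data.h5', mode='r'))

@orca.table('buildings')
def buildings(store):
    return store['buildings']

@orca.table('parcels')
def parcels(store):
    return store['parcels']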

With the use of decorators you can register these concepts with the simulation engine and deal with one small piece of the simulation at a time - for instance, how to access data for a certain table, or how to compute a certain variable, or how to run a certain model.

The objects can then be passed to each other using injection, which automatically passes registered objects into a function by matching its argument names. For instance, assuming the parcels and buildings tables have previously been registered (as above), a new column called residential_unit_density on the parcels table can be defined with a function that takes the buildings and parcels objects as arguments. The registered tables are then available within the function, and they can be used in many other functions as well:

@orca.column('parcels', 'residential_unit_density')
def residential_unit_density(buildings, parcels):
    # Sum residential units per parcel, grouping on the parcel_id foreign key
    units = buildings.residential_units.groupby(buildings.parcel_id).sum()
    # Align to every parcel (parcels without buildings get 0 units), then divide by area
    return units.reindex(parcels.index, fill_value=0) / parcels.acres

If done well, these functions are limited to just a few lines which implement a very specific piece of functionality, and there will be more detailed examples in the tutorials section.

This approach is inspired by a number of different frameworks (in Python and otherwise), such as py.test, Flask, and even non-Python frameworks like Angular.

Note that this is designed to be an extremely flexible framework. Models can be injected into tables and tables into models, and infinite recursion is possible (this is not recommended!). Multiple kinds of decorators can also live in the same file, so that a self-contained piece of functionality, such as an affordable housing module, can be kept together. Alternatively, models, columns, and tables can each be kept together in their own files; the organization is up to you. We hope this flexibility inspires innovation for specific use cases, and the tutorials that follow present what we consider best practices.