The DataFrame Explorer is used to create a web service within the IPython
Notebook that responds to queries from a web browser. The REST API is
undocumented because the user does not interact with it directly. Simply call
the start method below and then open http://localhost:8765 in any web browser.
See Exploration Workflow for sample code from the San Francisco case study.
The dframe_explorer takes a dictionary of DataFrames, which are joined to a set of shapes for visualization. The most common case is to use a GeoJSON file of zone shapes and join it to any DataFrame that has a zone_id column (the dframe_explorer module does the join for you). You then set the center and zoom level for the map, pass the name of the GeoJSON file, and specify the join keys in both the GeoJSON file and the DataFrames. Below is a screenshot of the result as displayed in your web browser.
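To make the join concrete, here is a minimal sketch of attaching a per-zone Pandas Series to GeoJSON features by matching a feature property (what start calls geom_name) against the Series index (the join_name values). The tiny FeatureCollection and the join_to_shapes helper are hypothetical illustrations, not dframe_explorer code.

```python
import json

import pandas as pd

# Hypothetical two-zone FeatureCollection; geometries are omitted (null is
# valid GeoJSON) because only the id property matters for the join.
zones_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ZONE_ID": 1}, "geometry": None},
        {"type": "Feature", "properties": {"ZONE_ID": 2}, "geometry": None},
    ],
}


def join_to_shapes(geojson, values, geom_name="ZONE_ID", value_name="value"):
    """Hypothetical helper: copy each feature and attach the Series value
    whose index matches the feature's geom_name property.

    Ids must have the same type on both sides (here, ints). Features with
    no matching value get None, so a map could leave them uncolored.
    """
    joined = json.loads(json.dumps(geojson))  # deep copy; input is untouched
    for feature in joined["features"]:
        zone = feature["properties"][geom_name]
        value = values.get(zone)  # Series.get returns None for missing keys
        feature["properties"][value_name] = None if value is None else float(value)
    return joined


households = pd.DataFrame({"zone_id": [1, 1, 2], "persons": [2, 3, 4]})
per_zone = households.groupby("zone_id")["persons"].sum()
result = join_to_shapes(zones_geojson, per_zone)
print([f["properties"]["value"] for f in result["features"]])  # [5.0, 4.0]
```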
Here is what each dropdown on the web page does:
- The first dropdown gives the names of the DataFrames you have passed
- The second dropdown lists the columns of the DataFrame selected in the first dropdown
- The third dropdown selects the color scheme from the ColorBrewer color schemes
- The fourth dropdown sets
- The fifth dropdown selects the Pandas aggregation method to use
- The sixth dropdown executes the .query method on the Pandas DataFrame to filter the input data
- The seventh dropdown executes the .eval method on the Pandas DataFrame to create simple computed variables that are not already columns on the DataFrame.
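The sixth and seventh controls map onto standard Pandas methods. This sketch, using a hypothetical households table, shows what a .query filter string and an .eval expression do before the per-zone aggregation.

```python
import pandas as pd

# Hypothetical disaggregate table: one row per household.
households = pd.DataFrame({
    "zone_id": [1, 1, 2, 2],
    "persons": [1, 4, 2, 3],
    "workers": [1, 2, 0, 1],
    "income": [40000, 90000, 30000, 120000],
})

# Like the sixth control: filter rows with a .query() expression string.
filtered = households.query("income > 35000")

# Like the seventh control: derive a column that does not exist yet with
# .eval(); this returns a new DataFrame and leaves `filtered` unchanged.
computed = filtered.eval("nonworkers = persons - workers")

# The derived column can then be aggregated by zone like any other column.
print(computed.groupby("zone_id")["nonworkers"].sum())
```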
What’s it Doing Exactly?
So what is this doing? The web service translates the dropdown selections into a simple Pandas statement that groups the selected table by the join key and applies the selected aggregation to the selected column.
The web service prints each statement it executes. The website then transparently joins the resulting Pandas Series to the shapes and creates an interactive slippy web map using the Leaflet JavaScript library. The code for this map is quite simple, so feel free to browse it and add functionality as required.
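The idea of translating dropdown choices into a printable Pandas statement can be sketched as follows. The build_statement helper is hypothetical and is not the dframe_explorer source; it only assumes the statement has the shape "table, optional filter, optional computed column, group by join key, aggregate".

```python
import pandas as pd


def build_statement(table, column, agg, join_name="zone_id",
                    query=None, eval_expr=None):
    """Hypothetical: compose a Pandas statement string from UI choices."""
    expr = table
    if query:  # sixth control: optional row filter
        expr += f".query({query!r})"
    if eval_expr:  # seventh control: optional computed column
        expr += f".eval({eval_expr!r})"
    return f"{expr}.groupby({join_name!r})[{column!r}].{agg}()"


households = pd.DataFrame({"zone_id": [1, 1, 2], "persons": [2, 3, 4]})

stmt = build_statement("households", "persons", "sum")
print(stmt)  # households.groupby('zone_id')['persons'].sum()

# Evaluate the string against a namespace mapping table names to DataFrames.
result = eval(stmt, {}, {"households": households})
print(result.to_dict())
```

Evaluating strings with eval is only reasonable here because the inputs come from a local, trusted notebook session.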
To be clear, the website is performing a Pandas aggregation on the fly.
If you have a buildings DataFrame with millions of records, Pandas will group
the records by zone_id and perform an aggregation of your choice.
This is designed to give you a quickly navigable map interface for
understanding the underlying disaggregate data, similar to that offered by
commercial products such as Tableau.
As a concrete example, note that the households table has a zone_id column
and is thus available for aggregation in dframe_explorer. Since the web
service is running aggregations on the disaggregate data, selecting the
households table, the persons attribute, and an aggregation of sum groups the
households by zone_id and sums the persons column.
This computes the total number of persons across the households in each zone, or more simply, the population of each zone. If the aggregation is changed to mean, the service instead computes the mean of persons grouped by zone_id.
What does this compute exactly? It computes the average number of persons per household in each zone, or the average household size by zone.
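The difference between the two aggregations is easy to check on a toy table. This sketch assumes a hypothetical households DataFrame with one row per household; sum gives zone population and mean gives average household size.

```python
import pandas as pd

# Hypothetical households table: one row per household, with zone and size.
households = pd.DataFrame({
    "zone_id": [1, 1, 1, 2, 2],
    "persons": [1, 2, 3, 4, 3],
})

by_zone = households.groupby("zone_id")["persons"]

population = by_zone.sum()           # zone 1: 1+2+3 = 6, zone 2: 4+3 = 7
avg_household_size = by_zone.mean()  # zone 1: 6/3 = 2.0, zone 2: 7/2 = 3.5
```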
DataFrame Explorer API
start(views, center=[37.7792, -122.2191], zoom=11, shape_json='data/zones.json', geom_name='ZONE_ID', join_name='zone_id', precision=8, port=8765, host='localhost', testing=False)
Start the web service, which serves the Pandas queries and generates the HTML for the map. You will need to open a web browser and navigate to http://localhost:8765 (or the host and port you specified).
Parameters:
- views : Python dictionary
  This is the data that will be displayed in the maps. Keys are strings (table names) and values are DataFrames. Each DataFrame should have a column with the name specified as join_name below.
- center : a Python list with two floats
  The initial latitude and longitude of the center of the map.
- zoom : int
  The initial zoom level of the map.
- shape_json : str
  The path to the GeoJSON file which contains the shapes that will be displayed.
- geom_name : str
  The field name from the JSON file which contains the id of the geometry.
- join_name : str
  The column name from the DataFrames passed as views (it must be in each view) which joins to geom_name in the shapes.
- precision : int
  The precision of values to show in the legend on the map.
- port : int
  The port for the web service to respond on.
- host : str
  The hostname to run the web service from.
- testing : bool
  Whether to print extra debug information.

Returns:
- Does not return; takes over control of the thread and responds to queries from a web browser.
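A usage sketch for start, using the parameters documented above. It assumes the module is importable as urbansim.maps.dframe_explorer and that data/zones.json exists; the toy tables and the check_views and launch helpers are hypothetical. Since every view must contain join_name, the sketch checks that before starting the blocking server.

```python
import pandas as pd

# Hypothetical toy tables; in practice these come from your model's data.
views = {
    "households": pd.DataFrame({"zone_id": [1, 1, 2], "persons": [2, 3, 4]}),
    "buildings": pd.DataFrame({"zone_id": [1, 2], "residential_units": [10, 20]}),
}


def check_views(views, join_name="zone_id"):
    """Hypothetical helper: names of tables missing the join column."""
    return [name for name, df in views.items() if join_name not in df.columns]


def launch(views):
    # Assumption: UrbanSim exposes the module at this import path.
    from urbansim.maps import dframe_explorer

    missing = check_views(views, "zone_id")
    if missing:
        raise ValueError(f"tables without a zone_id column: {missing}")
    # Does not return: serves queries until the process is interrupted.
    dframe_explorer.start(
        views,
        center=[37.7792, -122.2191],
        zoom=11,
        shape_json="data/zones.json",
        geom_name="ZONE_ID",  # id field in the GeoJSON file
        join_name="zone_id",  # column present in every DataFrame in views
        precision=2,
        port=8765,
    )
```

Calling launch(views) from a notebook cell would then block while the map is served at http://localhost:8765.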