Introduction
Why HoloViews?
HoloViews is an open-source Python library for data analysis and visualization. Python already has excellent tools like numpy, pandas, and xarray for data processing, and bokeh and matplotlib for plotting, so why yet another library?
HoloViews helps you understand your data better, by letting you work seamlessly with both the data and its graphical representation.
HoloViews focuses on bundling your data together with the appropriate metadata to support both analysis and visualization, making your raw data and its visualization equally accessible at all times. This process can be unfamiliar to those used to traditional data-processing and plotting tools, and this getting-started guide is meant to demonstrate how it all works at a high level. More detailed information about each topic is then provided in the User Guide.
With HoloViews, instead of building a plot using direct calls to a plotting library, you first describe your data with a small amount of crucial semantic information required to make it visualizable, then you specify additional metadata as needed to determine more detailed aspects of your visualization. This approach provides immediate, automatic visualization that can be effortlessly requested at any time as your data evolves, rendered automatically by one of the supported plotting libraries (such as Bokeh or Matplotlib).
Tabulated data: subway stations
To illustrate how this process works, we will demonstrate some of the key features of HoloViews using a collection of datasets related to transportation in New York City. We start with a table of subway station information stored in a CSV file. First, let’s run the imports that make numpy, pandas, and HoloViews available:
import pandas as pd
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
This is the standard way to make the numpy and pandas libraries available in the namespace. We recommend always importing HoloViews as hv. If you haven’t already installed HoloViews, check out the install instructions on our homepage.
Note that after importing HoloViews as hv we run hv.extension('bokeh') to load the bokeh plotting extension, allowing us to generate visualizations with Bokeh. In the next section we will see how you can use other plotting libraries such as matplotlib and even how you can mix and match between them.
Now let’s load our subway data using pandas:
station_info = pd.read_csv('../assets/station_info.csv')
station_info.head()
| | name | lat | lon | opened | services | service_names | ridership |
|---|---|---|---|---|---|---|---|
| 0 | First Avenue | 40.730953 | -73.981628 | 1924 | 1 | ['L'] | 7.702110 |
| 1 | Second Avenue | 40.723402 | -73.989938 | 1936 | 1 | ['F'] | 5.847710 |
| 2 | Third Avenue | 40.732849 | -73.986122 | 1924 | 1 | ['L'] | 2.386533 |
| 3 | Fifth Avenue | 40.753821 | -73.981963 | 1920 | 6 | ['7', 'E', 'M', 'N', 'R', 'W'] | 16.220605 |
| 4 | Sixth Avenue | 40.737335 | -73.996786 | 1924 | 1 | ['L'] | 16.121318 |
We see that this table contains the subway station name, its latitude and longitude, the year it was opened, the number of services available from the station and their names, and finally the yearly ridership (in millions for 2015).
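If the CSV file is not at hand, the structure of the table can be reproduced with a small stand-in DataFrame. The sketch below copies the five rows printed above. It assumes that service_names arrives as text, which is how pandas reads list-like cells back from a CSV file, and parses it into Python lists with ast.literal_eval. The names sample_stations and parsed_names are hypothetical.

```python
import ast
import pandas as pd

# Five rows copied from the station_info.head() output above.
sample_stations = pd.DataFrame({
    'name': ['First Avenue', 'Second Avenue', 'Third Avenue',
             'Fifth Avenue', 'Sixth Avenue'],
    'lat': [40.730953, 40.723402, 40.732849, 40.753821, 40.737335],
    'lon': [-73.981628, -73.989938, -73.986122, -73.981963, -73.996786],
    'opened': [1924, 1936, 1924, 1920, 1924],
    'services': [1, 1, 1, 6, 1],
    # Kept as strings, the way read_csv would return them.
    'service_names': ["['L']", "['F']", "['L']",
                      "['7', 'E', 'M', 'N', 'R', 'W']", "['L']"],
    'ridership': [7.702110, 5.847710, 2.386533, 16.220605, 16.121318],
})

# Parse the text column into real lists so it can be checked against 'services'.
parsed_names = sample_stations['service_names'].apply(ast.literal_eval)
names_match_counts = bool((parsed_names.apply(len) == sample_stations['services']).all())
print(names_match_counts)
```

In these rows the number of parsed service names agrees with the services column, which is a quick sanity check on how the two columns relate.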
Elements of visualization
We can immediately visualize some of the data in this table as a scatter plot. Let’s view how ridership varies with the number of services offered at each station:
scatter = hv.Scatter(station_info, 'services', 'ridership')
scatter
Here we passed our dataframe to hv.Scatter to create an object called scatter, which is independent of any plotting library but here is visualized using bokeh. HoloViews provides a wide range of Element types, all visible in the Reference Gallery.
In this example, scatter is a simple wrapper around our dataframe that knows that the ‘services’ column is the independent variable, normally plotted along the x-axis, and that the ‘ridership’ column is a dependent variable, plotted on the y-axis. These are our dimensions, which we will describe in more detail a little later.
Given that we have the handle scatter on our Scatter object, we can show that it is indeed an object and not a plot by printing it:
print(scatter)
:Scatter [services] (ridership)
The bokeh plot above is simply the rich, visual representation of scatter, which is plotted automatically by HoloViews and displayed automatically in the Jupyter notebook. Although HoloViews itself is independent of notebooks, this convenience makes working with HoloViews easiest in the notebook environment.
Compositional Layouts
The class Scatter is a subclass of Element. As shown in our element gallery, Elements are the simplest viewable components in HoloViews. Now that we have a handle on scatter, we can demonstrate the composition of these components:
layout = scatter + hv.Histogram(np.histogram(station_info['opened'], bins=24), kdims=['opened'])
layout
In a single line using the + operator, we created a new, compositional object called a Layout built from our scatter visualization and a Histogram that shows how many subway stations opened in Manhattan since 1900. Note that once again, all the plotting is happening behind the scenes. The layout is not a plot; it is a new object that exists independently of any given plotting system:
print(layout)
:Layout
.Scatter.I :Scatter [services] (ridership)
.Histogram.I :Histogram [opened] (Frequency)
Array data: taxi dropoffs
So far we have visualized data in a pandas DataFrame, but HoloViews is as agnostic to data formats as it is to plotting libraries; see Applying Customizations for more information. This means we can work with array data as easily as we can work with tabular data. To demonstrate this, here are some numpy arrays relating to taxi dropoff locations in New York City:
taxi_dropoffs = {hour:arr for hour, arr in np.load('../assets/hourly_taxi_data.npz').items()}
print('Hours: {hours}'.format(hours=', '.join(taxi_dropoffs.keys())))
print('Taxi data contains {num} arrays (one per hour).\nDescription of the first array:\n'.format(num=len(taxi_dropoffs)))
np.info(taxi_dropoffs['0'])
Hours: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
Taxi data contains 24 arrays (one per hour).
Description of the first array:
class: ndarray
shape: (256, 256)
strides: (1024, 4)
itemsize: 4
aligned: True
contiguous: True
fortran: False
data pointer: 0x140028000
byteorder: little
byteswap: False
type: float32
As we can see, this dataset contains 24 arrays (one for each hour of the day) of taxi dropoff locations (by latitude and longitude), aggregated over one month in 2015. The array shown above contains the accumulated dropoffs for the first hour of the day.
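Note that the hours come back from the .npz archive as strings such as '0' and '23', because np.savez stores each array under a name. The sketch below reproduces that round trip with synthetic data (random 4x4 grids, not the real taxi counts) and an in-memory buffer instead of a file; the names hourly and loaded are hypothetical.

```python
import io
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 24 small float32 grids, one per hour (not the real taxi data).
hourly = {str(hour): rng.random((4, 4), dtype=np.float32) for hour in range(24)}

# Write the arrays to an in-memory .npz archive and read them back.
buffer = io.BytesIO()
np.savez(buffer, **hourly)
buffer.seek(0)
loaded = {hour: arr for hour, arr in np.load(buffer).items()}

# Keys are strings, so sort them numerically rather than alphabetically.
hours_sorted = sorted(loaded, key=int)
print(hours_sorted[:3], loaded['0'].shape, loaded['0'].dtype)
```

This is why code that needs numeric hours has to convert the keys with int(hour).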
Compositional Overlays
Once again, we can easily visualize this data with HoloViews by passing our array to hv.Image to create an object named image. This object has the spatial extent of the data declared as the bounds, given as (left, bottom, right, top), which here means the corresponding range of longitudes and latitudes.
bounds = (-74.05, 40.70, -73.90, 40.80)
image = hv.Image(taxi_dropoffs['0'], ['lon','lat'], bounds=bounds)
HoloViews supports numpy, xarray, and dask arrays when working with array data (see Gridded Datasets). We can also compose elements containing array data with those containing tabular data. To illustrate, let’s pass our tabular station data to a Points element, which is used to mark positions in two-dimensional space:
points = hv.Points(station_info, ['lon','lat']).opts(color="red")
image + image * points
On the left, we have the visual representation of the image object we declared. Using + we put it into a Layout together with a new compositional object created with the * operator called an Overlay. This particular overlay displays the station positions on top of our image, which works correctly because the data in both elements exists in the same space, namely New York City.
The .opts() method call for specifying the visual style is part of the HoloViews options system, which is described in the next ‘Getting started’ section.
This overlay on the right lets us see the location of all the subway stations in relation to our midnight taxi dropoffs. Of course, HoloViews allows you to visually express more of the available information with our points. For instance, you could represent the ridership of each subway station by point color or point size. For more information see Applying Customizations.
Effortlessly exploring data
You can keep composing data structures until there are more dimensions than can fit simultaneously on your screen. For instance, you can visualize a dictionary of Images (one for every hour of the day) by declaring a HoloMap:
dictionary = {int(hour): hv.Image(arr, ['lon','lat'], bounds=bounds)
              for hour, arr in taxi_dropoffs.items()}
hv.HoloMap(dictionary, kdims='Hour')