Exploring Data

In the Introductory Tutorial and the Element and Container overviews you can see how HoloViews allows you to wrap your data into annotated Elements that can be composed easily into complex visualizations.

In this tutorial, we will see how all of the data you want to examine can be embedded as Elements into a nested, sparsely populated, multi-dimensional data structure that gives you maximum flexibility to slice, select, and combine your data for visualization and analysis. With HoloViews objects, you can visualize your multi-dimensional data as animations, images, charts, and parameter spaces with ease, allowing you to quickly discover the important features interactively and then prepare corresponding plots for reports, publications, or web pages.

We will start with the very powerful HoloMap container, and then show how HoloMap objects can be nested inside the other Container objects to make all of your data easily available.

In [1]:
import numpy as np
import holoviews as hv
%output holomap='auto'
%timer start
Timer start: 2016/10/28 16:53:12

To start, here are some general imports we will be using, mainly from the Python standard library:

In [2]:
import json
import datetime as dt

from itertools import product

from matplotlib import pyplot as plt
import matplotlib.dates as md

try:
    from urllib2 import urlopen         # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
from io import BytesIO

HoloMap Basics

Python users will be familiar with dictionaries as a way to collect data together in a conveniently accessible manner. Unlike NumPy arrays, dictionaries are sparse and heterogeneous and do not have to be declared with a fixed size.

HoloMaps are a core part of HoloViews and are essential for generating animated visualizations. They also provide highly useful ways to manipulate your data for display and have several useful properties:

  • HoloMaps are ordered (internally they use OrderedDict, or if installed, the optimized cyordereddict).
  • HoloMaps let you index your data with an arbitrary number of dimensions (e.g. date and batch-number), not just one like a Python dictionary.
  • The dimensions used may be simple strings, or objects recording the name, type, and physical units of the dimension.
  • HoloMaps let you select portions of your data by slicing each available dimension independently.
  • HoloMaps also provide ways to transform your data by sampling, reducing and collapsing the data Elements.
  • Dimensions in a HoloMap may be mapped onto parameter spaces for easy visualization of a portion of your multidimensional data space.
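
To make the first two properties concrete, here is a minimal plain-Python sketch of an ordered mapping indexed by two dimensions at once. It is only an analogy for the idea, not how HoloMap is implemented, and the names kdims, raw, and ndmap as well as the sample values are made up for illustration.

```python
from collections import OrderedDict

# Hypothetical stand-in for a two-dimensional HoloMap: every key is a tuple
# holding one value per key dimension, and entries are kept sorted by key.
kdims = ('date', 'batch')
raw = {
    ('2012-10-26', 2): 'element C',
    ('2012-10-25', 1): 'element A',
    ('2012-10-25', 2): 'element B',
}
ndmap = OrderedDict(sorted(raw.items()))

# The first entry is the smallest key, regardless of insertion order.
first_key = next(iter(ndmap))

# Selecting along one dimension while leaving the other free.
by_date = [value for (date, batch), value in ndmap.items()
           if date == '2012-10-25']
```

A plain dictionary forces you to unpack and filter the tuples yourself, as in the last statement; a HoloMap offers that kind of per-dimension selection directly.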

Loading data

In this notebook we will be exploring weather data from Hurricane Sandy, which swept across the Caribbean and the Eastern US seaboard in late October 2012. We will scrape our data from various online sources, exploring not only how we can quickly generate animations using HoloMaps, but also how we can deal with very high-dimensional data.

We've already downloaded and cropped a number of frames of the satellite-imagery-based wind speed models from NASA and cached them on the HoloViews website. If you want to select a different cropping region or sample more frames, you can find out how to get the raw data directly from NASA in this Wiki entry. For now, we'll just get the preprocessed data:

In [3]:
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/hurricane.npz').read())
data = np.load(BytesIO(iobuffer.getvalue()))
dates = data['dates']
surface_data, nearsrfc_data = data['surface'], data['near_surface']

Constructing a HoloMap

Declaring Dimensions

Now that we have loaded the data we can store the raw image arrays as RGB Elements and create a HoloMap. We begin by declaring the key dimensions (kdims) of the HoloMap, which determine how the data will be stored and thus how you will be able to index and select it most easily. In this case we will index our HoloMap both by the frame number and the date:

In [4]:
date_dim = hv.Dimension("Date", value_format=md.DateFormatter('%b %d %Y %H:%M UTC'), type=float)
kdims = ['Frame', date_dim]

Dimensions can be specified as a simple string, or as a Dimension object with additional information to give HoloViews some hints about how to format and display values along that Dimension.
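
The difference between a bare string and a richer dimension object can be sketched in plain Python. The Dim tuple, format_date, and describe below are hypothetical helpers written for this illustration; they are not part of HoloViews, and the real Dimension object carries more information than this.

```python
from collections import namedtuple
import datetime as dt

# Hypothetical, simplified stand-in for a dimension declaration: a name,
# an optional unit, and a callable that formats values for display.
Dim = namedtuple('Dim', ['name', 'unit', 'value_format'])

def format_date(value):
    """Format a datetime in the same style as the Date dimension above."""
    return value.strftime('%b %d %Y %H:%M UTC')

def describe(dim, value):
    """Build a display label from the dimension metadata and one value."""
    label = dim.name if dim.unit is None else '%s (%s)' % (dim.name, dim.unit)
    return '%s: %s' % (label, dim.value_format(value))

date_dim = Dim('Date', None, format_date)
height_dim = Dim('Layer Height', 'hPa', str)

date_text = describe(date_dim, dt.datetime(2012, 10, 25, 1, 0))
height_text = describe(height_dim, 850)
```

With only a string such as 'Frame' there is nothing to drive the formatting, so values would simply be shown as they are.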

Populating the HoloMap

Creating a HoloMap is just like creating a Python dictionary, and so you can either pass a dictionary object or a list of (key, value) pairs. The keys can each be a single value for a one-dimensional HoloMap, or tuples for multiple Dimensions.

In [5]:
srfc = [((frame, date), hv.RGB(surface_data[...,frame], bounds=(0, 0)+surface_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Surface Wind Speed'))
        for frame, date in zip(range(len(dates)), dates)]

nsrfc = [((frame, date), hv.RGB(nearsrfc_data[...,frame], bounds=(0, 0)+nearsrfc_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Near Surface Wind Speed'))
        for frame, date in zip(range(len(dates)), dates)]

surface_wind = hv.HoloMap(srfc, kdims=kdims)
nearsurface_wind = hv.HoloMap(nsrfc, kdims=kdims)

Not only is the HoloMap constructor similar to Python dictionaries, HoloMaps also provide __getitem__, __setitem__, update, get, pop, keys, values and items just as for normal dictionaries. In addition, HoloMap provides a .clone method that will return a copy of the HoloMap containing the same data, where the data and all the parameters may now be overridden.

Basic usage and attributes on HoloMaps

A HoloMap must be uniform in the type, group, label, and key dimensions of its Elements, because it defines a parameter space of Elements varying only in their n-dimensional index and data. This also allows a HoloMap to inherit the group and label of its Elements, which we can see by inspecting the repr() of surface_wind:

In [6]:
:HoloMap   [Frame,Date]
   :RGB   [x,y]   (R,G,B)

Since the RGB Elements we have created are not square, we can declare that RGB Elements should be displayed with an aspect ratio of 1.0 using the %opts line magic, which will apply to all subsequent cells:

In [7]:
%opts RGB [aspect=1]

To get a quick glimpse at the data we have collected, you can access the .last property, which will return the last Element in the HoloMap:

In [8]:

If you are unsure how large the HoloMap is or want to know a bit more about the Dimension ranges, you can use the .info property. For a HoloMap, .info will list the key dimensions and their ranges, and even the deep_dimensions, i.e. any Dimensions contained within the Elements of the HoloMap.

In [9]:
HoloMap containing 14 items of type RGB

Key Dimensions: 
	 Frame: 0...13 
	 Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC 
Deep Dimensions: 
	 x: 0...400 
	 y: 0...350 
	 R: 0...1 
	 G: 0...1 
	 B: 0...1 

Indexing and slicing HoloMaps

Having found out a bit about the HoloMap, we can look at a few frames, starting with selecting just the first three:

In [10]:

Because HoloMaps support all the slicing semantics including steps, we can do things like select every second frame in the second half of the animation:

In [11]:

As you may have noticed, the slices are not simply by whole-number index, as for a NumPy array. A HoloMap, like all other Dimensioned objects (i.e., most HoloViews components), is always sliceable by the values along its key dimensions, in whatever units they are expressed.
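
The distinction between slicing by position and slicing by value can be sketched in plain Python. This is an illustration only, assuming a half-open interval that includes the start value and excludes the stop value; slice_by_value and the sample data are hypothetical.

```python
def slice_by_value(mapping, start=None, stop=None):
    """Keep entries whose key lies in the half-open interval [start, stop)."""
    return {
        key: value for key, value in sorted(mapping.items())
        if (start is None or key >= start) and (stop is None or key < stop)
    }

# Hypothetical samples keyed by time in hours, not by integer position.
samples = {0.0: 'a', 1.5: 'b', 3.0: 'c', 4.5: 'd'}

# The bounds are key values: 1.0 is not itself a key, yet it is a valid bound.
subset = slice_by_value(samples, 1.0, 4.5)
tail = slice_by_value(samples, start=3.0)
```

A positional slice such as a list's [1:3] would return the same two items here only by coincidence; the value-based version keeps working when keys are unevenly spaced or missing.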

Apart from simple slicing semantics, you can also select Elements by passing the Dimension values as a set. Since our Elements are guaranteed to be uniform, a HoloMap also allows deep indexing into the key dimensions of its Elements, allowing us to easily select a subregion of each satellite frame (where : alone means to select the entire range of that dimension):

In [12]:
surface_wind[{0, 2, 3, 5}, :, 150:350, 50:250]
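
As a rough plain-Python analogy for what this one indexing expression combines, the sketch below picks frames by a set of keys and crops every selected image to the same subregion. It crops nested lists by integer position, whereas HoloViews slices by coordinate values; select_frames and the tiny 4x4 images are hypothetical.

```python
def select_frames(frames, keys, x_range, y_range):
    """Keep only the frames named in `keys` and crop each to the same window."""
    x0, x1 = x_range
    y0, y1 = y_range
    return {
        key: [row[x0:x1] for row in image[y0:y1]]
        for key, image in frames.items() if key in keys
    }

# Hypothetical 4x4 "images": each pixel stores its (row, column) position.
frames = {n: [[(r, c) for c in range(4)] for r in range(4)] for n in range(6)}

cropped = select_frames(frames, {0, 2, 3, 5}, x_range=(1, 3), y_range=(0, 2))
```

Because every image has the same shape, a single pair of ranges is meaningful for all of them, which is the uniformity that deep indexing relies on.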

Finally, let's put together everything we've learned about indexing and go one step further. So far we've been looking at just the surface wind speed plots, but now let's combine them with the near-surface plots in a Layout. Just like Elements, HoloMaps can be grouped into a Layout using the + operator. Since the Layout is a Tree-based data structure, it doesn't have any Dimensions of its own and we can't use __getitem__. Instead we may use select, which can be found on all HoloViews components. The .select method may be supplied with any number of dimension and value slice pairs. Slices may be supplied either as explicit slice objects or as tuples.

In [13]:
(surface_wind + nearsurface_wind).select(Frame=slice(0, 10, 2), x=(150,350), y=(50, 250))
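
The idea of applying one selection to every item in a container, and of accepting either a slice object or a tuple, can be sketched as follows. This is a simplified plain-Python illustration: the select function here is hypothetical, handles a single dimension, requires both bounds, and ignores any slice step.

```python
def as_slice(spec):
    """Accept either a slice object or a (start, stop) tuple."""
    return spec if isinstance(spec, slice) else slice(*spec)

def select(layout, frame):
    """Apply one half-open frame range to every map held in the layout."""
    bounds = as_slice(frame)
    return {
        name: {key: value for key, value in frames.items()
               if bounds.start <= key < bounds.stop}
        for name, frames in layout.items()
    }

# Hypothetical layout holding two maps that share the same frame keys.
layout = {
    'surface': {n: 's%d' % n for n in range(5)},
    'near_surface': {n: 'n%d' % n for n in range(5)},
}

with_slice = select(layout, slice(1, 3))
with_tuple = select(layout, (1, 3))
```

Both calls produce the same result, which is why the two spellings can be used interchangeably.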

Grouping HoloMaps

HoloMaps provide the starting point to display your data in any number of ways. While HoloMap dimensions are displayed as frames of an animation by default, you can easily transform a HoloMap into another n-D component type, such as an NdLayout, GridSpace, or NdOverlay, via the .layout, .grid, and .overlay methods.

Each of these methods groups the data along the values of the dimensions you specify and returns the newly grouped object. They are all convenience wrappers around the .groupby method, which can split a HoloMap into whatever container and group types you specify.
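
Grouping along a dimension can be pictured with a small plain-Python sketch: the chosen dimension becomes the key of an outer container, and the remaining dimensions index the items inside each group. The groupby function and sample data below are hypothetical and only illustrate the idea, not the HoloViews method of the same name.

```python
from collections import defaultdict

def groupby(mapping, kdims, dim):
    """Split a tuple-keyed mapping into one inner mapping per value of `dim`."""
    position = kdims.index(dim)
    groups = defaultdict(dict)
    for key, value in mapping.items():
        # Remove the grouped dimension from the key used inside each group.
        inner_key = key[:position] + key[position + 1:]
        groups[key[position]][inner_key] = value
    return dict(groups)

# Hypothetical data indexed by (Date, Layer Height).
data = {
    ('Oct 25', 1000): 'a', ('Oct 25', 850): 'b',
    ('Oct 26', 1000): 'c', ('Oct 26', 850): 'd',
}
by_height = groupby(data, ['Date', 'Layer Height'], 'Layer Height')
```

The outer keys could then be laid out side by side, overlaid, or arranged on a grid, while each inner mapping is still indexed by date.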

Before we can start grouping, however, we hit a snag in our indexing: the Frame and Date dimensions we specified above are redundant, because for each frame there is only one corresponding date. As a result, any groupby operation will fail. But we can easily solve this problem by reindexing the HoloMap:

In [14]:
print("Dimensions before reindex: %s" % surface_wind.dimensions('key', label=True))
surface_reindexed = surface_wind.reindex(['Date'])
print("Dimensions after reindex:  %s" % surface_reindexed.dimensions('key', label=True))
Dimensions before reindex: ['Frame', 'Date', 'x', 'y']
Dimensions after reindex:  ['Date', 'x', 'y']
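
Conceptually, reindexing rebuilds every key from a subset of the original dimensions, which is only safe when the remaining dimensions still identify each item uniquely. The sketch below shows that idea in plain Python; the reindex function and the uniqueness check are an illustration under that assumption, not the HoloViews implementation.

```python
def reindex(mapping, kdims, keep):
    """Rebuild tuple keys using only the dimensions named in `keep`."""
    positions = [kdims.index(name) for name in keep]
    result = {}
    for key, value in mapping.items():
        new_key = tuple(key[i] for i in positions)
        if new_key in result:
            # Two items would collapse onto one key, so data would be lost.
            raise ValueError('dropping dimensions would merge distinct items')
        result[new_key] = value
    return result

# Hypothetical frames where each frame number has exactly one date.
frames = {(0, 'Oct 25'): 'img0', (1, 'Oct 26'): 'img1', (2, 'Oct 27'): 'img2'}
by_date = reindex(frames, ['Frame', 'Date'], ['Date'])
```

Dropping Frame works here precisely because it is redundant: no two frames share a date, so no keys collide.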

Now that we have removed the redundant Frame Dimension, we can create an NdLayout indexed just by the date:

In [15]:
In [16]:
%output size=250 

For a more compact representation, you may also create a GridSpace using the .grid method. In a GridSpace, each dimension maps onto an axis, which limits it to a maximum of two Dimensions, but redundant data like the shared axes and axis labels are suppressed. To avoid the tick labels overlapping, we will also define a rotation of the tick marks by a few degrees.

In [17]:
%opts GridSpace [xrotation=10]

Adding Dimensions

Now how do we go about combining the two HoloMaps into a single GridSpace? First let us reindex the near-surface data as well.

In [18]:
nearsurface_reindexed = nearsurface_wind.reindex(['Date'])

The two HoloMaps we have represent wind speed at different heights. Meteorologists state the height of different air masses by their pressure. The near-surface imagery is at 850 hPa, while the surface level images are at 1000 hPa.

In [19]:
height = hv.Dimension('Layer Height', unit='hPa')

We can add this Dimension to the HoloMaps via the add_dimension method, which accepts the new dimension, the index position at which to insert that dimension and the dimension value as arguments:

In [20]:
surface = surface_reindexed.add_dimension(height, 1, 1000)
near_surface = nearsurface_reindexed.add_dimension(height, 1, 850)
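
In terms of the keys, adding a dimension with a single value amounts to inserting that constant into every key at the given position. A minimal plain-Python sketch of this, with a hypothetical add_dimension function and made-up sample data:

```python
def add_dimension(mapping, position, value):
    """Insert the constant `value` at `position` in every tuple key."""
    return {
        key[:position] + (value,) + key[position:]: item
        for key, item in mapping.items()
    }

# Hypothetical one-dimensional data indexed by date only.
by_date = {('Oct 25',): 's0', ('Oct 26',): 's1'}
surface_sketch = add_dimension(by_date, 1, 1000)
```

After this step the items are unchanged, but each one is now addressed by a date and a height.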

Now we can combine the two HoloMaps by creating a clone and updating it with the other HoloMap:

In [21]:
combined_hurricane = surface.clone()
combined_hurricane.update(near_surface)
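
The copy-then-update pattern is the same one used with ordinary dictionaries, as this plain-Python sketch with made-up data shows. Because the two sets of keys differ in their height value, nothing is overwritten and the originals stay untouched.

```python
# Hypothetical maps indexed by (Date, Layer Height).
surface_sketch = {('Oct 25', 1000): 's0', ('Oct 26', 1000): 's1'}
near_surface_sketch = {('Oct 25', 850): 'n0', ('Oct 26', 850): 'n1'}

combined = dict(surface_sketch)       # copy, leaving the original untouched
combined.update(near_surface_sketch)  # add every item from the second map
```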

Using .info we can confirm the two HoloMaps have been successfully merged.

In [22]:
HoloMap containing 28 items of type RGB

Key Dimensions: 
	 Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC 
	 Layer Height (hPa): 850...1000 
Constant Dimensions: 
	 Frame: None...None 
Deep Dimensions: 
	 x: 0...400 
	 y: 0...350 
	 R: 0...1 
	 G: 0...1 
	 B: 0...1 

Merging multiple HoloMaps in this step-by-step way would be cumbersome, and avoiding this complexity is why the Collator object (another instance of Dimensioned) has been provided. Collator will be described in the Columnar Data tutorial.

Now that both the Date and Layer Height are Dimensions on the HoloMap, we have various options for laying out our data. We can simply map each Dimension to an axis of a GridSpace:

In [23]:
combined_hurricane.select(Date=(None, None, 2)).grid(['Date', 'Layer Height'])
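
Mapping two dimensions onto the axes of a grid can be pictured as turning a mapping with two-part keys into rows and columns, leaving gaps where no data exists. The to_grid function below is a hypothetical plain-Python sketch of that arrangement, not the GridSpace implementation.

```python
def to_grid(mapping):
    """Lay out a mapping with two-part keys as rows and columns."""
    columns = sorted({key[0] for key in mapping})
    rows = sorted({key[1] for key in mapping})
    # Missing combinations become None, so the grid may be sparse.
    cells = [[mapping.get((col, row)) for col in columns] for row in rows]
    return columns, rows, cells

# Hypothetical data indexed by (Date, Layer Height), with one cell missing.
winds = {('Oct 25', 850): 'n0', ('Oct 25', 1000): 's0', ('Oct 27', 1000): 's1'}
columns, rows, cells = to_grid(winds)
```

Each axis corresponds to exactly one dimension, which is why a grid of this kind cannot show more than two.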