# Exploring Data

In the Introductory Tutorial and the Element and Container overviews you can see how HoloViews allows you to wrap your data into annotated Elements that can be composed easily into complex visualizations.

In this tutorial, we will see how all of the data you want to examine can be embedded as Elements into a nested, sparsely populated, multi-dimensional data structure that gives you maximum flexibility to slice, select, and combine your data for visualization and analysis. With HoloViews objects, you can visualize your multi-dimensional data as animations, images, charts, and parameter spaces with ease, allowing you to quickly discover the important features interactively and then prepare corresponding plots for reports, publications, or web pages.

We will start with the very powerful HoloMap container, and then show how HoloMap objects can be nested inside the other Container objects to make all of your data easily available.

In [1]:
import numpy as np
import holoviews as hv
hv.notebook_extension()
%output holomap='auto'
%timer start

Timer start: 2017/04/28 01:11:47


To start, here are some general imports we will be using, mainly from the Python standard library:

In [2]:
import json
import datetime as dt

from itertools import product

from matplotlib import pyplot as plt
import matplotlib.dates as md

try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
from io import BytesIO


## HoloMap Basics

Python users will be familiar with dictionaries as a way to collect data together in a conveniently accessible manner. Unlike NumPy arrays, dictionaries are sparse and heterogeneous and do not have to be declared with a fixed size.

HoloMaps are a core part of HoloViews and are essential for generating animated visualizations. They also provide highly useful ways to manipulate your data for display and have several useful properties:

• HoloMaps are ordered (internally they use OrderedDict, or if installed, the optimized cyordereddict).
• HoloMaps let you index your data with an arbitrary number of dimensions (e.g. date and batch-number), not just one like a Python dictionary.
• The dimensions used may be simple strings, or objects recording the name, type, and physical units of the dimension.
• HoloMaps let you select portions of your data by slicing each available dimension independently.
• HoloMaps also provide ways to transform your data by sampling, reducing, and collapsing the data Elements.
• Dimensions in a HoloMap may be mapped onto parameter spaces for easy visualization of a portion of your multidimensional data space.
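
To make the idea of indexing by several dimensions concrete, here is a minimal sketch using only the Python standard library. It is a toy model, not how HoloViews implements HoloMap; the `records` data and the `select` helper are hypothetical names invented for this illustration.

```python
from collections import OrderedDict

# Hypothetical toy data: keys are (date, batch_number) tuples,
# and the string values stand in for Elements.
records = OrderedDict([
    (("2012-10-25", 1), "element A"),
    (("2012-10-25", 2), "element B"),
    (("2012-10-26", 1), "element C"),
])

def select(mapping, date=None, batch=None):
    """Return the entries whose key matches every dimension value that was given."""
    return OrderedDict(
        (key, value) for key, value in mapping.items()
        if (date is None or key[0] == date) and (batch is None or key[1] == batch)
    )

by_date = select(records, date="2012-10-25")   # fix the first dimension only
by_batch = select(records, batch=1)            # fix the second dimension only
print(list(by_date.values()))
print(list(by_batch.keys()))
```

Each dimension can be constrained independently of the others, which is the property the list above attributes to HoloMaps.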

In this notebook we will be exploring weather data from Hurricane Sandy, which swept across the Caribbean and the Eastern US seaboard in late October 2012. We will scrape our data from various online sources, exploring not only how we can quickly generate animations using HoloMaps, but also how we can deal with very high-dimensional data.

We've already downloaded and cropped a number of frames of the satellite-imagery-based wind speed models from NASA and cached them on the HoloViews website. If you want to select a different cropping region or sample more frames you can find out how to get the raw data directly from NASA in this Wiki entry. For now, we'll just get the preprocessed data:

In [3]:
iobuffer = BytesIO(urlopen('http://assets.holoviews.org/hurricane.npz').read())
data = np.load(iobuffer)
dates = data['dates']
surface_data, nearsrfc_data = data['surface'], data['near_surface']


## Constructing a HoloMap

#### Declaring Dimensions

Now that we have loaded the data we can store the raw image arrays as RGB Elements and create a HoloMap. We begin by declaring the key dimensions (kdims) of the HoloMap, which determine how the data will be stored and thus how you will be able to index and select it most easily. In this case we will index our HoloMap both by the frame number and the date:

In [4]:
date_dim = hv.Dimension("Date", value_format=md.DateFormatter('%b %d %Y %H:%M UTC'), type=float)
kdims = ['Frame', date_dim]


Dimensions can be specified as a simple string, or as a Dimension object with additional information to give HoloViews some hints about how to format and display values along that Dimension.

#### Populating the HoloMap

Creating a HoloMap is just like creating a Python dictionary, and so you can either pass a dictionary object or a list of (key, value) pairs. The keys can each be a single value for a one-dimensional HoloMap, or tuples for multiple Dimensions.

In [5]:
srfc = [((frame, date), hv.RGB(surface_data[...,frame], bounds=(0, 0)+surface_data.shape[0:2][::-1], xdensity=1,
                               label='Hurricane Sandy', group='Surface Wind Speed'))
        for frame, date in zip(range(len(dates)), dates)]

nsrfc = [((frame, date), hv.RGB(nearsrfc_data[...,frame], bounds=(0, 0)+nearsrfc_data.shape[0:2][::-1], xdensity=1,
                                label='Hurricane Sandy', group='Near Surface Wind Speed'))
         for frame, date in zip(range(len(dates)), dates)]

surface_wind = hv.HoloMap(srfc, kdims=kdims)
nearsurface_wind = hv.HoloMap(nsrfc, kdims=kdims)
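
The `bounds` expression in the cell above is compact, so here is a small standalone sketch of what it evaluates to. The array shape used here is an assumption, chosen so that the result agrees with the 400 by 350 extent the tutorial reports for the images; the real arrays may have a different number of colour channels.

```python
# Assumed array shape: (rows, columns, colour channels, frames).
shape = (350, 400, 3, 14)

# The first two entries are (rows, columns); reversing them gives (width, height).
width_height = shape[0:2][::-1]

# Tuple concatenation then yields (left, bottom, right, top).
bounds = (0, 0) + width_height
print(bounds)  # (0, 0, 400, 350)
```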


Not only is the HoloMap constructor similar to that of Python dictionaries, HoloMaps also provide __getitem__, __setitem__, update, get, pop, keys, values, and items just as normal dictionaries do. In addition, HoloMap provides a .clone method that will return a copy of the HoloMap containing the same data, where the data and all the parameters may now be overridden.

## Basic usage and attributes on HoloMaps

A HoloMap must be uniform in the type, group, label, and key dimensions of its Elements, because it defines a parameter space of Elements varying only in their n-dimensional index and data. This also allows a HoloMap to inherit the group and label of its Elements, which we can see by inspecting the HoloMap repr() for surface_wind:

In [6]:
print(surface_wind)

:HoloMap   [Frame,Date]
:RGB   [x,y]   (R,G,B)


Since the RGB elements we have created are not square we can declare that RGB Elements should be displayed with an aspect ratio of 1.0 using the %opts line magic, which will apply to all subsequent cells:

In [7]:
%opts RGB [aspect=1]


To get a quick glimpse at the data we have collected, you can access the .last property, which will return the last Element in the HoloMap:

In [8]:
surface_wind.last

Out[8]:

If you are unsure how large the HoloMap is or want to know a bit more about the Dimension ranges, you can use the .info property. For a HoloMap, .info will list the key dimensions with their ranges, and even the deep_dimensions, i.e. any Dimensions contained within the Elements of the HoloMap.

In [9]:
surface_wind.info

HoloMap containing 14 items of type RGB
---------------------------------------

Key Dimensions:
Frame: 0...13
Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC
Deep Dimensions:
x: 0.0...400.0
y: 0.0...350.0
R: 0...1
G: 0...1
B: 0...1



## Indexing and slicing HoloMaps

Having found out a bit about the HoloMap, we can look at a few frames, starting with selecting just the first three:

In [10]:
surface_wind[0:3]

Out[10]:

Because HoloMaps support all the slicing semantics including steps, we can do things like select every second frame in the second half of the animation:

In [11]:
surface_wind[7:14:2]

Out[11]:

As you may have noticed, the slices are not simply by whole-number index, as for a NumPy array. A HoloMap, like all other Dimensioned objects (i.e., most HoloViews components), is always sliceable by the values along its key dimensions, in whatever units they are expressed.
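
The difference between slicing by position and slicing by key value can be sketched in plain Python. This is only an illustration under stated assumptions: `slice_by_value` is a hypothetical helper, the keys are made-up numbers in arbitrary units, and the half-open interval is chosen here for simplicity rather than taken from the HoloViews source.

```python
from bisect import bisect_left

def slice_by_value(mapping, start, stop):
    """Select entries whose key lies in [start, stop), comparing key values, not positions."""
    keys = sorted(mapping)
    lo, hi = bisect_left(keys, start), bisect_left(keys, stop)
    return {key: mapping[key] for key in keys[lo:hi]}

# Hypothetical keys in arbitrary units (e.g. hours), deliberately not 0, 1, 2, ...
frames = {0.5: "a", 1.5: "b", 3.0: "c", 4.5: "d"}

selected = slice_by_value(frames, 1.0, 4.0)
print(selected)  # {1.5: 'b', 3.0: 'c'}
```

A positional slice `[1:4]` over the same four items would return three entries; the value-based slice returns only the two whose keys fall inside the requested range.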

Apart from simple slicing semantics, you can also select Elements by passing the Dimension values as a set. Since our Elements are guaranteed to be uniform, a HoloMap also allows deep indexing into the key dimensions of its Elements, allowing us to easily select a subregion of each satellite frame (where : alone means to select the entire range of that dimension):

In [12]:
surface_wind[{0, 2, 3, 5}, :, 150:350, 50:250]

Out[12]:

Finally, let's put together everything we've learned about indexing and go one step further. So far we've been looking at just the surface wind speed plots, but now let's combine them into a Layout. Just like Elements, HoloMaps can be grouped into a Layout using the + operator. Since the Layout is a Tree-based data structure it doesn't have any Dimensions of its own and we can't use __getitem__. Instead we may use select, which can be found on all HoloViews components. The .select method may be supplied with any number of dimension and value slice pairs. Slices may be supplied either as explicit slice objects or as tuples.

In [13]:
(surface_wind + nearsurface_wind).select(Frame=slice(0, 10, 2), x=(150,350), y=(50, 250))

Out[13]:

## Grouping HoloMaps

HoloMaps provide the starting point to display your data in any number of ways. While HoloMap dimensions are displayed as frames of an animation by default, you can easily transform a HoloMap into another n-D component type, such as an NdLayout, GridSpace, or NdOverlay, via the .layout, .grid, and .overlay methods.

Each of these methods groups the data along the values of the dimensions you specify and returns the newly grouped object. These methods are each just convenience methods around the .groupby method, which can split a HoloMap into whatever container and group types you specify.
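
As a rough mental model of what grouping along a dimension means, the following sketch splits a tuple-keyed dictionary into one sub-dictionary per value of a chosen dimension. It is a simplification written for this explanation; `group_by` and the sample keys are hypothetical and do not reflect the actual .groupby signature.

```python
from collections import defaultdict

def group_by(mapping, dim_index):
    """Split a tuple-keyed dict into sub-dicts, one per value of the chosen dimension."""
    groups = defaultdict(dict)
    for key, value in mapping.items():
        # The grouped dimension moves to the outer level; the rest stay in the inner key.
        remaining = key[:dim_index] + key[dim_index + 1:]
        groups[key[dim_index]][remaining] = value
    return dict(groups)

# Hypothetical keys: (date, layer height in hPa); the strings stand in for Elements.
data = {
    ("Oct 25", 1000): "s0", ("Oct 25", 850): "n0",
    ("Oct 26", 1000): "s1", ("Oct 26", 850): "n1",
}

by_height = group_by(data, 1)
print(by_height[850])  # {('Oct 25',): 'n0', ('Oct 26',): 'n1'}
```

In this picture the outer dictionary plays the role of the container (for example a layout or grid) and each inner dictionary plays the role of a smaller mapping over the remaining dimensions.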

Before we can start grouping, however, we hit a snag in our indexing: the Frame and Date dimensions we specified above are redundant, because for each frame there is only one corresponding date. As a result, any groupby operation will fail. But we can easily solve this problem by reindexing the HoloMap:

In [14]:
print("Dimensions before reindex: %s" % surface_wind.dimensions('key', label=True))
surface_reindexed = surface_wind.reindex(['Date'])
print("Dimensions after reindex:  %s" % surface_reindexed.dimensions('key', label=True))

Dimensions before reindex: ['Frame', 'Date', 'x', 'y']
Dimensions after reindex:  ['Date', 'x', 'y']
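
Dropping a redundant key dimension can be pictured as rebuilding each key from a subset of its entries. The sketch below is a toy version of that idea, not the HoloViews implementation; `reindex` here is a hypothetical free function, and the uniqueness check is an assumption about what "redundant" should mean.

```python
def reindex(mapping, dims, keep):
    """Re-key a tuple-keyed dict using only the dimensions named in `keep`."""
    positions = [dims.index(name) for name in keep]
    result = {}
    for key, value in mapping.items():
        new_key = tuple(key[p] for p in positions)
        if new_key in result:
            # Two items collapsed onto one key, so the dropped dimension carried information.
            raise ValueError("dropped dimensions were not redundant: %r" % (new_key,))
        result[new_key] = value
    return result

# Hypothetical (frame, date) keys with exactly one date per frame.
frames = {(0, "Oct 25"): "f0", (1, "Oct 26"): "f1", (2, "Oct 27"): "f2"}

by_date = reindex(frames, ["Frame", "Date"], ["Date"])
print(list(by_date))  # [('Oct 25',), ('Oct 26',), ('Oct 27',)]
```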


Now that we have removed the redundant Frame Dimension we can create an NdLayout indexed just by the date:

In [15]:
surface_reindexed[::4].layout('Date')

Out[15]:
In [16]:
%output size=250


For a more compact representation, you may also create a GridSpace using the .grid method. In a GridSpace, each dimension maps onto an axis, which limits it to a maximum of two Dimensions, but redundant data like the shared axes and axis labels are suppressed. To avoid the tick labels overlapping we will also define a rotation of the tick marks by a few degrees.

In [17]:
%opts GridSpace [xrotation=10]
surface_reindexed[::2].grid('Date')

Out[17]:

Now how do we go about combining the two HoloMaps into a single GridSpace? First let us reindex the near-surface data as well.

In [18]:
nearsurface_reindexed = nearsurface_wind.reindex(['Date'])


The two HoloMaps we have represent wind speed at different heights. Meteorologists state the height of different air masses by their pressure. The near-surface imagery is at 850 hPa, while the surface level images are at 1000 hPa.

In [19]:
height = hv.Dimension('Layer Height', unit='hPa')


We can add this Dimension to the HoloMaps via the add_dimension method, which accepts the new dimension, the index position at which to insert that dimension, and the dimension value as arguments:

In [20]:
surface = surface_reindexed.add_dimension(height, 1, 1000)
near_surface = nearsurface_reindexed.add_dimension(height, 1, 850)


Now we can combine the two HoloMaps by creating a clone and updating it with the other HoloMap:

In [21]:
combined_hurricane = surface.clone()
combined_hurricane.update(near_surface)
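
The reason the extra dimension is needed before merging can be seen in a small dictionary-based sketch. This is an analogy only: the `add_dimension` function below is a hypothetical stand-in that works on plain dicts, and the string values stand in for the RGB Elements.

```python
def add_dimension(mapping, position, value):
    """Insert a constant value into every key tuple at the given position."""
    return {key[:position] + (value,) + key[position:]: item
            for key, item in mapping.items()}

# Both toy mappings share the same date keys, so merging them directly would overwrite items.
surface = {("Oct 25",): "s0", ("Oct 26",): "s1"}
near_surface = {("Oct 25",): "n0", ("Oct 26",): "n1"}

combined = dict(add_dimension(surface, 1, 1000))       # start from a copy
combined.update(add_dimension(near_surface, 1, 850))   # keys no longer collide
print(len(combined))  # 4
```

Without the added height value, the second `update` would replace the two surface entries rather than sit alongside them.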


Using .info we can confirm the two HoloMaps have been successfully merged.

In [22]:
combined_hurricane.info

HoloMap containing 28 items of type RGB
---------------------------------------

Key Dimensions:
Date: Oct 25 2012 01:00 UTC...Nov 02 2012 16:00 UTC
Layer Height (hPa): 850...1000
Constant Dimensions:
Frame: None...None
Deep Dimensions:
x: 0.0...400.0
y: 0.0...350.0
R: 0...1
G: 0...1
B: 0...1



Merging multiple HoloMaps in this step-by-step way would be cumbersome, and avoiding this complexity is why the Collator object (another instance of Dimensioned) has been provided. Collator will be described in the Columnar Data tutorial.

Now that both the Date and Layer Height are Dimensions on the HoloMap we have various options for laying out our data. We can simply map each Dimension to an axis of a GridSpace:

In [23]:
combined_hurricane.select(Date=(None, None, 2)).grid(['Date', 'Layer Height'])

Out[23]: