Pandas Conversion

Pandas is one of the most popular Python libraries, providing high-performance, easy-to-use data structures and data analysis tools. It also provides I/O interfaces to store and load your data in a variety of formats, including CSV files, JSON, Python pickles, and even databases. In other words, it makes loading data, munging data, and even complex data analysis tasks a breeze.

Combining the high-performance data analysis tools and I/O capabilities that Pandas provides with the interactivity and ease of generating complex visualizations in HoloViews makes the two libraries a perfect match.

In this tutorial we will explore how you can easily convert between Pandas dataframes and HoloViews components. The tutorial assumes you are already familiar with some of the core concepts of both libraries, so if you need more background on HoloViews, have a look at the Introduction, Exploring Data, and Columnar Data tutorials.

Basic conversions

In [1]:
import numpy as np
import pandas as pd
import holoviews as hv
from IPython.display import HTML
In [2]:
%output holomap='widgets'

The first thing to understand when working with pandas dataframes in HoloViews is how data is indexed. Pandas dataframes are structured as tables with any number of columns and indexes. HoloViews, on the other hand, deals with Dimensions. HoloViews container objects such as HoloMap, NdLayout, GridSpace and NdOverlay have kdims, which provide metadata about the data along each dimension and how it can be sliced. Element objects, on the other hand, have both key dimensions (kdims) and value dimensions (vdims). The kdims of a HoloViews data structure represent the position, bin or category along a particular dimension, while the value dimensions usually represent some continuous variable.

Let's start by constructing a Pandas dataframe with a few columns and displaying it as HTML (throughout this notebook we will visualize the dataframes using the IPython HTML display function, to allow this notebook to be tested automatically, but in ordinary work you can visualize dataframes directly without this mechanism).

In [3]:
df = pd.DataFrame({'a':[1,2,3,4], 'b':[4,5,6,7], 'c':[8, 9, 10, 11]})
HTML(df.to_html())
   a  b   c
0  1  4   8
1  2  5   9
2  3  6  10
3  4  7  11

Now that we have a basic dataframe, we can wrap it in the HoloViews Table Element:

In [4]:
example = hv.Table(df)

The data on the Table Element is accessible via the .data attribute, as on all other Elements.

In [5]:
list(example.data.columns)
['a', 'b', 'c']

As you can see, we now have a Table, which has a and b as its key dimensions (kdims) and c as its value dimension (vdims). Because it is not needed by HoloViews, the index of the original dataframe was dropped, but if the index is meaningful you can make it available as a column using the .reset_index method on the pandas dataframe:

In [6]:
HTML(df.reset_index().to_html())
   index  a  b   c
0      0  1  4   8
1      1  2  5   9
2      2  3  6  10
3      3  4  7  11
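As a small pandas-only sketch of what reset_index does: it returns a new dataframe in which the old index appears as an ordinary column, leaving the original untouched. The name 'row' used below is a hypothetical label chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [8, 9, 10, 11]})

# reset_index returns a new dataframe; an unnamed index becomes a column called 'index'.
with_index = df.reset_index()
print(list(with_index.columns))

# Naming the index first gives the resulting column a more descriptive name.
named = df.rename_axis('row').reset_index()
print(list(named.columns))

# The original dataframe is left unchanged.
print(list(df.columns))
```

The resulting dataframe can then be wrapped in a Table just like the original one, with the former index available as a dimension.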

Now we can employ the HoloViews slicing semantics to select the desired subset of the data and use the usual compositing + operator to lay the data out side by side:

In [7]:
example[:, 4:8:2] + example[2:5:2, :]
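HoloViews slices select by dimension value rather than by row position, with an inclusive lower bound and an exclusive upper bound. The following is a rough pandas-only sketch of the equivalent range selections; it is an illustration of the semantics, not how HoloViews implements them, and it leaves out the step component of the slices.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7], 'c': [8, 9, 10, 11]})

# Value-based range on 'b': keep rows where 4 <= b < 8.
b_range = df[(df['b'] >= 4) & (df['b'] < 8)]

# Value-based range on 'a': keep rows where 2 <= a < 5.
a_range = df[(df['a'] >= 2) & (df['a'] < 5)]

print(b_range)
print(a_range)
```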

Dropping and reducing columns

The above was the simple case: we converted all the dataframe columns to a Table object. Where pandas excels, however, is making a large set of data available in a form that makes selection easy. This time, let's only select a subset of the Dimensions.

In [8]:
example.to.curve('a', 'b', [])

As you can see, HoloViews simply ignored the remaining Dimension. By default, the conversion functions ignore any numeric unselected Dimensions. All non-numeric Dimensions are converted to Dimensions on the returned HoloMap, however. Both of these behaviors can be overridden by supplying explicit map dimensions and/or a reduce_fn.

You can perform this conversion with any type and lay your results out side by side, making it easy to look at the same dataset in any number of ways.

In [9]:
%%opts Curve [xticks=3 yticks=3]
example.to.curve('a', 'b', []) + example

Finally, we can convert all homogeneous HoloViews types (i.e. anything except Layout and Overlay) back to a pandas dataframe using the dframe method.

In [10]:
HTML(example.dframe().to_html())
   a  b   c
0  1  4   8
1  2  5   9
2  3  6  10
3  4  7  11
