Networks#

The point and line-segment plotting provided by Datashader can be put together in different ways to visualize specific types of data. For instance, network graph data, i.e., networks of nodes connected by edges, can very naturally be represented by points and lines. Here we will show examples of using Datashader’s graph-specific plotting tools, focusing on how to visualize very large graphs while allowing any portion of the rendering pipeline to replaced with components suitable for specific problems.

First, we’ll import the packages we are using and demonstrating here.

import math
import numpy as np
import pandas as pd

import datashader as ds
import datashader.transfer_functions as tf
from datashader.layout import random_layout, circular_layout, forceatlas2_layout
from datashader.bundling import connect_edges, hammer_bundle

from itertools import chain

Graph (node) layout#

Some graph data is inherently spatial, such as connections between geographic locations, and these graphs can simply be plotted by connecting each location with line segments. However, most graphs are more abstract, with nodes having no natural position in space, and so they require a “layout” operation to choose a 2D location for each node before the graph can be visualized. Unfortunately, choosing such locations is an open-ended problem involving a complex set of tradeoffs and complications.

Datashader provides a few tools for doing graph layout, while also working with external layout tools. As a first example, let’s generate a random graph, with 100 points normally distributed around the origin and 20000 random connections between them:

np.random.seed(0)
n=100
m=20000

nodes = pd.DataFrame(["node"+str(i) for i in range(n)], columns=['name'])
nodes.tail()

	name
95	node95
96	node96
97	node97
98	node98
99	node99

edges = pd.DataFrame(np.random.randint(0,len(nodes), size=(m, 2)),
                     columns=['source', 'target'])
edges.tail()

	source	target
19995	95	22
19996	16	17
19997	10	17
19998	61	69
19999	56	23

Here you can see that the nodes list is a columnar dataframe with an index value and name for every node. The edges list is a columnar dataframe listing the index of the source and target in the nodes dataframe.

To make this abstract graph plottable, we’ll need to choose an x,y location for each node. There are two simple and fast layout algorithms included:

circular  = circular_layout(nodes, uniform=False)
randomloc = random_layout(nodes)
randomloc.tail()

	name	x	y
95	node95	0.136567	0.983925
96	node96	0.834612	0.820818
97	node97	0.219481	0.075901
98	node98	0.212844	0.965857
99	node99	0.492800	0.226973

cvsopts = dict(plot_height=400, plot_width=400)

def nodesplot(nodes, name=None, canvas=None, cat=None):
    canvas = ds.Canvas(**cvsopts) if canvas is None else canvas
    aggregator=None if cat is None else ds.count_cat(cat)
    agg=canvas.points(nodes,'x','y',aggregator)
    return tf.spread(tf.shade(agg, cmap=["#FF3333"]), px=3, name=name)

tf.Images(nodesplot(randomloc,"Random layout"),
          nodesplot(circular, "Circular layout"))

Random layout

Circular layout

The circular layout provides an option to distribute the nodes randomly along the circle or evenly, and here we’ve chosen the former.

The two layouts above ignore the connectivity structure of the graph, focusing only on the nodes. The ForceAtlas2 algorithm is a more complex approach that treats connections like physical forces (a force-directed approach) in order to construct a layout for the nodes based on the network connectivity:

%time forcedirected = forceatlas2_layout(nodes, edges)
tf.Images(nodesplot(forcedirected, "ForceAtlas2 layout"))

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

CPU times: user 6.55 s, sys: 533 ms, total: 7.08 s
Wall time: 8.52 s

ForceAtlas2 layout