To build Radiojar, we developed our own cloud streaming platform, with the goal of creating a better offering in radio streaming. It lets us stay cost-effective by expanding our streaming network on demand, so that it matches the bandwidth and server load required by the number of listeners at any station, at any time. When demand slows down, we shut down the extra nodes that are no longer needed. That’s how we can offer unlimited listeners in all our plans: your radio station will never become unreachable, and you only pay for the bandwidth you actually use, not for an arbitrary number of listener “slots” sitting around untouched.
In this post, we’ll give an outline of how this cloud network is set up and how we visualise and troubleshoot it. So, if you noticed one of our recent tweets:
Cool graphical representation of distributed traffic among server nodes and live streams @radiojar @Gephi pic.twitter.com/cnhrMffNV8
— StathisKoutsogeorgos (@stathiskout)
…the rest of this post is the explanation behind it.
In Radiojar’s streaming cloud, we use two kinds of nodes:
The radio servers are the nodes responsible for creating the actual stream that your radio station broadcasts. They do that by mixing different audio components into a single audio stream. Such components can be: a direct link broadcast from a console, a pilot playing songs from your media library, a jingle triggered by a break, or the DJ’s voice coming from their microphone through Radiojar’s virtual studio (a toy sketch of this mixing step follows below). Radio servers are internal nodes, invisible outside Radiojar. They deliver the stream of one or more radio stations to one or more streaming servers for distribution.
The streaming servers are the servers that broadcast a radio station’s stream (as produced by the radio server) to the actual listeners. There are multiple streaming nodes, and their number varies with listener demand across the entire Radiojar network at any given time.
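To make the mixing step concrete, here’s a toy sketch: several equal-length PCM buffers (console feed, playlist audio, microphone) summed into one output buffer. A real radio server also resamples, crossfades, and encodes; none of this is our production code.

```python
# Toy illustration of the mixing step: sum equal-length PCM buffers
# into one output buffer, clipping to the valid sample range.
def mix(*sources: list[float]) -> list[float]:
    """Mix several float PCM buffers sample by sample."""
    mixed = [sum(samples) for samples in zip(*sources)]
    return [max(-1.0, min(1.0, s)) for s in mixed]

console = [0.2, 0.1, -0.3]   # direct link broadcast from a console
playlist = [0.05, 0.0, 0.1]  # pilot playing from the media library
mic = [0.0, 0.4, 0.2]        # DJ's voice from the virtual studio
print(mix(console, playlist, mic))  # one stream out of three sources
```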
Streaming servers form what we call a “broadcluster” (from broadcasting + cluster). Each streaming server is a broadcluster node, receiving new listener requests as distributed by a load balancer. The broadcluster as a whole is essentially a distributed network of intelligent streaming servers, cooperating in real-time to maintain an uninterrupted listening experience for all stations. As demand increases for a station’s stream, more streaming nodes are created in the broadcluster to accommodate the new listeners. As listeners leave, broadcluster nodes are released.
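In hypothetical Python, the routing idea might look like the sketch below. The class names, the capacity threshold, and the scaling rule are simplifications for illustration, not our production code.

```python
# Hypothetical sketch of broadcluster listener routing; names,
# capacities, and the scaling rule are illustrative only.
from dataclasses import dataclass, field

@dataclass
class StreamingNode:
    name: str
    listeners: int = 0
    capacity: int = 1000                        # max listeners per node
    stations: set = field(default_factory=set)  # stations it relays

class Broadcluster:
    def __init__(self) -> None:
        self.nodes: list[StreamingNode] = []
        self.masters: dict[str, StreamingNode] = {}  # station -> master
        self._spawn()

    def _spawn(self) -> StreamingNode:
        # Stands in for provisioning a fresh cloud node on demand.
        node = StreamingNode(name=f"stream-{len(self.nodes) + 1}")
        self.nodes.append(node)
        return node

    def route_listener(self, station: str) -> StreamingNode:
        """Pick the node that serves a new listener for `station`."""
        if station not in self.masters:
            # First request: the least-loaded node becomes the master.
            node = min(self.nodes, key=lambda n: n.listeners)
            self.masters[station] = node
        else:
            node = self.masters[station]
            if node.listeners >= node.capacity:
                # Master is full: relay via another node, which becomes
                # a slave for this station; spawn one if all are full.
                candidates = [n for n in self.nodes
                              if n is not node and n.listeners < n.capacity]
                node = (min(candidates, key=lambda n: n.listeners)
                        if candidates else self._spawn())
        node.stations.add(station)
        node.listeners += 1
        return node
```

The reverse path, releasing nodes as listeners leave, is omitted here for brevity.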
Setting up and debugging a distributed system like this in real-life conditions is no easy feat. All the nodes run a RESTful web service through which we can check their status, but a huge table of printed figures is hardly a convenient way to tell whether everything is working. We needed a way to visualise what is happening between these nodes and to verify that listener and stream distribution works as intended.
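For illustration, a sweep over those status endpoints might look like the sketch below. The /status path, the node addresses, and the response fields are hypothetical placeholders, not our actual API.

```python
# Hypothetical status sweep; the /status endpoint, node addresses,
# and response fields are illustrative placeholders.
import json
from urllib.request import urlopen

NODES = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # example node addresses

def fetch_status(host: str, timeout: float = 2.0) -> dict:
    """Query one node's RESTful status endpoint and decode the JSON."""
    with urlopen(f"http://{host}/status", timeout=timeout) as resp:
        return json.load(resp)

for host in NODES:
    try:
        status = fetch_status(host)
        print(f"{host}: {status.get('listeners', '?')} listeners, "
              f"{status.get('stations', '?')} stations")
    except OSError as err:  # URLError is an OSError subclass
        print(f"{host}: unreachable ({err})")
```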
We started off with a graph manipulation library and a graph visualization library to produce a visual representation of the network graph, printing some figures to show the traffic between the different nodes. The results, however, were pretty basic. Then we stumbled on Gephi, and the demo video blew our minds. Using the Graph Streaming plugin, Gephi can act as a graph streaming server. With a little extra code to process the graph, we can quickly arrive at an image like the following, which shows the streaming and radio server nodes inside a broadcluster:

[Image: Gephi visualisation of streaming and radio server nodes inside a broadcluster]
The radio servers are the small bright green circles (emphasized here with black dots) and the streaming servers are the big yellow circles, whose size reflects their load. All the other coloured circles are not server nodes but visualise the listener load for radio stations: each colour designates a different radio station, and the bigger the circle, the more listeners are connected to that station. The lines to the streaming servers show the bandwidth served from each one: the thicker the line, the more listeners are connected to that server.

As you can see, each radio station’s stream is usually relayed by at least two streaming servers. For each radio station there’s always a master streaming server (the first server requested to relay that station’s stream becomes the master); subsequent listener requests may be handled by the same server or routed to other servers that become slaves for that station (in this picture all streams are transmitted by a maximum of two servers, but the system will scale up if more listeners join). You’ll notice that low-traffic stations operate with only one streaming server. When there are two or more, you can also spot the thin lines representing the station’s signal being relayed from the master to a slave: they connect streaming servers and are the same colour as the listener load circles and lines.
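For the curious, the “little extra code” boils down to pushing JSON events into Gephi’s Graph Streaming master. Here’s a minimal sketch; the workspace URL depends on your Gephi session, and the node and edge values are made up for illustration.

```python
# Minimal sketch of feeding Gephi's Graph Streaming master; the
# workspace URL and all node/edge values are illustrative.
import json
from urllib.request import Request, urlopen

GEPHI_URL = "http://localhost:8080/workspace0?operation=updateGraph"

def send_event(event: dict) -> None:
    """POST one graph-streaming event (an = add node, ae = add edge)."""
    req = Request(GEPHI_URL, data=json.dumps(event).encode(),
                  headers={"Content-Type": "application/json"})
    urlopen(req).read()

# A streaming server, sized by its load.
send_event({"an": {"stream-1": {"label": "stream-1", "size": 40}}})
# A station circle, sized by its listener count...
send_event({"an": {"station-a": {"label": "station-a", "size": 15}}})
# ...linked to the server relaying it; the weight drives line thickness.
send_event({"ae": {"e1": {"source": "station-a", "target": "stream-1",
                          "directed": False, "weight": 15}}})
```

Feeding events like these as the cluster state changes is what produces the live picture above.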
This setup allows us to check our streaming cloud’s status and load at any time. Also, when deploying changes, this visualisation allows us to check almost in real time that the new code solves the issues we detected and is not causing new problems: inconsistencies and imbalances are immediately obvious!