Improve Broadcast Reliability with a Customized Cloud Strategy

Defining and implementing a tailored broadcast cloud strategy can lead to improved uptime and lower overall operating costs if done right.

Last October, SMPTE held its Advanced Technology Conference in Los Angeles. For an industry in the midst of change, it's essential for all of us to keep up with the advances and understand the challenges that lie ahead. Events like the SMPTE Advanced Technology Conference provide a wonderful opportunity to hear from the leaders in our industry and to understand where we're headed and what technologies will help us get there. If participation in such events is any indicator of the health of our industry, then the future looks rosy: this event had its best attendance in over a decade.

"The cloud" was a major topic at the conference. Al Kovalick, one of the most brilliant thinkers in our industry, spoke of the public cloud as gravity. His premise is that the amount of energy being expended in developing and deploying cloud technologies dwarfs similar spending in other industries. Just like planets and stars whose sheer mass and gravity cause them to clear their orbits, the cloud will do the same to non-cloud industries and technologies. He's right, and yet I can't help being a little scared. After all, broadcasting has survived the last century by developing our own technologies and infrastructures to support our mission. Now we're going to run our business on someone else's infrastructure and depend on another industry's technology?

How Broadcast Approaches New Technologies
But that's not really the way it's been. We've always adapted the technologies of other industries. The phone company provided all of the interconnects for radio and TV networks before there were dedicated satellites, and satellites were also developed initially for telecommunications.

We've been adapting IT technologies for decades, from servers to MPLS networks. Al Kovalick is right - the move to the cloud seems unstoppable. If making the cloud work for broadcasting is the next step, then the challenge is finding the right implementation strategy.

Define the Cloud
Before we dig in any further, let's define what we mean by "the cloud", because it can mean different things to different people. The cloud is a shared pool of computer processing and storage that can be dynamically assigned to different users and tasks. A cloud could be public, such as AWS (Amazon Web Services), where the infrastructure is provided by others, or private, where the hardware is owned by the customer. A private cloud could be located on the customer's premises or in a third-party data center. The applications that run in the cloud run on virtual machines and are tailored for this environment. In this article, I'm using "the cloud" to refer to running on third-party services such as Amazon Web Services and Microsoft’s Azure.

There are several inherent benefits to the cloud:

  • Customers can outsource their infrastructure costs to a third party or facility. This moves expenses from capex to opex and reduces the upfront costs.
  • Many broadcast workflow tasks such as editing, transcoding and graphics rendering are CPU intensive and require significant processing power. Yet these are sporadic actions and, most of the time, the hardware dedicated to these processes sits idle. This makes these tasks ideal for virtualized operations where the computer processing power can be used for other tasks - or sold to other users in a public cloud - when not needed.
  • A cloud, by definition, is connected to a number of locations and users. This makes the cloud an enabler for collaborative workflows where multiple users can share content and contribute during the process. This also makes the cloud an ideal platform for media management and specific workflows like newsroom automation.

For playout, private clouds can provide the same reliability as standard dedicated systems while delivering some additional benefits. In my first article, I noted that the cloud services that we use most often today are not really optimized for real-time linear channel playout. They seem to lack the reliability we expect for such broadcast operations, and we have little control over the design of the infrastructure or its location. What if we could improve this reliability and use the public cloud for playout?

Making the Cloud Reliable

We often talk about how resilient a playout system is. Resilience is a measure of the ability of a system to cope with change - how well something bounces back from an impact instead of failing. As an example, a car bumper is resilient if it returns to its original shape after an impact instead of being permanently deformed. Likewise, a playout system is resilient if it can survive a failure without impacting on-air performance.

In broadcasting we refer to 5-9's as a measure of resilience - 99.999% uptime, which equates to a total outage of about 5.26 minutes per year. SLAs - Service Level Agreements - are also often measured in x-9's.
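
The conversion from a number of nines to a yearly outage budget is simple arithmetic. Here is a minimal sketch of it in Python; the function name is my own for illustration, and it assumes a 365-day year.

```python
# Minutes in a non-leap year (an assumption; a leap year adds 1,440 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_budget_minutes(nines: int) -> float:
    """Return the yearly outage budget, in minutes, for an uptime of N nines."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for nines in range(2, 7):
    print(f"{nines}-9's: {downtime_budget_minutes(nines):.2f} minutes per year")
```

Each extra nine cuts the allowed outage by a factor of ten, which is why the step from 4-9's to 5-9's is so demanding for a single system.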

How can you use the cloud today to provide playout resiliency if you don’t trust the cloud? An ideal application for cloud playout is disaster recovery (DR). As an example, let's assume your main playout system fails four times a year and each time takes you off air for an hour (let's hope not, but that's why we want a DR site). Your total downtime in a year is four hours, and your playout system availability is 99.95%, well short of our target 5-9s.
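
To make that number concrete, here is a small sketch that turns an outage pattern into an availability figure. The function and variable names are illustrative only, and the year is assumed to have 8,760 hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760, ignoring leap years


def availability(failures_per_year: float, hours_per_failure: float) -> float:
    """Fraction of the year a system is on air, given its outage pattern."""
    downtime_hours = failures_per_year * hours_per_failure
    return 1.0 - downtime_hours / HOURS_PER_YEAR


# The main playout system from the example: four one-hour outages per year.
main_site = availability(failures_per_year=4, hours_per_failure=1)
print(f"Main playout availability: {main_site:.3%}")
```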

For our example we'll run a parallel backup DR playout system in the cloud, where we'll predict the service is even more unreliable - it fails every other month for 6 hours at a time, yielding a downtime of 36 hours per year or an availability of just under 99.6%. If the two systems fail independently and my system automatically switches between them for failover, with no failover latency, we are off air only when both are down at once. That raises our uptime to an outstanding 99.9998%, close to 6-9s, and our total downtime is less than one minute per year. As this example shows, adding a less reliable component can actually improve our operational availability.
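
The parallel case can be sketched as follows. This is my own illustration of the arithmetic, and it rests on two assumptions that deserve to be stated loudly: the two systems fail independently of each other, and failover is instantaneous. The helper names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24            # 8,760, ignoring leap years
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60


def unavailability(failures_per_year: float, hours_per_failure: float) -> float:
    """Fraction of the year a single system is down."""
    return failures_per_year * hours_per_failure / HOURS_PER_YEAR


def parallel_unavailability(*parts: float) -> float:
    """Fraction of the year ALL systems are down at once.

    Assumes independent failures and instant failover, so the individual
    unavailabilities simply multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


main_site = unavailability(4, 1)   # four 1-hour outages per year
cloud_dr = unavailability(6, 6)    # six 6-hour outages per year
combined = parallel_unavailability(main_site, cloud_dr)

print(f"Main site availability: {1 - main_site:.3%}")
print(f"Cloud DR availability:  {1 - cloud_dr:.3%}")
print(f"Combined availability:  {1 - combined:.5%}")
print(f"Combined downtime:      {combined * MINUTES_PER_YEAR:.2f} minutes per year")
```

If the two systems share a failure cause, such as the same network path or power feed, the multiplication no longer holds and the real figure will be worse.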

That level of reliability is outstanding, but it assumes the DR service runs full-time in the cloud, which might not be any more economical than building your own off-site disaster recovery facility. It also assumes instantaneous failover, which is unlikely. What if, instead of running full-time, the cloud DR service sat on standby and we only spun it up when it was actually needed? If we could do that, we'd save a lot compared to building our own DR site (CAPEX) or paying to run it continuously on someone else's service (OPEX).

Let's assume that we can spin up the cloud service in under 1 minute by caching content and playlists ahead of time in the cloud; you could already be using the cloud for non-realtime processes like playlist and media management anyway, so this isn’t unrealistic. Even with 1 minute of downtime per instance and 4 failures per year, this standby cloud playout solution would keep you under 5 minutes of total outage per year, achieving our target of 5-9's. You would also only need the cloud playout for less than 5 hours per year which, if it was shared by several stations or customers, would make the cost a mere fraction of what a full-time DR system would cost.
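
Here is a rough sketch of the standby arithmetic. The constants come from the example above; the variable names are mine. Note one simplification, which is my assumption rather than a claim about real services: the standby is treated as always available when called on.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

FAILURES_PER_YEAR = 4         # main playout failures, from the example
SPIN_UP_MINUTES = 1           # assumed time to bring the standby on air
OUTAGE_HOURS_PER_FAILURE = 1  # how long the main system stays down each time

# We are off air only while the standby spins up after each failure.
off_air_minutes = FAILURES_PER_YEAR * SPIN_UP_MINUTES

# The cloud service only has to run while the main system is down.
cloud_hours_used = FAILURES_PER_YEAR * OUTAGE_HOURS_PER_FAILURE

five_nines_budget = MINUTES_PER_YEAR * 1e-5
standby_availability = 1 - off_air_minutes / MINUTES_PER_YEAR

print(f"Off air:           {off_air_minutes} minutes per year")
print(f"5-9's budget:      {five_nines_budget:.2f} minutes per year")
print(f"Availability:      {standby_availability:.5%}")
print(f"Cloud playout use: {cloud_hours_used} hours per year")
```

The margin against the 5-9's budget is thin, so a spin-up time much over a minute, or a fifth failure in the year, would push the result back under the target.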

Finally, let's look at what happens if we forgo our main on-site playout system and move playout completely to the cloud. The same cloud playout with insufficient reliability - it fails every other month for 6 hours at a time, yielding a downtime of 36 hours per year or an availability of just under 99.6% - isn't good enough by itself. But what if we have a completely independent backup system running in a different cloud, from a different provider, in a different location and using different transmission paths and routing? These two systems, each with the same poor availability, when combined even with a 1-minute cutover time, theoretically yield 99.997% availability with 15 minutes of downtime per year. And a third backup system would bring us to a full 5-9's, even if it's on standby and not full-time. So, the key to finding the reliability we need from the cloud really depends on our implementation strategy.
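
One way to arrive at figures like these is sketched below. This is my own reconstruction of the arithmetic, not a definitive model: it assumes the two clouds fail independently, counts the time both are down at once, and adds one 1-minute cutover for each failure of the primary.

```python
HOURS_PER_YEAR = 365 * 24            # 8,760, ignoring leap years
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60

FAILURES_PER_YEAR = 6   # each cloud fails every other month
HOURS_PER_FAILURE = 6   # and stays down for 6 hours
CUTOVER_MINUTES = 1     # assumed switch time from primary to backup

# Fraction of the year a single cloud playout system is down.
single_unavailability = FAILURES_PER_YEAR * HOURS_PER_FAILURE / HOURS_PER_YEAR

# Minutes per year with both clouds down together (independence assumed).
both_down_minutes = single_unavailability ** 2 * MINUTES_PER_YEAR

# Minutes per year spent switching over after a primary failure.
cutover_minutes = FAILURES_PER_YEAR * CUTOVER_MINUTES

total_minutes = both_down_minutes + cutover_minutes
dual_cloud_availability = 1 - total_minutes / MINUTES_PER_YEAR

print(f"Both clouds down: {both_down_minutes:.2f} minutes per year")
print(f"Cutover time:     {cutover_minutes} minutes per year")
print(f"Total downtime:   {total_minutes:.2f} minutes per year")
print(f"Availability:     {dual_cloud_availability:.4%}")
```

In this sketch the cutover time is a large share of the total, so shortening the switch matters about as much as adding more redundancy.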

In my follow-up article to this piece, we will dive deeper into the math and discuss how we might map out a specific strategy for each customer depending on their unique needs.

February 02, 2016 Blog

About the Author: Peter Wharton

Specialties: Broadcast facility design, video IP networking, master control and graphics centralization, facility monitoring and control, file-based workflow solutions.
