Serving Bandwidth-Friendly Video with HTTP Live Streaming (HLS)

While YouTube is free (as in money) to use, the cost is paid in terms of privacy and advertising analytics. So I've decided to investigate self-hosting my video content.

The Cost of YouTube

With YouTube, you trade privacy for a price of zero. YouTube is the very best at what it does (serving video at every resolution and bandwidth), and it is backed by Google, which is the very best at what it does (collecting data in order to sell a primed audience to advertisers).

There’s nothing inherently wrong with that. We live in a capitalistic society; there is money to be made; Google/YouTube is providing a service to advertisers; many consumers will (knowingly or unknowingly) give up their privacy in exchange for free-as-in-money services.

But as I’ve gotten older and started to realize just how much data Google has on each and every one of us, I’ve started valuing my privacy a lot more. I’d like to provide an option for you to protect your privacy as well.

Self-Hosting Video Content

Even with efficient video codecs, video can still cost a lot of money to serve.

Many websites serve each video as a single file, which the browser loads and plays from start to finish. This means that even if the user only watches the first few seconds of a 5-minute video, it’s possible that the entire file is downloaded, which is an unnecessary cost.

However, we can provide a better user experience as well as reduce hosting costs by leveraging the ability to serve bandwidth-adaptive chunks of video to players on-demand.
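
To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (a 2 Mbps encode, 6-second segments, a player that buffers two segments ahead) is an assumption for illustration rather than a measurement, and `bytesTransferred` is a hypothetical helper.

```javascript
// Rough transfer estimate for one viewer. All inputs are illustrative assumptions.
function bytesTransferred({ bitrateBps, durationSec, watchedSec, segmentSec, bufferAheadSegments }) {
  const bytesPerSec = bitrateBps / 8;

  // Worst case for a single progressive file: the browser fetches all of it.
  const singleFile = bytesPerSec * durationSec;

  // Segmented delivery: only the segments covering what was watched,
  // plus a small look-ahead buffer, capped at the full video.
  const totalSegments = Math.ceil(durationSec / segmentSec);
  const neededSegments = Math.min(
    totalSegments,
    Math.ceil(watchedSec / segmentSec) + bufferAheadSegments
  );
  const segmented = Math.min(singleFile, neededSegments * segmentSec * bytesPerSec);

  return { singleFile, segmented };
}

const estimate = bytesTransferred({
  bitrateBps: 2000000,      // assumed 2 Mbps encode
  durationSec: 300,         // the 5-minute video from the text
  watchedSec: 10,           // viewer leaves after ten seconds
  segmentSec: 6,            // assumed segment length
  bufferAheadSegments: 2,   // assumed player look-ahead
});
console.log(estimate);
```

Under those assumptions, a viewer who leaves after ten seconds costs roughly 6 MB of transfer instead of up to 75 MB.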

Adaptive Bitrate Streaming

There are two major, semi-compatible approaches to adaptive bitrate streaming over HTTP. One is called HTTP Live Streaming (“HLS”), and the other is called Dynamic Adaptive Streaming over HTTP (“MPEG-DASH”).

From Wikipedia:

Adaptive bitrate streaming is a technique used in streaming multimedia over computer networks. While in the past most video or audio streaming technologies utilized streaming protocols such as RTP with RTSP, today’s adaptive streaming technologies are almost exclusively based on HTTP and designed to work efficiently over large distributed HTTP networks such as the Internet.

It works by detecting a user’s bandwidth and CPU capacity in real time and adjusting the quality of the media stream accordingly. It requires the use of an encoder which can encode a single source media (video or audio) at multiple bit rates. The player client switches between streaming the different encodings depending on available resources. “The result: very little buffering, fast start time and a good experience for both high-end and low-end connections.” […]

HTTP-based adaptive bitrate streaming technologies yield additional benefits over traditional server-driven adaptive bitrate streaming. First, since the streaming technology is built on top of HTTP, contrary to RTP-based adaptive streaming, the packets have no difficulties traversing firewall and NAT devices. Second, since HTTP streaming is purely client-driven, all adaptation logic resides at the client. This reduces the requirement of persistent connections between server and client application. Furthermore, the server is not required to maintain session state information on each client, increasing scalability. Finally, existing HTTP delivery infrastructure, such as HTTP caches and servers can be seamlessly adopted.

A scalable CDN is used to deliver media streaming to an Internet audience. The CDN receives the stream from the source at its Origin server, then replicates it to many or all of its Edge cache servers. The end-user requests the stream and is redirected to the “closest” Edge server. […] The use of HTTP-based adaptive streaming allows the Edge server to run a simple HTTP server software, whose licence cost is cheap or free, reducing software licensing cost, compared to costly media server licences (e.g. Adobe Flash Media Streaming Server). The CDN cost for HTTP streaming media is then similar to HTTP web caching CDN cost.

This means that we can use off-the-shelf services like Amazon S3 and Amazon CloudFront to serve video, which are relatively inexpensive and have large user-bases who can answer questions when you run into issues.
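
The client-driven adaptation described in the quote can be sketched as a simple selection rule: measure throughput, then pick the best encoding that fits. This is a minimal sketch, not how any particular player does it. The variant ladder, the `pickVariant` helper, and the 0.8 safety factor are all assumptions, and real players weigh more signals, such as how full the buffer is.

```javascript
// Hypothetical variant ladder; bandwidth is in bits per second.
const variants = [
  { name: '1080p', bandwidth: 5000000 },
  { name: '720p', bandwidth: 2800000 },
  { name: '480p', bandwidth: 1400000 },
  { name: '240p', bandwidth: 400000 },
];

// Pick the highest-bitrate variant that fits within a safety margin of the
// measured throughput. If nothing fits, fall back to the lowest variant.
function pickVariant(ladder, measuredBps, safetyFactor = 0.8) {
  const budget = measuredBps * safetyFactor;
  const sorted = [...ladder].sort((a, b) => b.bandwidth - a.bandwidth);
  return sorted.find(v => v.bandwidth <= budget) || sorted[sorted.length - 1];
}

console.log(pickVariant(variants, 4000000).name); // a mid-range connection
console.log(pickVariant(variants, 300000).name);  // a very slow connection
```

Because the decision lives entirely in the client, the server only ever sees ordinary HTTP requests for static files.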

HTTP Live Streaming (HLS)

After doing some research, I came across a blog post that was particularly helpful — “Self-hosted videos with HLS” by Vincent Bernat.

Vincent writes:

To serve HLS videos, you need three kinds of files:

the media segments (encoded with different bitrates/resolutions),

a media playlist for each variant, listing the media segments, and

a master playlist, listing the media playlists.

Media segments can come in two formats:

MPEG-2 Transport Streams (TS), or

Fragmented MP4.

Fragmented MP4 media segments are supported since iOS 10. They are a bit more efficient and can be reused to serve the same content as MPEG-DASH (only the playlists are different). Also, they can be served from the same file with range requests. However, if you want to target older versions of iOS, you need to stick with MPEG-2 TS.

At the time of this writing, iOS 12 will be out in a week or two. A quick search tells me that iOS 10 and newer make up 85% of all iOS users. This means that I can pretty safely use the Fragmented MP4 method which, according to these sources, also lets the same media segments be reused for an MPEG-DASH implementation in the future.
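
To make the relationship between the three kinds of files more concrete, here is a minimal sketch of how a master playlist could be assembled. video2hls writes these playlists for you, so this only illustrates the format. The variant bitrates, resolutions, and URIs are hypothetical, and `buildMasterPlaylist` is a made-up helper.

```javascript
// Build a master playlist that points at one media playlist per variant.
function buildMasterPlaylist(variants) {
  const lines = ['#EXTM3U'];
  for (const v of variants) {
    // Each variant is described by a STREAM-INF tag, followed by its URI.
    lines.push(
      `#EXT-X-STREAM-INF:BANDWIDTH=${v.bandwidth},` +
      `RESOLUTION=${v.width}x${v.height},CODECS="${v.codecs}"`
    );
    lines.push(v.uri);
  }
  return lines.join('\n') + '\n';
}

// Hypothetical variants; the codec string is H.264 video plus AAC audio.
const masterPlaylist = buildMasterPlaylist([
  { bandwidth: 2800000, width: 1280, height: 720, codecs: 'avc1.4d401f,mp4a.40.2', uri: '720p.m3u8' },
  { bandwidth: 1400000, width: 854, height: 480, codecs: 'avc1.4d401f,mp4a.40.2', uri: '480p.m3u8' },
]);
console.log(masterPlaylist);
```

The player reads this file first, picks a variant, and then fetches that variant's media playlist to find the actual segments.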

Sample Video

Source: Hallelujah - Brooklyn Duo (Piano + Cello)

Implementation

Encoding and Deploying Video

Vincent Bernat provides a tool on GitHub called video2hls, which greatly simplifies the process of creating the various video fragments.

For this website, I have put together a workflow for creating and serving HLS video content.

I use H.264 video with AAC audio wrapped inside an MP4 container, exclusively. These are all defined as part of the MPEG-4 specification, and together they are the best-supported grouping of codecs and containers across all browsers and devices. Hardware-level decoders are commonplace inside computers, phones, tablets, game consoles, and set-top boxes like Xbox, PlayStation, and Apple TV.

I have a directory called streaming-video, which is separate from the images that I use and push to S3. Video files are large, and I don’t want to accidentally push partially-completed video data to my caching CDN before they’re ready.

I have a command which takes any video file inside the streaming-video folder, with a filename ending in -source.mp4, and passes it through video2hls, creating a folder called {video}.fmp4 which contains all of the video and playlist files I need across a large variety of bandwidths and resolutions.

It will only do the work to create the directory and all of the fragmented files if the directory doesn’t already exist.
```
# Encode each *-source.mp4 once; skip it if its .fmp4 output directory already exists.
find ./streaming-video -type f -name "*-source.mp4" -print0 | xargs -0 -I {} \
    bash -c 'if [ ! -d "${1%-source.mp4}.fmp4" ]; then
        video2hls --debug --output "${1%-source.mp4}.fmp4" --hls-type fmp4 "$1"
    fi' _ {}
```

I find all of the .m3u8 playlist files and gzip them (since they’re just text). This is essentially an in-place rewrite of the files.

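```
# Skip playlists that already pass a gzip integrity test; compress the rest in place.
find ./streaming-video -type f -name "*.m3u8" -print0 | xargs -0 -P 8 -I {} \
    bash -c '! gunzip -t "$1" 2>/dev/null && gzip -v "$1" && mv -v "$1.gz" "$1"' _ {}
```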
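
The reason for testing each file first is that the step has to be safe to re-run: compressing an already-gzipped playlist a second time would break it for players. Here is a minimal sketch of the same idea in Node.js. Note that it swaps in a cheaper check (looking at the two gzip magic bytes) rather than the full integrity test that `gunzip -t` performs, and `isGzipped` and `gzipOnce` are hypothetical helper names.

```javascript
const zlib = require('zlib');

// gzip streams begin with the two magic bytes 0x1f 0x8b.
// This is a cheaper check than a full integrity test.
function isGzipped(buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b;
}

// Compress only when the data is not already gzipped, so repeated runs are a no-op.
function gzipOnce(buf) {
  return isGzipped(buf) ? buf : zlib.gzipSync(buf);
}

const playlist = Buffer.from('#EXTM3U\n#EXT-X-TARGETDURATION:6\n');
const once = gzipOnce(playlist);
const twice = gzipOnce(once);

// The second pass returns the same bytes as the first.
console.log(isGzipped(playlist), isGzipped(once), once.equals(twice));
```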

Lastly, I push all of the files up to the hls folder in my S3 bucket using the AWS Unified CLI Tools, setting the correct Content-Type and Content-Encoding headers.

```
# The .m3u8 playlists that we gzipped
aws s3 sync ./streaming-video s3://blog.ryanparman.com/hls \
    --exclude '*.*' \
    --include '*.m3u8' \
    --acl=public-read \
    --cache-control max-age=31536000,public \
    --content-type 'application/vnd.apple.mpegurl' \
    --content-encoding 'gzip'

# The video "posters"
aws s3 sync ./streaming-video s3://blog.ryanparman.com/hls \
    --exclude '*.*' \
    --include '*.jpg' \
    --acl=public-read \
    --cache-control max-age=31536000,public \
    --content-type 'image/jpeg'

# The fragmented MP4 files
aws s3 sync ./streaming-video s3://blog.ryanparman.com/hls \
    --exclude '*.*' \
    --include '*.mp4' \
    --acl=public-read \
    --cache-control max-age=31536000,public \
    --content-type 'video/mp4'
```
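
The three uploads differ only in which files they match and which headers they set. As a sketch, that mapping can be written down as a small lookup table, which is handy if you ever script the upload yourself. The `headersFor` helper and the `HEADERS_BY_EXTENSION` table are hypothetical names; the header values are the ones used in the commands above.

```javascript
const path = require('path');

// One entry per upload command: file extension -> response headers.
const HEADERS_BY_EXTENSION = {
  '.m3u8': { contentType: 'application/vnd.apple.mpegurl', contentEncoding: 'gzip' },
  '.jpg': { contentType: 'image/jpeg' },
  '.mp4': { contentType: 'video/mp4' },
};

// Return the headers for a file, or null if the file should not be uploaded.
function headersFor(filename) {
  const ext = path.extname(filename).toLowerCase();
  return HEADERS_BY_EXTENSION[ext] || null;
}

console.log(headersFor('hallelujah.fmp4/index.m3u8'));
console.log(headersFor('hallelujah.fmp4/poster.jpg'));
console.log(headersFor('notes.txt'));
```

Only the playlists get a Content-Encoding header, because they are the only files that were gzipped ahead of time.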

The Client-Side Code

After pushing the content to our CDN, we can use the standard HTML5 <video> tag to tell browsers how to load the requested assets.

```
<video poster="https://cdn.ryanparman.com/hls/hallelujah.fmp4/poster.jpg" controls preload="none">
    <source src="https://cdn.ryanparman.com/hls/hallelujah.fmp4/index.m3u8" type="application/vnd.apple.mpegurl">
    <source src="https://cdn.ryanparman.com/hls/hallelujah.fmp4/progressive.mp4" type='video/mp4; codecs="avc1.4d401f, mp4a.40.2"'>
</video>
```

Here, we have a static poster image that the <video> element loads by default. Next, we have an HLS-compatible playlist file (.m3u8), which ultimately points to the correct .mp4 files. Lastly, we have a standard .mp4 fallback.

Enabling Chrome, Firefox, and Edge using hls.js

Dailymotion has released a JavaScript library called hls.js which enables HLS playback on browsers like Chrome, Firefox, and Edge using Fragmented MP4 sources.

You can load the script from the CDN:

```
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
```

After that, we have the implementation. Here, we start with a working <video> element, then use JavaScript to swap over to HLS.

```
(() => {
  'use strict';

  // Only take over when hls.js can run in this browser.
  if (Hls.isSupported()) {
    const selector = "video source[type='application/vnd.apple.mpegurl']";
    const videoSources = document.querySelectorAll(selector);

    videoSources.forEach(videoSource => {
      const m3u8 = videoSource.src;
      let once = false;

      // Clone the video without its children to remove any <source> elements.
      const oldVideo = videoSource.parentNode;
      const newVideo = oldVideo.cloneNode(false);

      // Replace video tag with our clone.
      oldVideo.parentNode.replaceChild(newVideo, oldVideo);

      // On play, initialize hls.js, once.
      newVideo.addEventListener('play', () => {
        if (once) {
          return;
        }
        once = true;

        const hls = new Hls({
          capLevelToPlayerSize: false
        });
        hls.attachMedia(newVideo);
        hls.loadSource(m3u8);

        // Resume playback as soon as the playlist has been parsed.
        hls.on(Hls.Events.MANIFEST_PARSED, () => {
          newVideo.play();
        });
      }, false);
    });
  }
})();
```
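
Combining the <video> markup with this script gives three possible playback paths. The following sketch summarizes the fallback order as a plain decision function. The `playbackPath` helper and its `env` flags are hypothetical: `hlsJsSupported` stands in for the result of `Hls.isSupported()`, and `nativeHls` stands in for the browser reporting that it can play `application/vnd.apple.mpegurl` on its own.

```javascript
// Summarize which path a browser ends up on, given two capability flags.
function playbackPath(env) {
  if (env.hlsJsSupported) {
    return 'hls.js';            // the script swaps the <video> and attaches hls.js
  }
  if (env.nativeHls) {
    return 'native-hls';        // the browser plays the .m3u8 <source> directly
  }
  return 'progressive-mp4';     // the browser falls through to the .mp4 <source>
}

console.log(playbackPath({ hlsJsSupported: true, nativeHls: false }));
console.log(playbackPath({ hlsJsSupported: false, nativeHls: true }));
console.log(playbackPath({ hlsJsSupported: false, nativeHls: false }));
```

For example, a browser that lacks the Media Source Extensions hls.js depends on, but has built-in HLS support, would take the second branch without any JavaScript involved.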

CORS

If you are serving the files from a third-party host (such as Amazon S3), you will need to enable CORS support on your bucket.

```
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedHeader>*</AllowedHeader>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>HEAD</AllowedMethod>
  </CORSRule>
</CORSConfiguration>
```
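
S3 also accepts the same rules as JSON, for instance through `aws s3api put-bucket-cors` with its `--cors-configuration` option. As a sketch, this snippet builds the JSON equivalent of the XML above and prints it so it can be saved to a file. The variable name is arbitrary, and it is worth checking the output against the current S3 documentation before applying it to a bucket.

```javascript
// JSON equivalent of the XML rule above: any origin, any request header, GET and HEAD only.
const corsConfiguration = {
  CORSRules: [
    {
      AllowedHeaders: ['*'],
      AllowedOrigins: ['*'],
      AllowedMethods: ['GET', 'HEAD'],
    },
  ],
};

// Print it so the output can be redirected into a file.
console.log(JSON.stringify(corsConfiguration, null, 2));
```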

Additionally, if you have a CDN cache in front of that S3 bucket (e.g., Amazon CloudFront), you’ll need to make sure that it is configured to pass the Origin request header through to S3 and to respond to the HTTP OPTIONS method.

You can find more information about solving this problem with CloudFront at “Configuring CloudFront to Respect CORS Settings”.