Adaptive streaming is a technique for streaming audio and video at a quality based on available network bandwidth. Slower connections stream lower-quality tracks, while faster connections stream the best quality they can handle without taking too long to buffer.
Think of YouTube: when you leave the quality on auto, the YouTube player automatically switches qualities based on what your network can handle. If your network can handle it, it will go for the full 4K 60fps quality (provided the video is available in that quality). If your internet is quack, however, you’ll probably be given a solid 144p.
These qualities are called tracks. The video is split into multiple tracks, one for each quality level, defined by bitrate and resolution.
Each track is then split into chunks of 2 to 10 seconds. This makes it easy to switch between tracks as network speed and signal strength change.
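To make that concrete, here’s a toy sketch of the decision a player makes at each chunk boundary: pick the highest-bitrate track the measured bandwidth can comfortably sustain. (This is illustrative logic, not ExoPlayer code; the Track class, the bitrate figures, and the 0.8 headroom factor are all made up.)

// Run at every chunk boundary: choose a track for the next chunk.
data class Track(val name: String, val bitrateBps: Int)

val tracks = listOf(
    Track("144p", 100_000),
    Track("480p", 1_000_000),
    Track("1080p", 5_000_000),
    Track("4K", 20_000_000)
)

fun selectTrack(measuredBandwidthBps: Int): Track =
    // Highest bitrate that still leaves ~20% headroom; worst case, the lowest track.
    tracks.filter { it.bitrateBps <= measuredBandwidthBps * 0.8 }
        .maxByOrNull { it.bitrateBps }
        ?: tracks.first()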
So what determines the actual size of these chunks?
How media streaming works in a nutshell
Media is delivered in packets over a lossy delivery process. This makes data transfer faster than it would otherwise be (as opposed to lossless delivery), but as a consequence, some packets, and with them some pixels, may be lost along the way. Most of the time, this loss is unnoticeable. When it comes to streaming, speed is more important than reliability for a good user experience.
A video frame is an image, but if you had 2 seconds of a video running at 30fps, would that mean you have 60 still images? Well, not really, no. Storing every frame as a complete image would result in slow streaming and rather large video files across the board.
Instead, most video formats work using keyframes and delta frames (aka i-frames and p-frames). Keyframes are those ‘still images’ that contain all of the data of the frame. If keyframes were all you used, however, your video streaming might not be smooth.
Delta frames, on the other hand, contain only part of the image and look back at the previous frame (key or delta) for anything that hasn’t changed. How much image data a delta frame carries depends on how much new information there is to present.
This is why for videos of the same quality and frame rate, a video with many moving parts may take longer to load than a news reporter sitting down and talking in front of a still background.
And all of this is why a chunk may be 2 or 10 seconds: a chunk has to start with a keyframe so it can be decoded on its own, so chunk boundaries follow how the video was encoded.
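Here’s a toy sketch of the keyframe/delta idea (treating a frame as a flat array of pixel values, which is nothing like a real codec such as H.264, but it shows why still scenes compress so well):

// Toy model: a "frame" is an IntArray of pixel values.
// A keyframe stores every pixel; a delta frame stores only what changed.
data class DeltaFrame(val changes: Map<Int, Int>) // pixel index -> new value

fun encodeDelta(previous: IntArray, current: IntArray): DeltaFrame {
    val changes = mutableMapOf<Int, Int>()
    for (i in current.indices) {
        if (current[i] != previous[i]) changes[i] = current[i]
    }
    return DeltaFrame(changes)
}

fun decodeDelta(previous: IntArray, delta: DeltaFrame): IntArray {
    val frame = previous.copyOf()
    delta.changes.forEach { (i, value) -> frame[i] = value }
    return frame
}

// A talking head changes few pixels per frame, so its delta frames are tiny;
// an action scene changes most pixels, pushing delta frames toward keyframe size.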
Implementing Adaptive Track Selection
Adaptive streaming is done via a TrackSelector that you add to your player when you initialise it.
val trackSelector = DefaultTrackSelector(this)
trackSelector.setParameters(
    trackSelector.buildUponParameters().setMaxVideoSizeSd()
)
player = SimpleExoPlayer.Builder(this)
    .setTrackSelector(trackSelector)
    .build()
The TrackSelector is responsible for switching between tracks while streaming.
setMaxVideoSizeSd() tells the track selector to only pick tracks of standard definition or lower, a good way to favour speed at the expense of quality; it is equivalent to setMaxVideoSize(1279, 719). If you want to provide high-definition streaming, skip setting the parameters, as the default max video size is Integer.MAX_VALUE.
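You can also set your own caps. A quick sketch, assuming a recent ExoPlayer 2.x where the parameters builder exposes explicit size and bitrate limits (the 720p and 2.5 Mbps figures are just example values):

// Cap video tracks at 720p and ~2.5 Mbps instead of SD.
trackSelector.setParameters(
    trackSelector.buildUponParameters()
        .setMaxVideoSize(1280, 720)     // max width, max height
        .setMaxVideoBitrate(2_500_000)  // bits per second
)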
Build an Adaptive Media Source
DASH, HLS, and SmoothStreaming are all media formats ExoPlayer supports that are capable of adaptive streaming, but we’ll focus on DASH for now and use the DashMediaSource.
private fun buildMediaSource(): MediaSource {
    val dataSourceFactory: DataSource.Factory =
        DefaultDataSourceFactory(this, "exoplayer-codelab")
    val mediaSourceFactory = DashMediaSource.Factory(dataSourceFactory)
    val uri = Uri.parse(getString(R.string.media_url_dash))
    return mediaSourceFactory.createMediaSource(uri)
}
Here you’re using both a DashMediaSource.Factory and a URI that points to DASH content.
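The last step is handing the media source to the player. A minimal sketch, assuming the prepare(MediaSource) overload that matches the SimpleExoPlayer version used above:

private fun initializePlayer() {
    val mediaSource = buildMediaSource()
    player.playWhenReady = true
    // The player downloads the DASH manifest, then starts fetching chunks
    // from whichever track the current bandwidth estimate calls for.
    player.prepare(mediaSource)
}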
And just like that, you’re streaming adaptively.