NoSQL Data Modeling in Practice: Video Streaming

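Curious about what’s behind a video-streaming app? Then join me in exploring a minimal design with the most essential video-streaming application features: listing the most recent videos, watching a video, continuing videos you have already started, and showing your watch progress under each thumbnail.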

I’ll cover the sample video-streaming application’s tech stack and then zero in on its data-modeling process. The project is available on GitHub.

Technology Stack

Using ScyllaDB for Low-Latency Video-Streaming Apps

ScyllaDB is a low-latency and high-performance NoSQL database that’s compatible with Apache Cassandra and DynamoDB. It is well-suited to handle the large-scale data storage and retrieval requirements of video-streaming applications. ScyllaDB has drivers in all the popular programming languages and, as this sample application demonstrates, it integrates well with modern web development frameworks like Next.js.

Low latency in the context of video-streaming services is crucial for delivering a seamless user experience. To lay the groundwork for high performance, you need to design a data model that fits your needs. Let’s continue with an example of the data-modeling process to see what that looks like.

Video-Streaming App Data Modeling

In the ScyllaDB University Data Modeling course, we teach that NoSQL data modeling should always start with your application and its queries. You then work backward and create the schema based on the queries you want to run in your app. This process ensures that the data model fits your queries and meets your requirements.

With that in mind, let’s go over the queries that our video-streaming app needs to run on each page load.

Page: Continue Watching

On this page, you can list all the videos you’ve started to watch. This view includes the video thumbnails and the progress bar under the thumbnail.

Query — Get Watch Progress:

SELECT video_id, progress FROM watch_history WHERE user_id = ? LIMIT 9;

Schema — Watch History Table:

For this query, it makes sense to define user_id as the partition key because that is the filter we use to query the watch history table. Keep in mind that this schema might need to be updated later if there is a query that requires filtering on other columns beyond the user_id. For now, though, this schema is correct for the defined query.

Besides the progress value, the app also needs to fetch the actual metadata of each video (for example, the title and the thumbnail image). For this, the video table has to be queried.

Query — Get Video Metadata:

SELECT * FROM video WHERE id IN ?;

Notice how we use the “IN” operator and not “=” because we need to fetch a list of videos, not just a single video.
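Because the progress values and the video metadata come from two separate queries, the application has to combine the two result sets itself. Here is a minimal TypeScript sketch of that step. It is not taken from the sample project: the row shapes, the string ids, the `title` and `thumbnail` column names, and the `buildContinueWatching` helper are all assumptions made for illustration.

```typescript
// Row shapes are assumptions for illustration; only video_id, progress, and id
// appear in the article's queries. title and thumbnail are hypothetical columns.
interface WatchProgressRow {
  video_id: string;
  progress: number;
}

interface VideoRow {
  id: string;
  title: string;
  thumbnail: string;
}

interface ContinueWatchingItem {
  videoId: string;
  title: string;
  thumbnail: string;
  progress: number;
}

// Joins the two result sets in application code, keeping the order of the
// watch-history rows and skipping progress rows whose video was not returned.
function buildContinueWatching(
  progressRows: WatchProgressRow[],
  videoRows: VideoRow[],
): ContinueWatchingItem[] {
  // Index the metadata rows by id so each progress row is matched in O(1).
  const videosById = new Map<string, VideoRow>();
  for (const video of videoRows) {
    videosById.set(video.id, video);
  }

  const items: ContinueWatchingItem[] = [];
  for (const row of progressRows) {
    const video = videosById.get(row.video_id);
    if (video === undefined) continue;
    items.push({
      videoId: video.id,
      title: video.title,
      thumbnail: video.thumbnail,
      progress: row.progress,
    });
  }
  return items;
}
```

The video_id values from the first query would be passed as the bound list for the “IN” query, and the rows from both queries then go through a helper like this one before rendering.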

Schema — Video Table:

For the video table, let’s define the id as the partition key because that’s the only filter we use in the query.

Page: Watch Video

If you click on any of the “Watch” buttons, you will be redirected to a page with a video player where you can start and pause the video.

Query — Get Video Content:

SELECT * FROM video WHERE id=?;

This is a similar query to the one that runs on the Continue Watching page. Thus, the same schema will work for this query as well.

Schema — Video Table:

Page: Most Recent Videos

Finally, let’s break down the Most Recent Videos page, which is the home page of the application. We analyze this page last because it is the most complex one from a data-modeling perspective. This page lists 10 of the most recently uploaded videos that are available in the database, ordered by the video creation date.

We will have to fetch these videos in two steps: first, get the timestamps, and then get the actual video content.

Query — Get the Timestamp of the Most Recent 10 Videos:

SELECT id, top10(created_at) AS date FROM recent_videos;

You might notice that we use a custom function called top10(). This is not a standard function in ScyllaDB. It’s a user-defined function (UDF) that we created to solve this data-modeling problem. It returns an array of the 10 most recent created_at timestamps in the table. Creating a new UDF in ScyllaDB can be a great way to solve your unique data-modeling challenges.
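To make the behavior of top10() concrete, here is a rough TypeScript sketch of the computation it performs. This is not the project's Lua code. It assumes the function works by folding each created_at value into a running list, and that timestamps are represented as epoch milliseconds; the `accumulateTopN` helper name is hypothetical.

```typescript
// State step: fold one created_at value into the running list, keeping at
// most n timestamps, newest first.
function accumulateTopN(state: number[], createdAt: number, n: number): number[] {
  const next = [...state, createdAt];
  next.sort((a, b) => b - a); // descending: most recent first
  return next.slice(0, n);
}

// Runs the state step once per row and returns the 10 most recent timestamps.
function top10(createdAtValues: number[]): number[] {
  let state: number[] = [];
  for (const createdAt of createdAtValues) {
    state = accumulateTopN(state, createdAt, 10);
  }
  return state;
}
```

The running list never grows beyond 10 entries, so the memory needed stays constant no matter how many rows are scanned.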

These timestamp values can then be used to query the actual video content that we want to show on the page.

Query — Get Metadata for Those Videos:

SELECT * FROM recent_videos WHERE created_at IN ? LIMIT 10;

Schema — Recent Videos:

In the Recent Videos materialized view, the created_at column is the primary key because both queries on this page are built around it: the first query aggregates that column to get the most recent timestamp values, and the second query filters by it. Be aware that in some cases, this can cause a hot partition.

Furthermore, the UI also shows a small progress bar under each video’s thumbnail that indicates the progress you made watching that video. To fetch this value for each video, the app has to query the watch history table.

Query — Get Watch Progress for Each Video:

SELECT progress FROM watch_history WHERE user_id = ? AND video_id = ?;

Schema — Watch History:

You might have noticed that the watch history table was already used in a previous query to fetch data. This time, the schema has to be modified slightly to fit this query. Let’s add video_id as a clustering key. This way, the query to fetch watch progress will work correctly.
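One way to picture the modified key is as a map of maps: the partition key (user_id) selects a partition, and the clustering key (video_id) selects a row inside it. The TypeScript sketch below is an in-memory model of that idea only, not database or driver code. The `WatchHistoryModel` class, the string ids, and the plain string ordering of video_id are assumptions made for illustration.

```typescript
// In-memory model of a table whose primary key is (user_id, video_id).
// Outer map: partition key (user_id). Inner map: clustering key (video_id).
class WatchHistoryModel {
  private partitions = new Map<string, Map<string, number>>();

  // Like a CQL INSERT: writing the same (user_id, video_id) again overwrites the row.
  upsert(userId: string, videoId: string, progress: number): void {
    let partition = this.partitions.get(userId);
    if (partition === undefined) {
      partition = new Map<string, number>();
      this.partitions.set(userId, partition);
    }
    partition.set(videoId, progress);
  }

  // Models: WHERE user_id = ? AND video_id = ?
  getProgress(userId: string, videoId: string): number | undefined {
    return this.partitions.get(userId)?.get(videoId);
  }

  // Models: WHERE user_id = ? LIMIT ?
  // Rows are returned in clustering-key order (simplified here to string order).
  listForUser(userId: string, limit: number): Array<{ video_id: string; progress: number }> {
    const partition = this.partitions.get(userId);
    if (partition === undefined) return [];
    return Array.from(partition.entries())
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .slice(0, limit)
      .map(([video_id, progress]) => ({ video_id, progress }));
  }
}
```

Both watch history queries in this article are served by the same structure: the Continue Watching query reads a whole partition, while the progress query on this page reads a single row within it.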

That’s it. Now let’s see the final database schema!

Final Database Schema

User-Defined Function for the Most Recent Videos Page

This UDF uses Lua, but you could also use WASM to create UDFs in ScyllaDB. Before creating the function, make sure that UDFs are enabled in the scylla.yaml configuration file (location: /etc/scylla/scylla.yaml):

Clone the Repo and Get Started

To get started, clone the repository:

git clone https://github.com/scylladb/video-streaming

Install the dependencies:

npm install

Modify the configuration file:

Migrate the database and insert sample data:

npm run migrate

Run the server:

npm run dev

Wrapping Up

We hope that you enjoyed our video-streaming app and that it helps you build low-latency, high-performance applications with ScyllaDB. If you want to continue learning, check out ScyllaDB University, where we have free courses on data modeling, ScyllaDB drivers and much more. If you have questions about the video-streaming sample app or ScyllaDB, go to our forum and let’s discuss!

