Attila Tóth is a developer advocate at ScyllaDB. He writes tutorials and blog posts, speaks at events, creates demos and sample applications to help developers build high-performance applications.
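Curious about what’s behind a video-streaming app? Then join me in exploring a minimal design with the most essential video-streaming application features: listing the most recent videos, listing the videos you’ve started watching, playing a video and showing a progress bar under each thumbnail.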
I’ll cover the sample video-streaming application’s tech stack and then zero in on its data-modeling process. The project is available on GitHub.
ScyllaDB is a low-latency, high-performance NoSQL database that’s compatible with Apache Cassandra and Amazon DynamoDB. It is well-suited to handle the large-scale data storage and retrieval requirements of video-streaming applications. ScyllaDB has drivers for all the popular programming languages and, as this sample application demonstrates, it integrates well with modern web development frameworks like Next.js.
Low latency is crucial for video-streaming services that want to deliver a seamless user experience. To lay the groundwork for high performance, you need to design a data model that fits your needs. Let’s continue with an example of the data-modeling process to see what that looks like.
In the ScyllaDB University Data Modeling course, we teach that NoSQL data modeling should always start with your application and queries first. Then you work backward and create the schema based on the queries you want to run in your app. This process ensures that you create a data model that fits your queries and meets your requirements.
With that in mind, let’s go over the queries that our video-streaming app needs to run on each page load.
On the Continue Watching page, you can list all the videos you’ve started to watch. This view includes the video thumbnails and a progress bar under each thumbnail.
SELECT video_id, progress FROM watch_history WHERE user_id = ? LIMIT 9;
For this query, it makes sense to define user_id as the partition key because that is the filter we use to query the watch history table. Keep in mind that this schema might need to be updated later if there is a query that requires filtering on columns other than user_id. For now, though, this schema is correct for the defined query.
Besides the progress value, the app also needs to fetch the actual metadata of each video (for example, the title and the thumbnail image). For this, the video table has to be queried.
SELECT * FROM video WHERE id IN ?;
Notice how we use the “IN” operator and not “=” because we need to fetch a list of videos, not just a single video.
For the video table, let’s define the id as the partition key because that’s the only filter we use in the query.
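The two queries above are combined in the application, since the progress and the metadata live in different tables. As a rough illustration, here is a minimal in-memory Python sketch of that merge. It is not the project's code: the dictionaries stand in for the tables, and the sample rows, the title and thumbnail values and the helper names select_videos_in and continue_watching are all assumptions made for this example.

```python
# Hypothetical sample data standing in for the video table (keyed by id)
# and for the rows returned by the watch history query.
video_table = {
    "v1": {"id": "v1", "title": "Intro to CQL", "thumbnail": "v1.png"},
    "v2": {"id": "v2", "title": "Data modeling", "thumbnail": "v2.png"},
    "v3": {"id": "v3", "title": "Drivers", "thumbnail": "v3.png"},
}
history_rows = [
    {"video_id": "v2", "progress": 40},
    {"video_id": "v1", "progress": 90},
]


def select_videos_in(table, ids):
    """Mirrors SELECT * FROM video WHERE id IN ?; one key lookup per id."""
    return [table[i] for i in ids if i in table]


def continue_watching(history, table):
    """Attach each video's metadata to its watch-progress row."""
    ids = [row["video_id"] for row in history]
    by_id = {video["id"]: video for video in select_videos_in(table, ids)}
    return [
        {**by_id[row["video_id"]], "progress": row["progress"]}
        for row in history
        if row["video_id"] in by_id
    ]


cards = continue_watching(history_rows, video_table)
print([(card["title"], card["progress"]) for card in cards])
# [('Data modeling', 40), ('Intro to CQL', 90)]
```

Because id is the only key of the video table, every entry in the IN list is a direct lookup, which is why the sketch never scans the whole table.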
If you click on any of the “Watch” buttons, you will be redirected to a page with a video player where you can start and pause the video.
SELECT * FROM video WHERE id=?;
This is a similar query to the one that runs on the Continue Watching page. Thus, the same schema will work for this query as well.
Finally, let’s break down the Most Recent Videos page, which is the home page of the application. We analyze this page last because it is the most complex one from a data-modeling perspective. This page lists the 10 most recently uploaded videos that are available in the database, ordered by the video creation date.
We will have to fetch these videos in two steps: first, get the timestamps, and then get the actual video content.
SELECT id, top10(created_at) AS date FROM recent_videos;
You might notice that we use a custom function called top10(). This is not a standard function in ScyllaDB. It’s a user-defined function (UDF) that we created to solve this data-modeling problem. This function returns an array of the most recent created_at timestamps in the table. Creating a new UDF in ScyllaDB can be a great way to solve your unique data-modeling challenges.
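To picture what a function like top10() has to do, here is a minimal Python sketch of the idea: fold timestamps one at a time into a small running state that only ever holds the newest ones. This is not the project's UDF, and the names top_n_state and top_n are hypothetical; it only illustrates the technique under the assumption that the function sees each created_at value once.

```python
import heapq
from datetime import datetime, timedelta


def top_n_state(state, value, n=10):
    """Accumulator step: fold one timestamp into the running min-heap of the n newest."""
    if len(state) < n:
        heapq.heappush(state, value)
    elif value > state[0]:
        # state[0] is the oldest timestamp kept so far; replace it with the newer one.
        heapq.heapreplace(state, value)
    return state


def top_n(values, n=10):
    """Run the accumulator over all values and return the n newest, newest first."""
    state = []
    for value in values:
        state = top_n_state(state, value, n)
    return sorted(state, reverse=True)


# Hypothetical sample data: 25 videos uploaded one hour apart.
base = datetime(2024, 1, 1)
created_at = [base + timedelta(hours=h) for h in range(25)]

newest = top_n(created_at)
print(len(newest), newest[0], newest[-1])
# 10 2024-01-02 00:00:00 2024-01-01 15:00:00
```

The state never grows beyond n entries, so the cost per row stays small no matter how many videos the table holds.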
These timestamp values can then be used to query the actual video content that we want to show on the page.
SELECT * FROM recent_videos WHERE created_at IN ? LIMIT 10;
In the Recent Videos materialized view, the created_at column is the primary key because we filter by that column to fetch the videos that match the most recent timestamp values. Be aware that in some cases, this can cause a hot partition.
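The two-step fetch can be sketched in a few lines of Python. Again, this is an in-memory illustration rather than the application's code: the dictionary keyed by created_at stands in for the view, the sample rows are invented, and newest_timestamps and videos_created_at are hypothetical helper names.

```python
from datetime import datetime

# Hypothetical rows keyed by created_at, standing in for the recent_videos view.
recent_videos = {
    datetime(2024, 1, d): {"id": f"v{d}", "created_at": datetime(2024, 1, d)}
    for d in range(1, 16)
}


def newest_timestamps(view, n=10):
    """Step 1: stand-in for the top10(created_at) query."""
    return sorted(view, reverse=True)[:n]


def videos_created_at(view, timestamps, limit=10):
    """Step 2: mirrors SELECT * FROM recent_videos WHERE created_at IN ? LIMIT 10;"""
    return [view[ts] for ts in timestamps if ts in view][:limit]


page = videos_created_at(recent_videos, newest_timestamps(recent_videos))
print([video["id"] for video in page])
# ['v15', 'v14', 'v13', 'v12', 'v11', 'v10', 'v9', 'v8', 'v7', 'v6']
```

This toy version returns rows in the order of the timestamp list. With a real query, the app may need to sort the returned rows by created_at itself.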
Furthermore, the UI also shows a small progress bar under each video’s thumbnail that indicates the progress you made watching that video. To fetch this value for each video, the app has to query the watch history table.
SELECT progress FROM watch_history WHERE user_id = ? AND video_id = ?;
You might have noticed that the watch history table was already used in a previous query to fetch data. This time, the schema has to be modified slightly to fit this query. Let’s add video_id as a clustering key. This way, the query to fetch watch progress will work correctly.
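To see why the clustering key helps, here is a minimal in-memory Python sketch of a table whose key is user_id (partition key) plus video_id (clustering key). The WatchHistoryTable class and the sample IDs are hypothetical and only model the lookup behavior; they are not part of the project.

```python
from collections import defaultdict


class WatchHistoryTable:
    """Toy model of a table keyed by (user_id, video_id); for illustration only."""

    def __init__(self):
        # partition key (user_id) -> {clustering key (video_id) -> progress}
        self._partitions = defaultdict(dict)

    def upsert(self, user_id, video_id, progress):
        # Writing the same (user_id, video_id) again overwrites the old progress.
        self._partitions[user_id][video_id] = progress

    def select_by_user(self, user_id, limit=9):
        # Mirrors: SELECT video_id, progress FROM watch_history WHERE user_id = ? LIMIT 9;
        rows = sorted(self._partitions.get(user_id, {}).items())  # clustering order
        return [{"video_id": v, "progress": p} for v, p in rows[:limit]]

    def select_progress(self, user_id, video_id):
        # Mirrors: SELECT progress FROM watch_history WHERE user_id = ? AND video_id = ?;
        return self._partitions.get(user_id, {}).get(video_id)


table = WatchHistoryTable()
table.upsert("user-1", "video-b", 20)
table.upsert("user-1", "video-a", 75)
table.upsert("user-1", "video-b", 60)  # same key: progress is updated, not duplicated

print(table.select_by_user("user-1"))
# [{'video_id': 'video-a', 'progress': 75}, {'video_id': 'video-b', 'progress': 60}]
print(table.select_progress("user-1", "video-b"))  # 60
print(table.select_progress("user-1", "video-z"))  # None
```

With both columns in the key, one user's partition can hold a separate row per video, and both the per-user listing and the per-video progress lookup are served from that single partition.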
That’s it. Now let’s see the final database schema!
This UDF uses Lua, but you could also use WebAssembly (Wasm) to create UDFs in ScyllaDB. Before creating the function, make sure to enable UDFs in the scylla.yaml configuration file (location: /etc/scylla/scylla.yaml).
To get started, clone the repository:
git clone https://github.com/scylladb/video-streaming
Install the dependencies:
npm install
Modify the configuration file:
Migrate the database and insert sample data:
npm run migrate
Run the server:
npm run dev
We hope that you enjoyed our video-streaming app and that it helps you build low-latency, high-performance applications with ScyllaDB. If you want to continue learning, check out ScyllaDB University, where we have free courses on data modeling, ScyllaDB drivers and much more. If you have questions about the video-streaming sample app or ScyllaDB, go to our forum and let’s discuss!