Kino: Video Upscaling Using Machine Learning

Introduction

Videos are "time in a bottle" saved away for eternity - the collective memories of the human race from man's first step on the moon to your child's first steps. Legacy videos including Hollywood movies that took millions of dollars to produce, memorable moments in sport history, and home video segments included in biographical productions are challenging to re-broadcast on 4K displays. The low resolution of the original content is not appealing to audiences today because they demand high resolution entertainment. The Kino upscaler addresses this need by allowing content owners to refresh legacy videos for high resolution displays; and thereby extends the useful life, and value, of these videos.
The traditional (non-ML) way to upscale videos is to use an interpolation function that determines the in-between pixels from the pixels of the original image. If you double the size of an image and two neighboring pixels in the original had values 200 and 204, the in-between pixel in the upscaled image gets the value 202. However, this approach produces a blurry appearance in detail-rich areas of the image. While there are more sophisticated attempts to solve this problem, all approaches must contend with the reality that the original video frame simply does not contain the information needed to fill in the pixels of the new high-resolution video. So how does Kino address this issue to produce high-quality reconstructions?
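As a concrete illustration, here is a minimal one-dimensional sketch of this traditional approach in Python (our own illustrative code, not Kino's): each new in-between pixel is simply the average of its two original neighbors.

```python
import numpy as np

def upscale_2x_linear(row: np.ndarray) -> np.ndarray:
    """Double a row of pixels by inserting the average of each neighboring pair."""
    mids = (row[:-1] + row[1:]) / 2.0   # the interpolated in-between pixels
    out = np.empty(row.size + mids.size)
    out[0::2] = row                     # keep the original pixels
    out[1::2] = mids                    # interleave the interpolated ones
    return out

# Two neighboring pixels valued 200 and 204 yield an in-between value of 202.
print(upscale_2x_linear(np.array([200.0, 204.0])))  # -> [200. 202. 204.]
```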

Solution

Kino uses machine learning to “understand” the underlying image and then “guess” the in-between pixels. This leads to a sharper and more natural-looking result. Figure 1 shows the difference in quality achieved with an early version of Kino trained on a small dataset; the quality of these upscaled images will improve further as the Kino technology and datasets mature. Machine learning at this scale requires deep expertise in mathematics and algorithms, as well as the ability to deploy and manage massive cloud and GPU computation. This is not the kind of program that can run on a user's laptop or desktop, nor is it practical for every content owner to do this themselves. Kino meets this need by securely upscaling videos and transferring the content back to the owners.
Use cases: re-releases of classic movies and TV shows, historical content to be embedded in new productions, decades-old home and wedding videos, refreshing content in national archives, and more.
Figure 1: Side by Side Comparison
image: img_0.jpg

Technology

Figure 2 illustrates the traditional upscaling approach, such as bilinear or Lanczos interpolation. The colors of the unknown pixels need to be filled in (or guessed) based on some knowledge of the surrounding pixels. Traditional methods assume that pixel values change smoothly and use a simple function, such as a line, to estimate the missing values. This works well in some areas, like the white dress in Figure 1. In other areas, such as edges within the image where colors change abruptly, smooth interpolation produces a blurry and unsatisfactory result. To overcome this deficiency, Kino's algorithm can “look” at the image and “guess” the most likely values for the in-between pixels. In Figure 2, this manifests as the two interpolated values between the second and third squares taking on intermediate colors not found in the original image.
Figure 2: Linear Interpolation
image: img_1.jpg
Figure 3: Missing pixels
image: img_2.jpg
Even if you have never seen the particular image in Figure 3, you can easily fill in the two missing parts. Why? You have likely seen similar images before, and you can combine that memory with context from the surrounding parts of the image to fill in the gaps with the most likely pixels. Kraenion uses a similar approach to develop Kino: take large quantities of high-resolution modern content, downscale it to the resolutions expected of legacy content, and use this dataset to train neural networks and sparse coding algorithms to reconstruct the original high-resolution version. These neural networks learn patterns in images and become capable of guessing accurately to fill in the missing information. Such trained networks can then be used to upscale legacy content that was not part of the training datasets. Networks can be trained on a variety of content: sports videos to upscale sports content, nature pictures to upscale outdoor scenes in movies, and so on.
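A hedged sketch of what such a training-data pipeline might look like is shown below. The patch size, scale factor, and use of Pillow for downscaling are our illustrative assumptions, not details of Kino's actual pipeline.

```python
# Illustrative sketch: build (low-res, high-res) training pairs by
# downscaling a modern high-resolution frame, as described above.
import numpy as np
from PIL import Image

SCALE = 2    # upscaling factor the model will learn to invert (assumption)
PATCH = 16   # high-res patch edge length in pixels (assumption)

def make_training_pairs(frame: Image.Image):
    """Yield (low-res patch, high-res patch) pairs from one HR frame."""
    hr = np.asarray(frame.convert("L"), dtype=np.float32)
    lr_img = frame.convert("L").resize(
        (frame.width // SCALE, frame.height // SCALE), Image.BICUBIC)
    lr = np.asarray(lr_img, dtype=np.float32)
    for y in range(0, hr.shape[0] - PATCH + 1, PATCH):
        for x in range(0, hr.shape[1] - PATCH + 1, PATCH):
            hr_patch = hr[y:y + PATCH, x:x + PATCH]
            lr_patch = lr[y // SCALE:(y + PATCH) // SCALE,
                          x // SCALE:(x + PATCH) // SCALE]
            yield lr_patch, hr_patch

# Usage: pairs = list(make_training_pairs(Image.open("frame.png")))
```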
Figure 4 shows how Kino's technology works. Rather than assuming a smooth mathematical function, Kino trains on video datasets and learns a large family of basis functions that can represent images at different resolutions. While analyzing legacy video, Kino applies the learned set of small basis functions to guess the content of image patches and uses the corresponding large basis functions to reconstruct the high-resolution image. Kino's reconstruction works across both space and time: by examining patches across multiple frames, it produces the most satisfying reconstruction of moving objects.
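The coupled-basis idea can be sketched as follows: fit a sparse code for a low-resolution patch against the small (low-res) basis functions, then reuse that code with the matching large (high-res) basis functions. The dictionary shapes and the use of scikit-learn's OMP solver below are illustrative assumptions, not Kino's implementation.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def upscale_patch(lr_patch, D_lr, D_hr, n_nonzero=5):
    """Reconstruct a high-res patch from a low-res one via coupled dictionaries.

    D_lr: (lr_dim, n_atoms) low-res basis functions, learned during training.
    D_hr: (hr_dim, n_atoms) corresponding high-res basis functions.
    """
    # Sparse code: which few basis functions best explain the low-res patch.
    alpha = orthogonal_mp(D_lr, np.ravel(lr_patch), n_nonzero_coefs=n_nonzero)
    # Reuse the same code with the high-res bases to synthesize the patch.
    return D_hr @ alpha

# Example with random (untrained) dictionaries: 8x8 LR patch -> 16x16 HR patch.
rng = np.random.default_rng(0)
D_lr = rng.standard_normal((64, 64))    # 8*8 = 64-dim low-res atoms
D_lr /= np.linalg.norm(D_lr, axis=0)    # OMP assumes unit-norm atoms
D_hr = rng.standard_normal((256, 64))   # 16*16 = 256-dim high-res atoms
hr_patch = upscale_patch(rng.standard_normal(64), D_lr, D_hr)
```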
There is a cost-benefit trade-off to consider when upscaling large videos with a reasonable amount of cloud-based computation. For example, consider an hour-long video created at 30 frames/second. An upscaling method that takes just 10 seconds per frame would require 300 hours of high-end GPU node time to upscale the entire hour. While this may be justifiable for super-hit movies, it is cost prohibitive for makers of home videos and TV shows. To address this issue, Kino offers mathematical optimizations that recognize this trade-off and can upscale large videos within reasonable budgets.
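Spelling out the arithmetic behind that estimate:

```python
frames = 60 * 60 * 30        # one hour at 30 frames/second = 108,000 frames
gpu_seconds = frames * 10    # 10 seconds of GPU time per frame
print(gpu_seconds / 3600)    # -> 300.0 GPU-hours
```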
Kino provides two upscaling methods: one based on linear algebra and sparse coding, the other on a convolutional neural network. Kino selects the optimal method based on the number of hours of legacy video to be upscaled and the budget constraints. The quality of reconstruction depends largely on how appropriate the training dataset is for the specific type of legacy content. Kino's initial focus is on classic movies, sports, and home videos. We are open to developing new domains that are of commercial interest.
Figure 4: Kino Basis Functions
image: img_3.jpg

Results

To quantify the image quality improvement, we computed the mean absolute difference between upscaled videos and originals for those cases where we had access to high-resolution originals. We also used butteraugli, an open source tool from Google, to quantify the improvement. Depending on the video, Kino shows demonstrably lower error than traditional upscaling. The table below shows our initial results on a test set of randomly sampled scenes. ML1 refers to the sparse coding method and ML2 refers to the neural network method.
Metric                                                        Bilinear   ML1    ML2
Mean Absolute Error in RGB (lower is better)                    0.71     0.59   0.47
Average Visual Discrepancy (butteraugli, lower is better)       5.77     4.05   3.69
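For reference, the first metric can be sketched in a few lines. This is our illustrative reading of "mean absolute difference", not Kino's measurement code; butteraugli is a separate command-line tool and is not reproduced here.

```python
import numpy as np

def mean_abs_error_rgb(upscaled: np.ndarray, original: np.ndarray) -> float:
    """Mean absolute per-channel RGB difference between two frames; lower is better."""
    assert upscaled.shape == original.shape, "frames must match in size"
    return float(np.mean(np.abs(upscaled.astype(np.float64) -
                                original.astype(np.float64))))
```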

Individual Frames

Figures 5, 6, and 7 show the original and upscaled versions of a sample frame. Observe that both ML approaches have learned to fill in detail naturally and render boundaries as crisply as the pixel count allows. To judge that claim, see the magnified region of the actor's eye in Figure 8: a 100-by-50-pixel region (50 by 25 in the original frame) has been enlarged to show individual pixels.
Figure 5: Original Frame
image: img_4.png
Figure 6: Kraenion ML1 Upscaled
image: img_5.png
Figure 7: Kraenion ML2 Upscaled
image: img_6.png
Figure 8: Eye Detail. Top: Original, Middle and Bottom: Kraenion Upscaled using ML1 and ML2
image: img_7.png
image: img_8.png
image: img_9.png

Videos

Finally, we leave you with two different 10-second video clips, each shown below in its original and upscaled versions, so you can compare the quality.

Original

video: https://www.youtube.com/embed/8qbEcaMq54w?Version=3&loop=1&playlist=8qbEcaMq54w

ML1 (Kraenion)

video: https://www.youtube.com/embed/--QlsYXP_wE?Version=3&loop=1&playlist=--QlsYXP_wE

ML2 (Kraenion)

video: https://www.youtube.com/embed/FUn9aUw5vAw?Version=3&loop=1&playlist=FUn9aUw5vAw

Original

video: https://www.youtube.com/embed/Bq552PxcsjE?Version=3&loop=1&playlist=Bq552PxcsjE

ML1 (Kraenion)

video: https://www.youtube.com/embed/6BpD_TZaunY?Version=3&loop=1&playlist=6BpD_TZaunY

ML2 (Kraenion)

video: https://www.youtube.com/embed/kJDAe7sp3CQ?Version=3&loop=1&playlist=kJDAe7sp3CQ

Side-by-side Comparison of Bilinear and ML2 (Kraenion)

video: https://www.youtube.com/embed/9wrnLb6epFI?Version=3&loop=1&playlist=9wrnLb6epFI
Disclaimer: The purpose of these clips is to demonstrate comparative quality, not to re-distribute content. We believe the sub-10-second length of the clips constitutes fair use. If you own this or similar content, we would love to work with you to improve it, so let's be friends, OK?
Patents pending