How To: Upscaling Lower Resolution Video to 4K

Overview of 4K Upscaling

So you just bought an expensive new 4K UHD television set. Your issue is that most live and recorded video distributed today is less than 4K (3840×2160) resolution. Here in the U.S., Blu-rays are 1080p format while DVDs are only 480i/p. TV broadcasters use either 1080i or 720p for HD stations and 480i for SD. Some cable and satellite companies even downscale 1080i broadcasts to 720p. Although some over-the-top video services like Amazon and Netflix have select content in 4K, most of their catalogs are in 1920×1080 resolution. Even 4K content isn’t always true 4K: frequently, the video was upsampled during post-production from a lower resolution master. To compensate, UHD TVs include upscalers for lower resolution content originating from cable/satellite settop boxes, Blu-ray and DVD players, and gaming consoles.

New settop boxes from Nvidia, Apple, and Amazon support 4K resolution. However, they output all video and graphics at a fixed resolution. So, if you want to watch any 4K content (videos or photos) or use a 4K graphical interface, then you must set your settop box to 2160p. The Shield TV or Apple TV (not your UHD TV) will then upscale all lower resolution video. Unfortunately, this results in relatively soft video quality, because the settop box’s upscaling is inferior to the UHD TV’s.[1]

Comparison of video resolutions. Credit: Wikipedia

Real-time video upscaling is different from image upscaling because it requires conversion in milliseconds.[2] Additionally, video frames tend to suffer from motion blur, noise, and artifacts such as macro-blocking, chroma compression, and interlacing. On the other hand, video provides adjacent images (frames) that can serve as additional sources when reconstructing a given frame.
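
To make the real-time constraint concrete, here is the frame-budget arithmetic from footnote 2 as a short Python sketch (the function name is mine):

    # Per-frame time budget available to decode, upscale, and render.
    def frame_budget_ms(fps: float) -> float:
        return 1000.0 / fps

    for fps in (24, 30, 60):
        print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
    # 24 fps -> 41.7 ms, 30 fps -> 33.3 ms, 60 fps -> 16.7 ms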

Simulation of 4K vs. 2K: 4x the number of pixels. Credit: HBO

Potential Solution #1: Native Resolution Playback

For an application developer, the simplest means of improving video quality would be to leverage the UHD TV’s built-in upscalers for content that is less than 3840×2160. When video playback commences, the settop OS would switch its output to the source’s native format (1080i/p, 720p, or 480i/p). Upon exiting playback, the OS would return to 4K resolution.

Unfortunately for the Apple TV, tvOS 11 offers application developers no means of playing a lower resolution video at its original resolution. This apparently is by design: Apple doesn’t want users to wait for their TVs to adjust to a resolution switch. Also, all onscreen controls would need to be rendered at the target playback resolution (e.g. 480i, 720p, or 1080i/p). And there are other considerations, such as picture-in-picture use cases.

For the Shield TV with Shield Experience 6.2, it is now possible for application developers to specify a lower resolution.[3] However, an application developer cannot change the colorspace from Rec. 2020 to Rec. 709 on the fly. Since Rec. 2020 applies only to 4K and 8K resolutions, users who want native resolution output must leave their settop box in Rec. 709 for all videos, including native 4K. The problem with Rec. 709 is twofold for native 4K content: first, users cannot watch HDR content (high dynamic range, i.e. more grayscale); second, users lose wide color gamut (richer, more saturated colors).

Potential Solution #2: A Better OS-Level or Application-Level Upscaler

Alternatively, an OS-level or application-level upscaler could be written that improves video quality. This avoids the issues with forcing the TV to switch resolutions, with mixing lower resolution video and graphics, and with Rec. 2020 versus Rec. 709.

First, a detour to discuss the difference between luma and chroma, chroma subsampling, and their impact on upscaling. Luminance is the black-and-white information in an image; chrominance is the color information. Because humans are less sensitive to color differences than to luminance, chroma subsampling is used during content distribution to save bandwidth by compressing the chroma information more than the luma information. Blu-rays, DVDs, and digital TV broadcasts use YUV420. 4:4:4 is a full bandwidth signal, with full (Cb, Cr) vertical and horizontal resolution. 4:2:0 needs ½ the bandwidth, with ½ horizontal resolution and ½ vertical resolution for the chroma components (the sketch after the list below makes the arithmetic concrete).

Three conclusions should be drawn from this:

  1. When we refer to the resolution of a video encoded in YUV420, we are referring to the resolution of the luma channel. The chroma channels of a 1920×1080 resolution Blu-ray must be upscaled from 960×540.
  2. Due to limitations of visual perception, if there’s a limitation of computing resources then priority should be given to upscaling the luma information first.
  3. All of the above assumes that the STB, the HDMI cable, and the UHD TV are capable of 4:4:4 and configured for that. If there’s a bandwidth limit or other constraint, 10 bit color depth and HDR would be preferable over 4:4:4.
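
A minimal Python sketch of that arithmetic for a 1080p source (the function name is illustrative):

    # Plane dimensions for a frame under 4:2:0 chroma subsampling.
    def yuv420_planes(width: int, height: int):
        luma = (width, height)              # Y: full resolution
        chroma = (width // 2, height // 2)  # Cb and Cr: halved both ways
        return luma, chroma

    luma, chroma = yuv420_planes(1920, 1080)
    print(f"luma: {luma}, chroma (per plane): {chroma}")
    # luma: (1920, 1080), chroma (per plane): (960, 540)
    # Samples per frame: 4:4:4 = 3*W*H, 4:2:0 = W*H + 2*(W/2)*(H/2) = 1.5*W*H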

Image Upscaling

For media playback on my Nvidia Shield TV settop boxes, I prefer the Android variant of Kodi called SPMC. Its best configuration today is to disable surface rendering and to use the SPMC HQ scaler “lanczos optimized”. Other supported methods include basic non-adaptive interpolators like nearest-neighbor, bilinear, and bicubic. While these upscalers are low complexity and can run on even entry-level settop boxes, they don’t adapt to the video content. Therefore, they can produce aliasing artifacts and over-smoothed regions.[4][5] Lanczos optimized upscaling definitely improves the sharpness of video content, but it is still a relatively primitive algorithm. Moreover, the implementation in SPMC suffers from some judder even on the Nvidia Shield TV.
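
For context, the kernel behind these Lanczos scalers is compact enough to show directly; a minimal Python sketch (radius a=3 is the common “Lanczos3” choice):

    import math

    # Lanczos kernel: a sinc weighting windowed by a wider sinc.
    def lanczos(x: float, a: int = 3) -> float:
        if x == 0.0:
            return 1.0
        if abs(x) >= a:
            return 0.0
        px = math.pi * x
        return a * math.sin(px) * math.sin(px / a) / (px * px)

    # Weights for resampling a point 0.3 pixels off the source grid:
    print([round(lanczos(o - 0.3), 4) for o in range(-2, 4)])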

Here, Kodi/SPMC lags behind MPV, another open source media player, in terms of video filter support. For starters, with MPV you can specify ewa_lanczossharp, which has better video quality than lanczos optimized.[6][7]
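
In mpv, selecting that scaler is a one-line configuration change (option names per the mpv manual; cscale handles the chroma plane):

    # mpv.conf: high-quality EWA scalers for luma and chroma
    scale=ewa_lanczossharp
    cscale=ewa_lanczossharp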

Image Doubling & Tripling

But MPV also supports powerful image doubling/tripling upscalers, known as “prescalers”. Frames can be doubled, tripled, or (through a second or third pass, for DVD and SD broadcast content) sextupled or octupled.[8][9]
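
As a toy illustration of how these pass chains are chosen (the 2x/3x factors come from footnote 8; the selection heuristic is my own):

    from itertools import product

    # Pick 2x/3x prescaler passes that reach at least the target height
    # with the least overshoot and the fewest passes.
    def plan_passes(src: int, target: int = 2160):
        candidates = []
        for n in (1, 2, 3):
            for combo in product((2, 3), repeat=n):
                h = src
                for factor in combo:
                    h *= factor
                if h >= target:
                    candidates.append((h, n, combo))
        height, _, combo = min(candidates)
        return combo, height

    for src in (1080, 720, 480):
        combo, height = plan_passes(src)
        print(f"{src}: passes {combo} -> {height} (downscale if above 2160)")
    # 1080: (2,) -> 2160; 720: (3,) -> 2160; 480: (2, 3) -> 2880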

One of the most advanced prescalers is NNEDI3, which was originally designed as an intra-field deinterlacer but is also useful for enlarging images by powers of two. Compared to earlier NNEDI versions, v3 has an improved predictor neural network architecture and local neighborhood pre-processing. NNEDI3’s neural network consists of 16 to 256 neurons; 32 neurons generally offers the best quality for the performance. Beyond that, additional neurons bring diminishing returns, particularly if you are quadrupling the size of an image. NNEDI3 with higher neuron counts is prohibitively computationally intensive using OpenGL in embedded implementations. Therefore, it is best used for luma image doubling.[10]
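
In recent mpv builds, a luma-only NNEDI3 doubling pass can be loaded as a user shader (a sketch; the hook file comes from the bjin/mpv-prescalers repository and the exact filename may differ):

    # mpv.conf: NNEDI3 luma doubling, 32 neurons
    glsl-shaders="~~/shaders/nnedi3-nns32-win8x4.hook"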

On the Horizon: Vulkan and RAISR/RAVU

To achieve better upscaling for a powerful settop box like the Nvidia Shield TV, development should consider two paths:

  1. Utilize a more efficient graphics subsystem than OpenGL. For Android settop boxes, this means Vulkan; and
  2. Employ a user shader that produces better results than ewa_lanczossharp, that is more efficient than NNEDI3, and, if possible, that can be used for more than just a single pass of luma channel doubling.

Vulkan

Vulkan is a cross-platform 3D graphics and compute API. Compared to its predecessor, OpenGL, Vulkan has lower overhead and therefore better performance. OpenGL developers write shaders in a high-level language, GLSL, which the driver translates into the GPU’s machine code at runtime. By contrast, Vulkan consumes shaders in a precompiled intermediate format, SPIR-V, moving that translation work ahead of time. Because Vulkan has better performance, more shaders can be used per scene.

MPV recently implemented a Vulkan rendering subsystem, and Vulkan’s performance gains over OpenGL are noteworthy.[11]

RAISR

“RAISR: Rapid and Accurate Image Super-Resolution”, by Google, is among the most efficient of the advanced image super-resolution algorithms, which include sparsity-based and neural network-based methods. RAISR works by enhancing the quality of ‘cheap’ (e.g. bilinear) interpolation, applying learned, separable difference-of-Gaussians filters to the image for edge enhancement. Artifacts such as halos and noise amplification are reduced using a Census Transform blender. A simple least squares solver is used to learn the filters. RAISR’s upscaling quality compares favorably to state-of-the-art example-based algorithms like A+ and SRCNN, but it is much faster. Moreover, RAISR results may improve in the future with a better hashing function and by regularizing the learning with efficient priors.
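
As an illustrative Python sketch of that structure (the cheap-upscale-then-filter flow and hash-selected learned filters follow the paper; the toy gradient hash, placeholder filters, and all names here are mine):

    import numpy as np

    def gradient_hash(patch, angle_bins=8):
        # Toy stand-in for RAISR's angle/strength/coherence hash:
        # bucket the patch by its dominant gradient angle.
        gy, gx = np.gradient(patch)
        angle = np.arctan2(gy.sum(), gx.sum()) % np.pi
        return int(angle / np.pi * angle_bins) % angle_bins

    def raisr_like_upscale(img, filters, scale=2, r=2):
        # 1) 'cheap' upscale (nearest-neighbor here for brevity)
        up = np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)
        out = up.copy()
        # 2) per pixel, pick a learned filter by hash and apply it
        for y in range(r, up.shape[0] - r):
            for x in range(r, up.shape[1] - r):
                patch = up[y - r:y + r + 1, x - r:x + r + 1]
                out[y, x] = (patch * filters[gradient_hash(patch)]).sum()
        return out

    # Placeholder 'learned' filters: identity kernels in every bucket.
    identity = np.zeros((5, 5)); identity[2, 2] = 1.0
    filters = {b: identity for b in range(8)}
    print(raisr_like_upscale(np.random.rand(16, 16), filters).shape)  # (32, 32)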

Google claims that RAISR produces results 10x to 100x faster than comparable methods. Google announced in January 2017 that it was serving over 1 billion images a week using RAISR. In Google’s case, images are deliberately down-sampled prior to transfer, then reconstructed at a higher resolution on the client. Google hasn’t announced RAISR’s application to real-time 4K video upscaling.

Google’s RAISR algorithm reduces bandwidth for photos by 75%. Credit: Google

RAVU

However, the MPV project has created RAVU prescalers, which are based on RAISR and available here. RAVU adopts the core idea of RAISR for upscaling, but it doesn’t currently include RAISR’s post-processing, such as blending or sharpening. RAVU is a convolution kernel-based upscaling algorithm whose kernels are trained on a large set of pictures with a linear regression model. RAVU is similar to EWA scalers but more adaptive to local gradient features. RAVU doesn’t produce as sharp an upscaled image as NNEDI3 or Super-XBR, but RAVU shaders are 5x faster than NNEDI3[12] and produce fewer artifacts such as aliasing. Currently, RAVU is tuned for cartoons and anime rather than live action. Using RAVU, you can upscale both the luma and chroma planes or, if performance is an issue, just the luma plane.[13][14][15] RAVU user shaders are available for Vulkan here.
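
Loading a RAVU hook in mpv looks like this (a sketch; the filename follows the bjin/mpv-prescalers naming and may differ by variant and radius):

    # mpv.conf: double the luma plane with RAVU (radius 3) before scaling
    glsl-shaders="~~/shaders/ravu-r3.hook"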

For prototyping purposes, I describe here how to configure MPV for 4K upscaling on a Retina 5K iMac. This example uses OpenGL, not Vulkan, because Vulkan is not natively supported on the Mac platform. This configuration will automatically load the appropriate prescaler and image scaling settings based on the resolution of the source video. For example, 1080 resolution video will be image doubled using RAVU. 720 resolution video will be tripled by RAVU (720 → 2160). Finally, NTSC 480 resolution video will be sextupled by RAVU (480 → 2880), then downscaled (2880 → 2160).[16][17]
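
A minimal sketch of such a configuration, expressed with mpv’s conditional profiles (profile-cond requires a recent mpv; older builds need the auto-profiles.lua user script instead, and the RAVU hook filenames here are illustrative):

    # mpv.conf: choose a RAVU prescaler chain by source height
    [ravu-double]                        # 1080 -> 2160 in one 2x pass
    profile-cond=p["video-params/h"] == 1080
    glsl-shaders="~~/shaders/ravu-r3.hook"

    [ravu-triple]                        # 720 -> 2160 in one 3x pass
    profile-cond=p["video-params/h"] == 720
    glsl-shaders="~~/shaders/ravu-3x-r3.hook"

    [ravu-sextuple]                      # 480 -> 2880, then downscaled
    profile-cond=p["video-params/h"] == 480
    glsl-shaders-append="~~/shaders/ravu-r3.hook"
    glsl-shaders-append="~~/shaders/ravu-3x-r3.hook"
    dscale=mitchell                      # downscaler for the 2880 -> 2160 step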

Other Artifact Removal

Reducing Banding Artifacts

If you are going to optimize a video for 4K resolution, it also makes sense to take advantage of Rec. 2020’s expanded color range and to deband color gradients. For example, even on Blu-ray video, you may have seen color bands in the sky or in dark scenes. These are due to limited bit depth: Rec. 709 defines 8 or 10 bits per sample in each color channel, whereas Rec. 2020 allows 10 or 12 bits. So, it is now possible to smooth these differences in color when rendering videos on UHD TVs. In MPV, debanding is enabled by default. Optionally, it can be complemented by adding grain to combat the artificial flat look caused by upscaling.
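
The corresponding mpv switches look like this (values are illustrative starting points; option names per the mpv manual):

    # mpv.conf: debanding plus compensating grain
    deband=yes
    deband-iterations=2   # more passes catch stronger banding
    deband-threshold=35   # how aggressively gradients are smoothed
    deband-grain=48       # noise added back to hide the flat look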

Reducing Ringing Artifacts

Oversharpening an image can cause a ringing effect around edges. Some methods have a ‘repair’ function that minimizes this ringing. In MPV, you can also configure the precise amount of anti-ringing that you want. However, with less ringing, there will also be less detail in the frame.
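
In mpv this is a single knob (the value here is illustrative):

    # mpv.conf: anti-ringing for the luma scaler, 0.0 (off) to 1.0 (max)
    scale-antiring=0.7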

Further Out: Multi-Frame Video Super-Resolution

Thus far, we have covered single-image super-resolution methods. Multi-frame video super-resolution permits upscaling a video using image information from adjacent frames. Previous multi-frame super-resolution work has focused on motion estimation and noise cancellation. In the future, single-frame CNN approaches may prove valuable for MFSR. Results are promising but currently inconsistent across scenes.[18]



Updated on January 21st, 2018


  1. Ironically, the latest settop boxes have GPUs that are more capable than UHD TVs’, and they receive software updates for longer.

  2. A millisecond is a thousandth of a second. For a 60fps video, that means a total permissible video pipeline of 16.6ms (1000/60) per frame. For a 24fps video like a movie, that means a total permissible video pipeline of 41.6ms (1000/24) per frame. The longer upscaling takes, the less time is available for other processing.

  3. SPMC 17.6 just introduced this experimental capability.

  4. Windows PC users can combine Kodi with MadVR, a DirectShow video renderer, through DSPlayer. But MadVR doesn’t run on Android or iOS settop boxes. Because MadVR solves similar video quality issues, it is worth reading about its options here. However, my goal is to play 4K content throughout a home using multiple low-profile settop boxes, not high-end Windows PCs. Moreover, MadVR is not open source software.

  5. Kodi 17 for Android doesn’t support HQ scalers but SPMC 17 does.

  6. ewa_lanczossharp is a tweaked version of ewa_lanczos with optimal values. It is slightly harder on the GPU than ewa_lanczos.

  7. Lanczos uses sinc, while EWA Lanczos uses jinc, as the weighting and windowing function for the resampling kernel.

  8. Image doubling alone works for 1080 → 2160 video upscaling. Image tripling works for 720 → 2160 upscaling in one pass. Video in 480 resolution can be sextupled in two passes, then downscaled to 2160.

  9. Chroma image doubling costs significant performance but has less impact in terms of visual perception.

  10. Other image doubling algorithms include Super-XBR and MadVR’s NGU. See also Pixel-art Scaling Algorithms, which focuses on algorithms to upscale vintage video games and anime/cartoons.

  11. The second design of the Vulkan subsystem is discussed here, including new async transfers.

  12. RAVU with radius=3 versus NNEDI3 with neurons=32. See the rendering times to prescale the luma plane of a full HD video (1920×1080 in YUV format) to 4K (3840×2160) in the Performance section.

  13. The basics of how RAVU works are summarized in About RAVU, High-level description of RAVU algorithm, Implement RAVU as renderscript, and Possible RAVU improvement ideas.

  14. See comparison images here.

  15. Source: here. Higher “r” means higher radius (better quality). RAVU-Lite is a faster, slightly lower quality, luma-only variant of RAVU.

  16. In MPV, chroma is upscaled to luma resolution (video size) first; the converted RGB is then upscaled to the target resolution (screen size). NNEDI3 and RAVU hook in before that final step, doubling the image. You can recursively call RAVU to quadruple or octuple an image.

  17. The video tripling capability is brand new and requires a compute shader that doesn’t work properly on my Mac.

  18. See Multi-Frame Video Super-Resolution Using Convolutional Neural Networks.