Image scaling

File:2xsai example.svg | scaling (right)

In computer graphics and digital imaging, image scaling refers to the resizing of a digital image. In video technology, the magnification of digital material is known as upscaling or resolution enhancement.

When scaling a vector graphic image, the graphic primitives that make up the image can be scaled using geometric transformations with no loss of image quality. When scaling a raster graphics image, a new image with a higher or lower number of pixels must be generated. When the number of pixels is decreased (scaling down), this usually results in a visible loss of quality. From the standpoint of digital signal processing, the scaling of raster graphics is a two-dimensional example of sample-rate conversion, the conversion of a discrete signal from one sampling rate (in this case, the local sampling rate) to another.
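As a minimal sketch of why vector primitives scale without loss, the following Python snippet applies a geometric scaling to the vertices of a polygon. The function name `scale_points` and the triangle coordinates are illustrative assumptions, not part of any particular graphics library.

```python
def scale_points(points, sx, sy):
    """Scale each (x, y) vertex by the factors sx and sy."""
    return [(x * sx, y * sy) for x, y in points]

# Hypothetical triangle described by its vertices (a vector primitive).
triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

enlarged = scale_points(triangle, 2.0, 2.0)   # twice as large
restored = scale_points(enlarged, 0.5, 0.5)   # back to the original size
```

The shape is described by the same three vertices at every size, so no detail is discarded. In this example the round trip is exact because the factors are powers of two; with arbitrary factors the result is exact only up to floating-point rounding.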

Mathematical

Image scaling can be interpreted as a form of image resampling or image reconstruction from the view of the Nyquist sampling theorem. According to the theorem, downsampling to a smaller image from a higher-resolution original can only be carried out after applying a suitable 2D anti-aliasing filter to prevent aliasing artifacts. The image is reduced to the information that can be carried by the smaller image.

In the case of upsampling, a reconstruction filter takes the place of the anti-aliasing filter.
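The effect of filtering before downsampling can be illustrated with a one-dimensional sketch. The example below is an assumption-laden simplification: it uses a plain box average as a crude low-pass filter (a real anti-aliasing filter would usually be better shaped), and the function names are hypothetical.

```python
def decimate(signal, factor):
    """Keep every `factor`-th sample with no prefiltering."""
    return signal[::factor]

def box_prefilter_decimate(signal, factor):
    """Average each run of `factor` samples, then keep one value per run.

    The box average acts as a very simple low-pass (anti-aliasing) filter.
    """
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]

# Highest-frequency pattern the original sampling rate can carry.
signal = [0.0, 1.0] * 4

naive = decimate(signal, 2)                   # aliased: only the 0.0 samples survive
filtered = box_prefilter_decimate(signal, 2)  # the mean level 0.5 is preserved
```

Without the filter, the alternating pattern aliases to a constant that misrepresents the average brightness; with the filter, the output carries only the information the lower sampling rate can represent.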

File:160 by 160 thumbnail of 'Green Sea Shell'.png | Original 160x160px image

File:160 by 160 thumbnail of 'Green Sea Shell' - 0. in fourier domain.png | Original image in spatial-frequency domain

File:160 by 160 thumbnail of 'Green Sea Shell' - 1. fourier filtered for downsampling to 40 x 40.png | 2D low-pass filtered, but still at 160x160px

File:160 by 160 thumbnail of 'Green Sea Shell' - 1.1. fourier filtered image for downsampling to 40 x 40 in fourier domain.png | Filtered image in spatial-frequency domain

File:160 by 160 thumbnail of 'Green Sea Shell' - 2. downsampling to 40 x 40 (nearest neighour).png | Low-pass filtered 160x160px image 4× downsampled to 40x40px

File:160 by 160 thumbnail of 'Green Sea Shell' - 3. fourier reconstruction from 40 x 40.png | 4× Fourier upsampling of 40x40px downsampled image to 160x160px (correct reconstruction)

File:160 by 160 thumbnail of 'Green Sea Shell' - 4. fourier reconstruction from 40 x 40 (aliasing ).png | 4× Fourier upsampling of 40x40px downsampled image to 160x160px (with aliasing)

A more sophisticated approach to upscaling treats the problem as an inverse problem, solving the question of generating a plausible image that, when scaled down, would look like the input image. A variety of techniques have been applied for this, including optimization techniques with regularization terms and the use of machine learning from examples.
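One common way to write such an inverse problem is as a regularized least-squares objective. The formulation below is a generic sketch rather than the method of any specific paper; all symbols are assumptions introduced here: $y$ is the low-resolution input, $x$ a candidate high-resolution image, $D$ the downscaling operator, $R$ a regularization term that favors plausible images, and $\lambda$ its weight.

```latex
\hat{x} \;=\; \arg\min_{x} \; \lVert D x - y \rVert_2^2 \;+\; \lambda \, R(x)
```

The first term requires that the result, when scaled down, resembles the input; the second term selects among the many images that satisfy this requirement.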

Algorithms

An image size can be changed in several ways.

= Nearest-neighbor interpolation =

One of the simpler ways of increasing image size is nearest-neighbor interpolation, in which every output pixel takes the value of the nearest pixel in the source image; for upscaling, this means that several adjacent output pixels share the same color. This can preserve sharp details but also introduce jaggedness in previously smooth images. 'Nearest' in nearest-neighbor does not have to be the mathematical nearest. One common implementation is to always round toward zero. Rounding this way produces fewer artifacts and is faster to calculate.{{citation needed|date=December 2021}}

This algorithm is often preferred for images that have few or no smooth edges. A common application of this can be found in pixel art.
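A minimal sketch of nearest-neighbor scaling using the round-toward-zero convention described above. The image is assumed to be a list of rows of pixel values, and the function name `scale_nearest` is hypothetical.

```python
def scale_nearest(pixels, new_w, new_h):
    """Resize a 2D list of pixels with nearest-neighbor sampling.

    Source coordinates are computed with integer division, which rounds
    toward zero for the non-negative indices used here.
    """
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

small = [[1, 2],
         [3, 4]]
large = scale_nearest(small, 4, 4)
# Each source pixel becomes a 2x2 block of identical values.
```

Because no new colors are created, hard edges stay hard, which is why the method suits pixel art.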

= Bilinear and bicubic algorithms =

Bilinear interpolation works by interpolating pixel color values, introducing a continuous transition into the output even where the original material has discrete transitions. Although this is desirable for continuous-tone images, this algorithm reduces contrast (sharp edges) in a way that may be undesirable for line art. Bicubic interpolation yields substantially better results, with an increase in computational cost.{{citation needed|date=December 2021}}
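The following sketch shows bilinear interpolation on a grayscale image stored as a list of rows. It assumes pixel-center alignment and clamps coordinates at the image border; both are common choices but not the only ones, and the function names are hypothetical.

```python
def bilinear_sample(pixels, fx, fy):
    """Sample a 2D list of values at fractional coordinates (fx, fy)."""
    h, w = len(pixels), len(pixels[0])
    fx = min(max(fx, 0.0), w - 1.0)   # clamp to the valid coordinate range
    fy = min(max(fy, 0.0), h - 1.0)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = fx - x0, fy - y0
    # Interpolate horizontally on the two rows, then vertically between them.
    top = pixels[y0][x0] * (1 - tx) + pixels[y0][x1] * tx
    bottom = pixels[y1][x0] * (1 - tx) + pixels[y1][x1] * tx
    return top * (1 - ty) + bottom * ty

def scale_bilinear(pixels, new_w, new_h):
    """Resize by sampling the source at the center of each output pixel."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [
            bilinear_sample(
                pixels,
                (x + 0.5) * old_w / new_w - 0.5,
                (y + 0.5) * old_h / new_h - 0.5,
            )
            for x in range(new_w)
        ]
        for y in range(new_h)
    ]

edge = [[0.0, 10.0]]                 # a hard step between two pixels
ramp = scale_bilinear(edge, 4, 1)    # [[0.0, 2.5, 7.5, 10.0]]
```

The hard step in the source becomes a gradual ramp in the output, which is the loss of edge contrast described above.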

= Sinc and Lanczos resampling =

Sinc resampling, in theory, provides the best possible reconstruction for a perfectly bandlimited signal. In practice, the assumptions behind sinc resampling are not completely met by real-world digital images. Lanczos resampling, an approximation to the sinc method, yields better results. Bicubic interpolation can be regarded as a computationally efficient approximation to Lanczos resampling.{{citation needed|date=December 2021}}
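As an illustration, the Lanczos kernel is a sinc function windowed by a wider sinc lobe and truncated to a finite support. The sketch below works on a one-dimensional signal; the support parameter `a = 3`, the border clamping, and the weight normalization are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) inside |x| < a, zero outside."""
    if abs(x) >= a:
        return 0.0
    return sinc(x) * sinc(x / a)

def lanczos_interpolate(samples, position, a=3):
    """Estimate the signal value at a fractional position."""
    base = math.floor(position)
    total = 0.0
    weight_sum = 0.0
    for i in range(base - a + 1, base + a + 1):
        weight = lanczos_kernel(position - i, a)
        clamped = min(max(i, 0), len(samples) - 1)  # repeat border samples
        total += weight * samples[clamped]
        weight_sum += weight
    # Normalizing keeps constant regions constant despite truncation.
    return total / weight_sum

value = lanczos_interpolate([0, 1, 4, 9, 16, 25, 36, 49], 3.5)
```

Each output value uses `2 * a` input samples per dimension, which is the source of the higher cost compared with bilinear interpolation.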

= Box sampling =

One weakness of bilinear, bicubic, and related algorithms is that they sample a fixed number of pixels. When downscaling beyond a certain threshold, such as by more than a factor of two for all bi-sampling algorithms, the algorithms sample non-adjacent pixels, which both discards data and produces rough results.{{citation needed|date=December 2021}}

The trivial solution to this issue is box sampling, which is to consider the target pixel a box on the original image and sample all pixels inside the box. This ensures that all input pixels contribute to the output. The major weakness of this algorithm is that it is hard to optimize.{{citation needed|date=December 2021}}
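A minimal sketch of box sampling for downscaling. To stay short, it assumes box boundaries that fall on whole source pixels, which is exact for integer scale factors; a fuller implementation would weight partially covered pixels by their overlap area. The function name `box_downscale` is hypothetical.

```python
def box_downscale(pixels, new_w, new_h):
    """Downscale by averaging all source pixels inside each target box."""
    old_h, old_w = len(pixels), len(pixels[0])
    out = []
    for y in range(new_h):
        y0 = y * old_h // new_h
        y1 = max((y + 1) * old_h // new_h, y0 + 1)   # at least one row
        row = []
        for x in range(new_w):
            x0 = x * old_w // new_w
            x1 = max((x + 1) * old_w // new_w, x0 + 1)  # at least one column
            box = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(box) / len(box))
        out.append(row)
    return out

image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
half = box_downscale(image, 2, 2)   # [[2.5, 4.5], [10.5, 12.5]]
```

Every source pixel contributes to exactly one output pixel here, so no input data is skipped regardless of the scale factor.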

= Mipmap =

Another solution to the downscaling problem of bi-sampling algorithms is the use of mipmaps. A mipmap is a prescaled set of downscaled copies. When downscaling, the nearest larger mipmap is used as the origin to ensure no scaling below the useful threshold of bilinear scaling. This algorithm is fast and easy to optimize. It is standard in many frameworks, such as OpenGL. The cost is additional image memory, approaching one-third more in the standard implementation.
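The memory cost and the level selection can be sketched as follows, assuming the usual scheme in which each level halves the width and height of the previous one until a 1×1 image is reached. The function names are hypothetical and the snippet only computes sizes; it does not build the images.

```python
def mip_chain_sizes(width, height):
    """List the (width, height) of every mipmap level, largest first."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes

def mip_memory_overhead(width, height):
    """Extra pixels needed for the chain, as a fraction of the base image."""
    sizes = mip_chain_sizes(width, height)
    extra_pixels = sum(w * h for w, h in sizes[1:])
    return extra_pixels / (width * height)

def pick_mip_level(sizes, target_width):
    """Index of the smallest level that is still at least target_width wide."""
    level = 0
    for index, (w, _) in enumerate(sizes):
        if w >= target_width:
            level = index
    return level

sizes = mip_chain_sizes(1024, 1024)
overhead = mip_memory_overhead(1024, 1024)   # just under 1/3
level = pick_mip_level(sizes, 300)           # the 512-wide level
```

The overhead follows the geometric series 1/4 + 1/16 + 1/64 + ..., whose sum approaches one-third. Starting from the selected level, the remaining scale factor is less than two, which keeps bilinear sampling within its useful range.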

= Fourier-transform methods =

Simple interpolation based on the Fourier transform pads the frequency domain with zero components (a smooth window-based approach would reduce the ringing). Besides good conservation (or recovery) of detail, its notable effects are ringing and the circular bleeding of content from the left border to the right border (and vice versa).
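Zero-padding in the frequency domain can be sketched in one dimension with a naive discrete Fourier transform. The code below is an illustration under stated assumptions: it uses a slow O(n²) transform to avoid external dependencies, requires an integer factor of at least 2, and splits the Nyquist bin of even-length signals between the positive and negative frequencies. The function names are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def fourier_upsample(samples, factor):
    """Upsample a real signal by zero-padding its spectrum."""
    if factor < 2:
        raise ValueError("factor must be an integer >= 2")
    n = len(samples)
    m = n * factor
    spectrum = dft(samples)
    padded = [0j] * m
    half = n // 2
    for k in range(-half, half + 1):
        if n % 2 == 0 and abs(k) == half:
            value = spectrum[half] / 2   # share the Nyquist bin between +/- half
        else:
            value = spectrum[k % n]
        padded[k % m] = value            # all higher frequencies stay zero
    # Rescale so that the original sample values are preserved.
    return [factor * v.real for v in inverse_dft(padded)]

coarse = [0.0, 1.0, 0.0, -1.0]           # one period of a sine, 4 samples
fine = fourier_upsample(coarse, 2)       # the same sine, 8 samples
```

Because the transform treats the signal as periodic, content near one border influences the opposite border, which corresponds to the circular bleeding noted above.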

= Edge-directed interpolation =

Edge-directed interpolation algorithms aim to preserve edges in the image after scaling, unlike other algorithms, which can introduce staircase artifacts.

Examples of algorithms for this task include New Edge-Directed Interpolation (NEDI),{{cite web|url=http://chiranjivi.tripod.com/EDITut.html|title=Edge-Directed Interpolation|access-date=19 February 2016}}{{cite journal|author1=Xin Li|author2=Michael T. Orchard|journal=2000 IEEE International Conference on Image Processing|page=311|url=http://www.csee.wvu.edu/~xinl/papers/ICIP2000a.pdf|title=NEW EDGE DIRECTED INTERPOLATION|url-status=dead|archive-url=https://web.archive.org/web/20160214121631/http://www.csee.wvu.edu/~xinl/papers/ICIP2000a.pdf|archive-date=2016-02-14}} Edge-Guided Image Interpolation (EGGI),{{cite journal |author1=Zhang, D. |author2=Xiaolin Wu |title=An Edge-Guided Image Interpolation Algorithm via Directional Filtering and Data Fusion|journal=IEEE Transactions on Image Processing |volume=15 |issue=8 |pages=2226–38 |pmid=16900678 |bibcode=2006ITIP...15.2226Z |year=2006 |doi=10.1109/TIP.2006.877407 |s2cid=9760560 }} Iterative Curvature-Based Interpolation (ICBI),{{cite web|title=ImagezO Image Resizer|url=https://imagezo.com/image-resizer/|access-date=28 January 2025}} and Directional Cubic Convolution Interpolation (DCCI).{{cite web|author1=Dengwen Zhou|author2=Xiaoliu Shen|title=Image Zooming Using Directional Cubic Convolution Interpolation|url=http://www.mathworks.com/matlabcentral/fileexchange/38570-image-zooming-using-directional-cubic-convolution-interpolation|access-date=13 September 2015}} A 2013 analysis found that DCCI had the best scores in peak signal-to-noise ratio and structural similarity on a series of test images.{{cite arXiv|author1=Shaode Yu|author2=Rongmao Li|author3=Rui Zhang|author4=Mou An|author5=Shibin Wu|author6=Yaoqin Xie|title=Performance evaluation of edge-directed interpolation methods for noise-free images|eprint=1303.6455|class=cs.CV|year=2013}}

= hqx =

For magnifying computer graphics with low resolution and/or few colors (usually from 2 to 256 colors), better results can be achieved by hqx or other pixel-art scaling algorithms. These produce sharp edges and maintain a high level of detail.
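hqx itself relies on large lookup tables and is too long to show here. As a simpler illustration of the same family, the sketch below implements Scale2x (also known as AdvMAME2x), a different pixel-art scaling algorithm, not hqx. It assumes an image stored as a list of rows of comparable color values and repeats border pixels at the edges; the function name is hypothetical.

```python
def scale2x(pixels):
    """Double the size of a pixel-art image with the Scale2x rules."""
    h, w = len(pixels), len(pixels[0])
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            e = pixels[y][x]                       # center pixel
            up = pixels[max(y - 1, 0)][x]
            left = pixels[y][max(x - 1, 0)]
            right = pixels[y][min(x + 1, w - 1)]
            down = pixels[min(y + 1, h - 1)][x]
            if up != down and left != right:
                # A corner copies a neighbor only where two adjacent
                # neighbors agree, which smooths diagonal edges.
                e0 = left if left == up else e
                e1 = right if up == right else e
                e2 = left if left == down else e
                e3 = right if down == right else e
            else:
                e0 = e1 = e2 = e3 = e
            out[2 * y][2 * x] = e0
            out[2 * y][2 * x + 1] = e1
            out[2 * y + 1][2 * x] = e2
            out[2 * y + 1][2 * x + 1] = e3
    return out

sprite = [[1, 0],
          [0, 0]]
doubled = scale2x(sprite)
```

Unlike nearest-neighbor scaling, the lower-right corner of the enlarged top-left block takes the background color, rounding the corner without introducing any new colors.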

= Vectorization =

Vector extraction, or vectorization, offers another approach. Vectorization first creates a resolution-independent vector representation of the graphic to be scaled. Then the resolution-independent version is rendered as a raster image at the desired resolution. This technique is used by Adobe Illustrator's Live Trace and by Inkscape.{{cite journal|author=Johannes Kopf and Dani Lischinski|title=Depixelizing Pixel Art|journal=ACM Transactions on Graphics |year=2011|volume=30 |issue=4|pages=99:1–99:8|doi=10.1145/2010324.1964994|archive-url=https://archive.today/20150901034643/http://research.microsoft.com/en-us/um/people/kopf/pixelart| archive-date=2015-09-01 |url=http://research.microsoft.com/en-us/um/people/kopf/pixelart/|access-date=24 October 2012}} Scalable Vector Graphics are well suited to simple geometric images, while photographs do not fare well with vectorization due to their complexity.

= Deep convolutional neural networks =

This method uses machine learning, in the form of deep convolutional neural networks, to upscale more detailed images such as photographs and complex artwork. Programs that use this method include waifu2x, Imglarger and Neural Enhance.

Demonstration of conventional vs. waifu2x upscaling with noise reduction, using a detail of Phosphorus and Hesperus by Evelyn De Morgan.

File:Evelyn de Morgan - Phosphorus and Hesperus, (1881) detail.png|Original image

File:Evelyn de Morgan - Phosphorus and Hesperus, (1881) detail - upscaled 200% using Paint Shop Pro.png|Image upscaled 200% using PaintShop Pro

File:Evelyn de Morgan - Phosphorus and Hesperus, (1881) detail - upscaled 200% using Waifu2x in Photo mode with Medium noise reduction.png|Image upscaled 200% using waifu2x in Photo mode with Medium noise reduction

File:Evelyn de Morgan - Phosphorus and Hesperus, (1881) detail output.png|Image upscaled 400% using Topaz A.I. Gigapixel with Low noise reduction

File:Evelyn de Morgan - Phosphorus and Hesperus, (1881) RealSR.png|Image upscaled 400% using RealSR DF2K-JPEG

AI-driven software such as the MyHeritage Photo Enhancer allows detail and sharpness to be added to historical photographs, where it is not present in the original.

File:John Howard Lindauer portrait.jpg|Low-quality 270×368 pixel photograph of John Howard Lindauer, scanned from a State Legislature directory

File:John Howard Lindauer 1983 (upscaled whitebackground enhanced).jpg|The same image enhanced and upscaled to 2160×2944 pixels by MyHeritage's Photo Enhancer

Applications

= General =

Image scaling is used in, among other applications, web browsers,[http://www.entropymine.com/resamplescope/notes/browsers/ Analysis of image scaling algorithms used by popular web browsers] image editors, image and file viewers, software magnifiers, digital zoom, the process of generating thumbnail images, and when outputting images through screens or printers.

= Video =

In video, a typical application is the magnification of PAL-resolution content, for example from a DVD player, for home theaters with HDTV-ready output devices. Upscaling is performed in real time, and the output signal is not saved.

= Pixel-art scaling =

As pixel-art graphics are usually low-resolution, they rely on careful placement of individual pixels, often with a limited palette of colors. This results in graphics that rely on stylized visual cues to define complex shapes with little resolution, down to individual pixels. This makes scaling pixel art a particularly difficult problem.

Specialized algorithms{{cite web|url=http://www.datagenetics.com/blog/december32013/index.html|title=Pixel Scalers|access-date=19 February 2016}} were developed to handle pixel-art graphics, as the traditional scaling algorithms do not take perceptual cues into account.

Since a typical application is to improve the appearance of fourth-generation and earlier video games on arcade and console emulators, many are designed to run in real time for small input images at 60 frames per second.

On fast hardware, these algorithms are suitable for gaming and other real-time image processing. These algorithms provide sharp, crisp graphics, while minimizing blur. Pixel-art scaling algorithms have been implemented in a wide range of emulators such as HqMAME and DOSBox, as well as 2D game engines and game engine recreations such as ScummVM. They gained recognition with gamers, for whom these technologies encouraged a revival of 1980s and 1990s gaming experiences.{{Citation needed|date=December 2015}}

Such filters are currently used in commercial emulators on Xbox Live, Virtual Console, and PSN to allow classic low-resolution games to be more visually appealing on modern HD displays. Recently released games that incorporate these filters include Sonic's Ultimate Genesis Collection, Castlevania: The Dracula X Chronicles, Castlevania: Symphony of the Night, and Akumajō Dracula X Chi no Rondo.

= Real-time scaling =

A number of companies have developed techniques to upscale video frames in real time, such as when they are drawn on screen in a video game. Nvidia's deep learning super sampling (DLSS) uses deep learning to upsample lower-resolution images to a higher resolution for display on higher-resolution computer monitors.{{Cite web|title=NVIDIA DLSS: Your Questions, Answered|url=https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-your-questions-answered/|access-date=2021-10-13|website=www.nvidia.com|language=en-us|archive-date=5 October 2021|archive-url=https://web.archive.org/web/20211005045718/https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-your-questions-answered/|url-status=live}} AMD's FidelityFX Super Resolution 1.0 (FSR) does not employ machine learning, instead using traditional hand-written algorithms to achieve spatial upscaling on traditional shading units. FSR 2.0 utilises temporal upscaling, again with a hand-tuned algorithm. FSR standardized presets are not enforced, and some titles such as Dota 2 offer resolution sliders.{{Cite web|title=Valve's Dota 2 Adds AMD FidelityFX Super Resolution - Phoronix|url=https://www.phoronix.com/scan.php?page=news_item&px=Dota-2-FidelityFX-Super-Res|access-date=2021-10-13|website=www.phoronix.com|archive-date=21 July 2021|archive-url=https://web.archive.org/web/20210721181636/https://www.phoronix.com/scan.php?page=news_item&px=Dota-2-FidelityFX-Super-Res|url-status=live}} Other technologies include Intel XeSS and Nvidia Image Scaling (NIS).{{Cite web|last=Gartenberg|first=Chaim|date=2021-08-19|title=Intel shows off its answer to Nvidia's DLSS, coming to Arc GPUs in 2022|url=https://www.theverge.com/2021/8/19/22631061/intel-arc-gpu-alchemist-xe-ss-super-sampling-ai-architecture-day-preview|access-date=2021-10-13|website=The Verge|language=en|archive-date=19 August 2021|archive-url=https://web.archive.org/web/20210819171113/https://www.theverge.com/2021/8/19/22631061/intel-arc-gpu-alchemist-xe-ss-super-sampling-ai-architecture-day-preview|url-status=live}}{{Cite web|date=2021-11-16|title=What Is Nvidia Image Scaling? Upscaling Tech, Explained|url=https://www.digitaltrends.com/computing/what-is-nvidia-image-scaling/|access-date=2021-12-03|website=Digital Trends|language=en}}
