December 17, 2019
All about the new ML Super Resolution feature in Pixelmator Pro
It’s no secret that we’re pretty big fans of machine learning and we love thinking of new and exciting ways to use it in Pixelmator Pro. Our latest ML-powered feature is called ML Super Resolution, released in today’s update, and it makes it possible to increase the resolution of images while keeping them stunningly sharp and detailed. Yes, zooming and enhancing images like they do in all those cheesy police dramas is now a reality!
Let’s see some examples
Before we get into the nitty-gritty technical stuff, let’s get right to the point and take a look at some examples of what ML Super Resolution can do. Until now, if you had opened up the Image menu and chosen Image Size, you would’ve found three image scaling algorithms — Bilinear, Lanczos (lan-tsosh, for anyone curious), and Nearest Neighbor, so we’ll compare our new algorithm to those three.
Note that the images below are zoomed in to 200% to make the changes easier to see.
Pretty incredible, right? Until now, if an image was too small to be used at its original resolution, either on the web or in print, there was no way to scale it up without introducing visible image defects like pixelation, blurriness, or ringing artifacts. Now, with ML Super Resolution, scaling up an image to three times its original resolution is no problem at all.
How does it all work?
As computers get ever more powerful, the additional power opens up new possibilities. One of the uses of machine learning, on a very fundamental level, is to make predictions about things. In this case, we gathered a set of images, scaled them down, and then ‘taught’ the algorithm to go from the scaled-down version back to the original high-resolution, high-quality image, predicting the values of each new pixel. The algorithm can’t recreate detail that is too small to be visible, but it can make amazing predictions about edges, shapes, contours, and patterns that traditional algorithms simply cannot match.
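The training setup described above can be sketched in a few lines: each training pair consists of a downscaled input and the original image it should reconstruct. This is only a minimal NumPy illustration — block averaging stands in for whatever downscaling the real training pipeline uses:

```python
import numpy as np

def make_training_pair(hr_image, factor=3):
    """Build a (low-res, high-res) training pair by block-averaging the
    original down by `factor` in each dimension. A network is then
    trained to predict the high-res original from the low-res input."""
    h, w = hr_image.shape
    h, w = h - h % factor, w - w % factor   # crop so dimensions divide evenly
    hr = hr_image[:h, :w]
    lr = hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return lr, hr

# A 6x6 "image" becomes a 2x2 network input paired with its 6x6 target
hr = np.arange(36, dtype=float).reshape(6, 6)
lr, target = make_training_pair(hr)
print(lr.shape, target.shape)  # (2, 2) (6, 6)
```

Repeated over a large dataset, pairs like these are what let the network learn the mapping from low resolution back to high.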
Traditional approaches use (relatively) simple mathematics to interpolate the values of pixels when scaling images.
When adding new pixels, the most basic algorithm, Nearest Neighbor, simply takes the color of the closest neighboring pixel. This results in the classic blocky appearance because the previously imperceptibly small pixels are now big enough to be seen.
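For integer scale factors, nearest-neighbor upscaling amounts to nothing more than repeating each pixel, which is easy to see in code (a NumPy sketch for illustration, not Pixelmator Pro’s implementation):

```python
import numpy as np

def nearest_neighbor_upscale(img, factor):
    """Every new pixel copies its nearest source pixel; for an integer
    factor this is just repeating each pixel, hence the blocky look."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

tiny = np.array([[0, 255],
                 [255, 0]])
big = nearest_neighbor_upscale(tiny, 3)
print(big.shape)  # (6, 6) — each original pixel is now a 3x3 block
```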
The Bilinear algorithm is a little more advanced. Each pixel in the new image is calculated as a weighted average of the 4 closest pixels in the original, with nearer pixels contributing more. The goal of this approach is to make the transition between pixels much smoother. However, when upscaling quite significantly (or upscaling small images), this algorithm creates the familiar blurry appearance.
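That 4-pixel weighted blend can be sketched as follows — an illustrative NumPy version for single-channel images; real implementations differ in coordinate mapping and edge handling:

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Each output pixel is a distance-weighted blend of the 4 nearest
    source pixels — smooth transitions, but edges get blurred."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, h * factor)   # fractional source coordinates
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                  # vertical blend weights
    wx = (xs - x0)[None, :]                  # horizontal blend weights
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

out = bilinear_upscale(np.array([[0., 255.],
                                 [255., 0.]]), 2)
print(out.shape)  # (4, 4)
```

The averaging is exactly what produces the smoothness — and, at large scale factors, the blur.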
Lanczos is yet more advanced, using a complicated mathematical formula to interpolate (another word for predict) the value of any newly created pixels while keeping edges as sharp as possible. Its main disadvantage is that, in its attempts to retain sharpness, the algorithm can sometimes create ringing artifacts. So, ultimately, it’s useful in certain specialized situations, but not much more.
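The formula at the heart of Lanczos is a windowed sinc kernel, and its negative lobes are both what keeps edges sharp and what produces ringing near hard transitions. A small NumPy sketch of the kernel itself:

```python
import numpy as np

def lanczos_kernel(x, a=3):
    """Lanczos windowed sinc: sinc(x) * sinc(x / a) for |x| < a, else 0.
    The negative lobes keep edges sharp — and can overshoot near hard
    edges, which is what causes ringing artifacts."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

print(lanczos_kernel(0.0))       # 1.0 — full weight at the sample itself
print(lanczos_kernel(1.5) < 0)   # True — a negative lobe between samples
print(lanczos_kernel(4.0))       # 0.0 — zero outside the a=3 window
```

Each new pixel is a weighted sum of nearby source pixels using these kernel values as weights.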
The machine learning way
So, how does the machine learning approach work? Put simply, it takes into account the actual content of every image, attempting to recognize edges, patterns, and textures, recreating detail based on our dataset and extensive training. When upscaling, it can make much better predictions because a red pixel next to a blue pixel can be a completely different type of texture or edge in different images even though, to the primitive approaches, they’re always the same.
The ML Super Resolution convolutional neural network
To create the ML Super Resolution feature, we used a convolutional neural network. This type of deep neural network reduces raster images and their complex inter-pixel dependencies into a form that is easier to process (i.e. requires less computation) without losing important features (edges, patterns, colors, textures, gradients, and so on). The ML Super Resolution network includes 29 convolutional layers which scan the image and create an over-100-channel-deep version of it that contains a range of identified features. This is then upscaled, post-processed and turned back into a raster image. Below is a simplified representation of the neural network.
First, the input image is passed through a high pass filter for basic edge detection. Then, the first convolutional layer reduces the size of these features and pools the data. In the Descriptor Fusion block, the image is scanned to find any JPEG compression blocks within it and this is fused with the other features identified so far.
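A high-pass filter like the one in that first stage can be illustrated with a classic 3x3 kernel — the exact filter Pixelmator Pro uses isn’t published, so this is only an illustration of the principle:

```python
import numpy as np

# A classic 3x3 high-pass kernel (illustrative only). It measures how
# much a pixel differs from its neighbors, so flat regions vanish and
# edges remain — a head start for the network's edge detection.
kernel = np.array([[-1., -1., -1.],
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])

flat = np.full((3, 3), 100.0)          # a uniform patch: no edges
edge = np.array([[0., 0., 255.],       # a patch with a vertical edge
                 [0., 0., 255.],
                 [0., 0., 255.]])

print(np.sum(flat * kernel))       # 0.0 — flat areas are filtered out
print(np.sum(edge * kernel) != 0)  # True — the edge survives
```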
The next convolutional layers and residual blocks are where the magic happens — these detect the features (edges, patterns, colors, textures, gradients, and so on) in the image, building them up into a complex representation that is over 100 channels deep. In a convolutional neural network, more layers mean better accuracy but with a large enough number of layers, a network becomes near-impossible to train. Residual blocks are designed to increase the complexity and accuracy of networks without making them impossible to train.
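The idea behind a residual block — learn a correction F(x) and add it back to the input, so the output is x + F(x) — can be sketched in a few lines. This is a toy single-channel NumPy version, nothing like the real over-100-channel network:

```python
import numpy as np

def conv2d(x, kernel):
    """Minimal same-padded 2D correlation (single channel) — the basic
    per-channel operation a convolutional layer performs."""
    kh, kw = kernel.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def residual_block(x, kernel):
    """output = x + F(x): the block learns only a *correction* to its
    input. The skip connection means even an untrained block passes the
    signal through, which keeps very deep networks trainable."""
    return x + np.maximum(conv2d(x, kernel), 0.0)  # conv + ReLU + skip

# With an all-zero (untrained) kernel, the block is a perfect identity
x = np.arange(16.0).reshape(4, 4)
print(np.allclose(residual_block(x, np.zeros((3, 3))), x))  # True
```

Because the untrained block already passes its input through unchanged, stacking many of them never makes the network worse at the start of training — each block only has to learn a small refinement.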
Finally, all the features identified by the neural network are enlarged in the Enlarge block. After this, the two residual blocks and the final convolutional layer post-process the data and turn the features back into an image. It’s also important to note that all this happens on-device and the entire trained machine learning model is included inside the Pixelmator Pro app package.
Dealing with noise and artifacts
Small images often contain compression artifacts and noise. If we want our upscaled images to be usable, artifacts and noise shouldn’t be scaled up together with the actual contents of the image. In fact, if possible, they should be removed altogether. And, as mentioned above, ML Super Resolution is designed to do just that, borrowing some of the technologies we developed for ML Denoise to remove both camera noise and JPEG compression artifacts. By the way, in this update, ML Denoise has also been improved, bringing noise removal that is between 2 and 4 times better than before.
Processing power required
Naturally, the machine learning way requires a lot more processing power than the primitive approaches — between 8 and 62 thousand times more, in fact.
* When upscaling 1 pixel by 300%, creating 9 pixels.
Making this available in an app like Pixelmator Pro has only become possible in the last couple of years — even on Mac computers from 5 or so years ago, ML Super Resolution can take minutes to process a single image due to slower performance and less available memory. On the latest hardware, however, images are processed in a few seconds, and even faster on iMac Pro, Mac Pro, or any Mac with multiple GPUs thanks to our use of Core ML 3 and its multi-GPU support. For the same reasons, the performance of ML Super Resolution is also significantly improved when using an eGPU.
1. For this test, a 300,000 pixel image was upscaled to three times its original size.
2. Tested using an AMD Radeon RX 5700 XT eGPU.
3. External GPUs require a Thunderbolt 3-equipped Mac.
Using the 2012 MacBook Pro as a baseline, the latest devices are up to 200x faster!
We’re incredibly excited about ML Super Resolution and we honestly hope you’re going to love it too. If you’d like to, you can download all the images in this blog post using the link below and test out everything in today’s update for yourself.
Pixelmator Pro 1.5.4 is now available from the Mac App Store, so head on down there and make sure you’re up to date. The trial version has also been updated so if you don’t yet have a copy, you’re welcome to try it out. That’s it for now, but we hope to surprise you with one more cool new feature before the year is up — stay tuned!