The technological process behind your run-of-the-mill scanning app might seem rather simple: activating the phone's camera and taking a photo of whatever is in front of it. For many scanning apps, this assumption proves true.
But things get a lot more complicated when the object being scanned is a physical photograph. How do you differentiate the photo from its background? How do you detect its edges so as to know where to cut off the non-photo part, keeping just the relevant visual content and discarding the rest? And what about multiple photographs appearing in one view? This calls for a different kind of computational process altogether: one that can make independent calculations, reach conclusions, and act on them, all in real time.
THE CHALLENGES OF PHOTO SCANNING WITH A HANDHELD DEVICE
Using your phone to take a photo of a photo does turn the physical into digital, but it has its drawbacks: you need to manually crop each photo you take, and you can only do this one photo at a time. In addition, it is almost impossible to ensure that the captured photo is as straight as its physical original.
If all you need to digitize is one or two photographs, this manual process might be sufficient. But this method falls short, as most families with analog photo albums will tell you, when that number runs into the hundreds or even thousands. To digitize at that scale with a handheld device, a much more complex scanning process is required. Photomyne invested considerable time and research into developing such software, and it involves Artificial Intelligence technology.
THE THINKING MACHINE
The term Artificial Intelligence (A.I.) gets thrown around quite a bit these days. It's a general term that can mean different things depending on the machine-run process in question.
Under the general A.I. umbrella there is, among other subsets, the deep learning network that "attempts to mimic the activity in layers of neurons in the neocortex [of the brain]," according to the MIT Technology Review.
Deep learning networks apply what are called artificial neural networks, or ANNs, in their computational process. "A deep learning model is designed to continually analyze data with a logic structure similar to how a human would draw conclusions. To achieve this, deep learning uses a layered structure of algorithms called an artificial neural network" (Brett Grossfeld, the Zendesk Blog).
An ANN is essentially software that can 'learn' to perform tasks by surveying examples from provided data.
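To make "learning from examples" concrete, here is a minimal, purely illustrative sketch (not Photomyne's network): a single artificial neuron, the basic building block of an ANN, taught by gradient descent to separate two classes of numbers from labeled examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Labeled examples: inputs below 0.5 belong to class 0, above 0.5 to class 1
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.0, 0.0                        # the neuron's two learnable parameters
for _ in range(2000):                  # gradient descent on cross-entropy loss
    p = sigmoid(w * x + b)             # current predictions
    w -= 5.0 * np.mean((p - y) * x)    # nudge parameters toward fewer mistakes
    b -= 5.0 * np.mean(p - y)

predictions = (sigmoid(w * x + b) > 0.5).astype(float)
print(predictions.tolist())            # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

Real deep learning networks stack many thousands of such units in layers, but the principle is the same: show the system examples, and let it adjust its own parameters until its answers match the labels.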
In the context of scanning photographs, Photomyne has trained several deep learning networks to address the challenges of photo detection and cropping mentioned at the beginning of this article. Here is a breakdown of the different algorithm components that make up the unique, super-smart scanning process in the Photomyne app.
1. THE FINE LINE BETWEEN PHOTOGRAPH AND BACKGROUND
How do you make software successfully and consistently differentiate between a photograph and the area surrounding it? Covered by an officially awarded Photomyne patent, the first part of the scanning process in the app is based on a deep learning network that taught itself to detect an image, 'understand' where its boundaries are, and where the background (e.g., the table surface or wall behind it) begins.
How did we do it? We helped a network train itself to recognize the complex patterns involved in photo detection. We fed it tens of thousands of real scanned photos with clearly (manually) defined edges. Following a training process of several weeks, the network started to pick up the 'right', or 'accurate', way to detect a photo's edges.
A couple of years and hundreds of thousands of photos later, this deep learning component of the app's scanner is 92% accurate in deciding for itself where each scanned photo's edges are, and cropping it accordingly.
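As a toy illustration of the cropping step (this is not Photomyne's patented method): once a network has labeled each pixel as 'photo' or 'background', cutting away the background reduces to taking the bounding box of the photo pixels. The mask below is hand-made to stand in for a network's output.

```python
import numpy as np

def crop_to_mask(image, mask):
    """Crop `image` to the bounding box of the True region in `mask`."""
    rows = np.any(mask, axis=1)             # which rows contain photo pixels
    cols = np.any(mask, axis=0)             # which columns contain photo pixels
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return image[r0:r1 + 1, c0:c1 + 1]

# Toy example: a 10x10 "scan" in which a 4x5 region was flagged as the photo
scan = np.zeros((10, 10))
mask = np.zeros((10, 10), dtype=bool)
mask[3:7, 2:7] = True                       # hypothetical network output
photo = crop_to_mask(scan, mask)
print(photo.shape)                          # (4, 5)
```

The hard part, of course, is producing an accurate mask for real-world scans with shadows, patterned tablecloths, and photo-within-photo content; that is exactly what the trained network contributes.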
2. PHOTO PERSPECTIVE
The vast majority of people using their phone to scan photos hold their device at an angle (even if they do their best to hold it as parallel as possible to the photographs). This produces a rough, trapezoidal input to work with. To tackle this challenge, we developed a perspective-correction mechanism that is embedded in the scanning process.
After defining each photo's boundaries, the scanning algorithm treats the photo's detected outline as a trapezoid and runs it through a perspective correction, a necessary step for producing the end result: straight, rectangular photos.
In other words, the scanning algorithm knows how to straighten the photos while maintaining the right proportions, regardless of their initial tilt.
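The geometry behind this step is a standard technique (shown here as an illustrative sketch, not necessarily Photomyne's exact implementation): given the four detected corners of a tilted photo, one can solve directly for the 3x3 perspective transform (homography) that maps the trapezoid onto a straight rectangle.

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 perspective transform mapping src[i] -> dst[i]
    (four point pairs), via a standard 8x8 linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the perspective transform to one point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# A tilted shot: the photo's corners appear as a trapezoid in the camera frame
trapezoid = [(10, 10), (90, 20), (80, 95), (15, 90)]
rectangle = [(0, 0), (100, 0), (100, 150), (0, 150)]   # target: a straight print

H = homography(trapezoid, rectangle)
print(warp_point(H, (10, 10)))   # the trapezoid corner lands at ~(0.0, 0.0)
```

In a full pipeline, the same transform is applied to every pixel (with interpolation) rather than just the corners, which is what straightens the photo while preserving its proportions.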
3. SMART AUTOMATIC PHOTO ROTATION
Following the perspective correction of the visual input, the app's scanning algorithm moves on to identify which photos, if any, need to be rotated.
The ANN we built has trained itself to know what is 'up' and what is 'down': that skies belong at the top and the ground at the bottom. If it identifies a photo that needs rotating, it rotates it automatically, in increments of 90 degrees.
This sophisticated technological capability is also the result of training a deep learning network on real data, in this case photos that were manually rotated to represent the 'correct' orientation.
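Once an orientation classifier has made its prediction, undoing the rotation in 90-degree increments is a simple array operation. In this sketch the classifier itself is stood in for by a plain argument (`predicted_rotation_deg` is hypothetical, not a Photomyne API):

```python
import numpy as np

def auto_rotate(image, predicted_rotation_deg):
    """Undo the rotation an orientation classifier predicts, in 90° steps.

    `predicted_rotation_deg` stands in for a network's output: how far
    clockwise the scanned photo is rotated from upright.
    """
    k = (predicted_rotation_deg // 90) % 4   # number of 90° counter-clockwise turns
    return np.rot90(image, k)

# A 2x3 "photo" stored on its side (rotated 90° clockwise, so shape is 3x2)
upright = np.arange(6).reshape(2, 3)
sideways = np.rot90(upright, -1)             # simulate a sideways scan
restored = auto_rotate(sideways, 90)         # classifier says: rotated 90° clockwise
print(np.array_equal(restored, upright))     # True
```

The interesting work lives entirely in the prediction: recognizing skies, faces, and horizons well enough to pick the right multiple of 90 degrees.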
4. COLOR RESTORATION OF FADED COLORS (IN THE WORKS...)
This fourth component of Photomyne's scanning algorithm is still in the oven, but it's an especially exciting one. Old analog photographs often lose their color vividness over the years, taking on a brown or yellowish tint that has come to represent the vintage look and feel (no wonder Instagram offers it as one of its retro filters).
We are currently working to train yet another ANN to identify such photos and to automatically correct faded colors as organically as possible.
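While the learned approach is still in development, a classical baseline conveys the idea: stretching each color channel of a faded image back to the full 0-255 range undoes part of a uniform warm cast, in the spirit of an auto-levels filter. This sketch is illustrative only and is not Photomyne's method:

```python
import numpy as np

def stretch_channels(img):
    """Per-channel contrast stretch: rescale each RGB channel of a faded
    image to the full 0-255 range. A classical stand-in for learned
    color restoration."""
    img = img.astype(float)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        lo, hi = img[..., c].min(), img[..., c].max()
        out[..., c] = (img[..., c] - lo) / max(hi - lo, 1e-6) * 255.0
    return out.astype(np.uint8)

# A "faded" 1x2 RGB image whose channels only span a narrow, warm range
faded = np.array([[[120, 100, 60], [180, 150, 90]]], dtype=np.uint8)
print(stretch_channels(faded))   # each channel now spans the full 0..255 range
```

A trained network can go much further than this kind of global rescaling, restoring colors selectively and "organically" based on what the scene actually depicts.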
These four critical components make the scanner by Photomyne the sophisticated, effective, and yes, thinking mechanism that it is.
Thanks to these capabilities, digitizing analog photographs, whatever the quantity, becomes a seamless process using nothing more than one's phone.