In digital image manipulation, depth maps are a valuable tool for creating effects that lend an image a greater sense of three-dimensionality. While the human eye can readily perceive depth in flat images, machines still struggle to understand the position of elements within them. A new artificial-intelligence-based image analysis model seeks to change that:
As can be seen in the video, researchers at Simon Fraser University in Canada have created a new computer vision model capable of perceiving depth in an image. As PetaPixel reports, the system uses machine learning to observe the image, establish contextual references, and from there estimate the sizes of the elements in the scene in order to build a depth map.
This process mirrors the logic humans use to read depth in a flat image. We know the approximate size of things, and by the principles of perspective we can infer that something is farther away because it looks smaller, or closer because it looks larger than normal. Likewise, we know that one object sits in front of or behind another because we understand the spatial relationships between the bodies in the image.
The team started from a model that worked on high-resolution images, but its results were inconsistent. The model could reproduce fine detail in high-resolution images yet lost the overall sense of depth; with low-resolution images the detail disappeared, but the depth structure was captured better.
This is because in a high-resolution image the analysis algorithms can distinguish details within a subject, such as the face, clothes, eyes, and nose, but there are also large blocks of information the system cannot differentiate, since it lacks points of comparison or cues to identify them. At low resolution, by contrast, large blocks can be separated into wall, sky, person, floor, and so on, because the boundaries between the elements are more visible to the algorithms analyzing the image.
By feeding the model the same image at different resolutions, the analysis can map different blocks of information: first an overall separation of the scene, then a separation of the details. This produces two depth maps.
Those two depth maps are then passed to another algorithm that averages their information to create a single high-resolution map, resulting in a more accurate depth image than comparable models produce.
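The two-pass pipeline described above can be sketched in a few lines. This is only an illustrative approximation, not the researchers' actual method: `estimate_depth`, `resize`, and `dual_resolution_depth` are hypothetical names, the depth "model" is a dummy gradient so the sketch runs without a trained network, and the merge step is the simple averaging the article mentions rather than a learned combination.

```python
import numpy as np

def estimate_depth(image: np.ndarray) -> np.ndarray:
    """Stand-in for a monocular depth network (hypothetical).
    A real system would run a trained model here; this dummy just
    returns a left-to-right gradient so the sketch is runnable."""
    h, w = image.shape[:2]
    return np.tile(np.linspace(0.0, 1.0, w), (h, 1))

def resize(arr: np.ndarray, shape) -> np.ndarray:
    """Nearest-neighbour resize via index mapping (no external deps)."""
    h, w = shape
    rows = np.arange(h) * arr.shape[0] // h
    cols = np.arange(w) * arr.shape[1] // w
    return arr[np.ix_(rows, cols)]

def dual_resolution_depth(image: np.ndarray, low_size=(64, 64)) -> np.ndarray:
    """Estimate depth twice -- once on a downsampled copy (global scene
    structure) and once at full resolution (fine detail) -- then average
    the two maps into one high-resolution result."""
    depth_low = estimate_depth(resize(image, low_size))   # coarse pass
    depth_low_up = resize(depth_low, image.shape[:2])     # upsample coarse map
    depth_high = estimate_depth(image)                    # detailed pass
    return 0.5 * (depth_low_up + depth_high)              # simple merge

img = np.zeros((256, 256))
depth = dual_resolution_depth(img)
print(depth.shape)  # (256, 256)
```

In practice the merge step is where most of the quality comes from; a weighted or learned combination can favor the coarse map for overall layout and the detailed map near edges.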
This breakthrough in image depth analysis could be vital for developing better computational-photography tools. Features like Adobe's simulated bokeh could benefit from the process, and it could even be used to drive special depth-aware adjustments to an image. Smartphones could combine their ToF sensors with the algorithm to better adjust the lighting of individual elements, making it unnecessary to add extra hardware such as the LiDAR sensor Apple includes in its iPhone 12 Pro.