A property of human visual perception that remains far beyond the ability of today's artificial intelligence is our reliance on configural relations between the local features of a shape when recognizing objects. While AI models can often recognize objects with accuracy comparable to a human's, their flaw lies in an inability to see an image as a whole rather than as a collection of local parts assembled into a whole. For example, if the parts of an image were scrambled, a human would likely fail to recognize the shape because the overall configuration is disrupted, but an AI model could still succeed because it relies on local parts rather than the whole picture. In the paper "Deep Learning Models Fail to Capture the Configural Nature of Human Shape Perception," Nicholas Baker explores this concept by comparing the accuracy of humans and a computer model at recognizing scrambled images.
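The scrambling manipulation described above can be sketched in a few lines: cut the image into a grid of patches and shuffle them, which preserves local texture and contour statistics inside each patch while destroying the global arrangement of parts. This is a minimal illustration assuming a NumPy image array, not the paper's actual stimulus-generation code; the function name `scramble_patches` is mine.

```python
import numpy as np

def scramble_patches(img, grid=4, rng=None):
    """Split an image into a grid x grid set of patches and shuffle them.

    Illustrative sketch (not Baker's code): within-patch texture and
    contours are preserved, but the configural arrangement of parts,
    which human recognition depends on, is destroyed.
    """
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    ph, pw = h // grid, w // grid
    # Crop so the image divides evenly into patches.
    img = img[: ph * grid, : pw * grid]
    patches = [
        img[r * ph : (r + 1) * ph, c * pw : (c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]
    # Randomly permute the patches, then reassemble them into a grid.
    order = rng.permutation(len(patches))
    rows = [
        np.concatenate([patches[order[r * grid + c]] for c in range(grid)], axis=1)
        for r in range(grid)
    ]
    return np.concatenate(rows, axis=0)
```

Feeding both the original and the scrambled array to the same classifier is then enough to probe whether it responds to the whole configuration or only to the local parts.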
The key findings of Baker's research were that humans recognized objects accurately when their parts were intact, but that this accuracy plummeted when the parts were scrambled. In contrast, the computer model recognized the objects consistently, whether whole or scrambled. The model also showed a bias toward local textures and contours: because the scrambled images retained the same contours and textures as the whole image, the computer perceived them as the same object. While this texture bias helped the model identify scrambled objects, it is also a weakness, since an object overlaid with the texture of a different object could be misidentified as the object that texture belongs to. Another finding was that humans and computers analyze shape differently: humans rely largely on a skeletal representation, an understanding of how the parts of an image should connect, which computers lack.
Baker's research reveals the weaknesses of current computer models' object recognition, but also their potential to progress and grow. AI is increasingly used for face identification in consumer electronics, and even with enough confidence that it serves as identification at airports. The difference between these applications and human visual perception is that such systems recognize faces in a fixed position. A study from Dartmouth College, "Modeling Naturalistic Face Processing in Humans with Deep Convolutional Neural Networks," investigates whether the same kind of AI model used in Baker's study, a deep convolutional neural network, can process human faces as well as a person can, especially when the faces are dynamic. The study used a dataset of videos of dynamic human faces and compared how accurately the AI model and human participants sorted them into categories, while also tracking neural activity. The model sorted faces into categories such as gender, ethnicity, and age about as well as the human participants did. However, it disregarded dynamic information about a face, such as changes in facial expression, that humans use to aid identification. The study's overall conclusion was that current artificial intelligence models are poor at capturing dynamic information.
While recent advancements in artificial intelligence have been rapid, it is clear that current AI models still have a long way to go to match human strengths such as object and facial recognition.