Earlier this semester, I attended a guest lecture by Professor Nicholas Baker. The lecture discussed the article "Deep learning models fail to capture the configural nature of human shape perception," in which Baker and Elder (2022) demonstrated that deep learning models do not possess the same configural shape perception abilities that humans do. Although advanced, these models struggle with shape recognition (as opposed to their color and texture recognition capabilities). The difference shows in the models' ability to keep classifying images consistently even when an object's components are arranged in unconventional ways. The article demonstrated this by asking both humans and AI whether jumbled animal silhouettes represented the original, un-jumbled animal. Humans struggled to answer correctly; the deep learning models did not. On the other hand, the models struggled when silhouettes were slightly blurred, while humans did not. These results indicate that the models lack the ability to accurately categorize objects when finer details are obscured or abstracted. AI seems to lack the human ability to interpret perceptual cues as greater than the sum of their parts.
To gain further insight into how human image recognition compares with AI, I read "Hierarchical structure in perceptual representation," an article that details the hierarchical image recognition capabilities humans have developed (Palmer, 1977). In it, Palmer determines that general object characteristics are grouped into discrete structural units. These units contain smaller units, which in turn contain structural units of their own. Analyzing the proximity and grouping of these like characteristics gives insight into a given object's category. In contrast, AI processing appears to be more bottom-up: smaller details, rather than broad structural units, serve as the main indication of category.
Baker, N., & Elder, J. H. (2022). Deep learning models fail to capture the configural nature of human shape perception. iScience, 25(9).
Palmer, S. E. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology, 9(4), 441-474.