Wednesday, April 29, 2026

I Am Not Isomorphic To A Tree (It Only Has One Knee)

“If a tree had a knee, where would it be?” This question was posed (and taken seriously) by Webb et al. (2023) in discussions of machine learning and visual analogy. This cloud looks like a unicorn. That is visual analogy. What does that even mean, though? Clouds do not literally look like horses with a horn; they look like giant puffy cotton balls in the sky, yet somehow the analogy can make sense in the proper context. Indeed, we can all likely imagine a cloud that looks like a unicorn: it just needs a roughly horse-shaped body, a head-like structure, and, importantly, a horn. It won’t be picture perfect, but once you see it, you can’t unsee it. In other words, our brains can create abstract connections between objects so long as they share some kind of structural similarity, namely in the relationships between their components, even if the images differ materially in exact shape, size, texture, and so on.


Webb and friends (2023) wanted to change how we model visual analogical mapping with deep learning, and in particular, to get the deep learning out of it as much as possible. They proposed a hybrid model, visiPAM, whose machinery is rather involved, but in short it creates a graph (connected points) to represent the structure of each image abstractly, and then uses an algorithm to find a mapping between the two graphs (i.e., it matches each dot in one image’s graph to a dot in the other image’s graph). This model had great success, beating many of the pure deep-learning front-runners and even outperforming humans in some trials.
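To make the "match dots by structure" idea concrete, here is a toy sketch (not the paper's actual probabilistic objective, and with made-up keypoint names): each "image" is a set of labeled 2D keypoints, and a candidate 1-to-1 correspondence is scored by how well the pairwise distances in one graph agree with those in the other.

```python
# Toy structural graph matching: brute-force the best 1-to-1 map
# between two keypoint sets by comparing pairwise distances.
# (A simplified illustration, not visiPAM's real algorithm.)
from itertools import permutations
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def structure_cost(pairs, a, b):
    """Total disagreement between corresponding pairwise distances."""
    cost = 0.0
    for (u1, v1) in pairs:
        for (u2, v2) in pairs:
            cost += abs(dist(a[u1], a[u2]) - dist(b[v1], b[v2]))
    return cost

def best_one_to_one(a, b):
    """Search all bijections between the two keypoint sets."""
    a_keys, b_keys = list(a), list(b)
    best = None
    for perm in permutations(b_keys):
        pairs = list(zip(a_keys, perm))
        c = structure_cost(pairs, a, b)
        if best is None or c < best[0]:
            best = (c, dict(pairs))
    return best[1]

# Hypothetical keypoints: a tiny "cat" and a translated copy of it.
cat = {"head": (0, 2), "body": (0, 0), "tail": (1, -1)}
other = {"head2": (5, 2), "body2": (5, 0), "tail2": (6, -1)}
print(best_one_to_one(cat, other))
# → {'head': 'head2', 'body': 'body2', 'tail': 'tail2'}
```

Because the second set is just a shifted copy, the correct correspondence preserves all pairwise distances exactly and wins with cost zero, which is the intuition behind scoring matches structurally rather than by raw pixels.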





The problem with visiPAM, however, is a sneaky assumption that comes with their choice of graph-matching algorithm. In particular, they define a function to describe what “good” and “bad” matchings look like, and baked into this function is the assumption that the resulting graph-matching should be an isomorphism (a 1-to-1 correspondence). In other words, the model assumes that every pair of graphs it encounters is (or at least almost is) structurally identical. This is almost never the case in real life. Take our unicorn-cloud example from earlier. What if only the top half of the cloud looks like the top half of a unicorn? The graph representations of those images would not be isomorphic at all, since half of the unicorn’s components are missing from the cloud, but a meaningful analogy can still be drawn.
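The half-unicorn problem is easy to see in code. Here is a minimal sketch (with made-up part names) showing that even a necessary condition for isomorphism, matching node counts and degree sequences, rejects the partial analogy outright:

```python
# Sketch: strict isomorphism checking rejects partial analogies.
# Graphs are adjacency lists; the part names are illustrative.
unicorn = {"horn": ["head"], "head": ["horn", "body"],
           "body": ["head", "legs"], "legs": ["body"]}
cloud = {"horn": ["head"], "head": ["horn"]}  # only the top half

def could_be_isomorphic(g, h):
    """Necessary conditions only: same node count, same degree sequence."""
    return (len(g) == len(h) and
            sorted(len(nbrs) for nbrs in g.values()) ==
            sorted(len(nbrs) for nbrs in h.values()))

print(could_be_isomorphic(unicorn, cloud))  # False: half the graph is missing
# Yet the partial map horn→horn, head→head is a perfectly good analogy.
```

An objective that insists on a full 1-to-1 correspondence has no way to score that partial map favorably, which is exactly the criticism being raised.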


The controversy over whether a 1-to-1 (isomorphism) assumption should be made has been discussed in the field for a while (Krawczyk, Holyoak, & Hummel, 2005; Lu, Ichien, & Holyoak, 2022; Hummel & Landy, 2009). Beyond the convenient albeit impractical assumption that an isomorphism should exist between two graphs, another criticism is that when humans create visual analogies, we may do so in an n-to-1 manner, i.e., we may map more than one feature in one graph to the same feature in another. The authors of Webb et al. (2023) posed the question “If a tree had a knee, where would it be?” I pose the following: how many knees does a tree have? I would argue that a tree has only one knee. No one is arguing over whether or not trees have knees; any rational person can see that they do. But a reasonable person would likely map both knees of a person to the same area of a tree, perhaps imagining the tree’s trunk as two legs squished together really tightly to make one big leg. Another valid analogy, yet it is 2-to-1 (two legs map to one trunk).
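The 2-to-1 knee map falls out naturally once the injectivity constraint is dropped. In this sketch (feature vectors are made-up (height, side-offset) coordinates), each source part independently maps to its most similar target part, so two parts are free to share one target:

```python
# n-to-1 mapping: each source part maps to its nearest target part
# independently, with no requirement that the map be injective.
# Part names and feature coordinates are illustrative assumptions.
def n_to_one_map(source, target):
    def d(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return {s: min(target, key=lambda t: d(source[s], target[t]))
            for s in source}

person = {"left_knee": (0.4, -0.1), "right_knee": (0.4, 0.1),
          "head": (1.0, 0.0)}
tree = {"trunk_mid": (0.4, 0.0), "crown": (1.0, 0.0)}

print(n_to_one_map(person, tree))
# → {'left_knee': 'trunk_mid', 'right_knee': 'trunk_mid', 'head': 'crown'}
```

Both knees land on the trunk, a 2-to-1 correspondence that a bijection-based objective would forbid; the trade-off, as discussed below, is that nothing here tells the algorithm when such merging is appropriate.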


How do we fix this, though? As a representative of the math community, I will tell you that the soft isomorphism constraint was made out of necessity: it yields a stable algorithm that can produce at least reasonable output. Without this constraint, visiPAM would likely get worse, not better. This is because, in addition to being able to produce meaningful n-to-1 maps, the algorithm needs to know when it should. For example, the figure shows an example mapping from a cat to a horse. Many parts correspond in a 1-to-1 manner, e.g., ears, eyes, legs, tails. But what about the cat’s toes? A horse’s foot is really just a giant toe, so all of the cat’s toes should map to the horse’s hoof, right? This gets at another aspect of how humans do this, and what AI might need to do: redrawing the graphs themselves when they are very different, so that a matching becomes less ambiguous (Gentner & Forbus, 2011). A further complication is that humans also use context- or goal-specific information when choosing a mapping, something that automated systems cannot yet do meaningfully.
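One way to picture that "redraw the graph" step is as coarsening: once several source parts have been mapped to a single target part, merge them into one node so the correspondence becomes 1-to-1 again. This is only a loose sketch in the spirit of re-representation, with an entirely hypothetical cat-to-horse mapping:

```python
# Sketch of graph re-making: merge source nodes that share a target,
# turning an n-to-1 correspondence back into a 1-to-1 one.
# The part names and the input mapping are illustrative assumptions.
from collections import defaultdict

def coarsen(mapping):
    """Group source nodes by shared target into single merged nodes."""
    groups = defaultdict(list)
    for src, tgt in mapping.items():
        groups[tgt].append(src)
    return {"+".join(sorted(srcs)): tgt for tgt, srcs in groups.items()}

cat_to_horse = {"toe1": "hoof", "toe2": "hoof", "toe3": "hoof",
                "ear": "ear", "tail": "tail"}
print(coarsen(cat_to_horse))
# → {'toe1+toe2+toe3': 'hoof', 'ear': 'ear', 'tail': 'tail'}
```

The hard part, of course, is not the merging itself but deciding which merges are licensed by the analogy, which is where context and goals come in.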


VisiPAM is exciting because it gives visual analogy a concrete computational body that is something other than yet another pure deep-learning approach. But that story is still too clean. Human analogy is often partial, pragmatic, and happily non-isomorphic. We merge things, ignore things, infer hidden things, and change the grain of comparison depending on what we are trying to see or do. So the next generation of visual analogy will need more than better graph matching. It needs better graph making.



References:


Gentner, D., & Forbus, K. D. (2011). Computational models of analogy. Wiley Interdisciplinary Reviews: Cognitive Science, 2(3), 266–276. https://groups.psych.northwestern.edu/gentner/papers/gentner%26Forbus_2011.pdf


Hummel, J. E., & Landy, D. H. (2009). From analogy to explanation: Relaxing the 1:1 mapping constraint… very carefully. In B. Kokinov, K. J. Holyoak, & D. Gentner (Eds.), New frontiers in analogy research: Proceedings of the Second International Conference on Analogy (pp. 211–221). New Bulgarian University. https://labs.psychology.illinois.edu/~jehummel/pubs/Hummel%26Landy09Sofia.pdf


Krawczyk, D. C., Holyoak, K. J., & Hummel, J. E. (2005). The one-to-one constraint in analogical mapping and inference. Cognitive Science, 29(5), 797–806. https://doi.org/10.1207/s15516709cog0000_27


Lu, H., Ichien, N., & Holyoak, K. J. (2022). Probabilistic analogical mapping with semantic relation networks. Psychological Review, 129(5), 1078–1103. https://doi.org/10.1037/rev0000358


Webb, T., Fu, S., Bihl, T., Holyoak, K. J., & Lu, H. (2023). Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14, Article 5144. https://doi.org/10.1038/s41467-023-40804-x


