Dr. Norberto Grzywacz, a talented and impressive researcher from Loyola University Chicago, gave a talk at the Neuroscience Seminar about his research on reinforcement learning and social polarization. Our values shape how we make decisions, and those values come from the society we live in. As life goes on, we constantly receive feedback from others: when we do something others like, we get some kind of positive feedback, and when we do or say something others disagree with, we receive negative feedback. This feedback shapes our future decisions and actions, a process called reinforcement learning. Reinforcement learning can lead to polarization because we enjoy the feeling of positive feedback. We therefore seek out the people who give us that feeling, and the loop continues until we interact only with people who are like us, creating a false sense that everyone else is, or should be, exactly like us. Something striking that Dr. Grzywacz mentioned was that artificial intelligence (AI) learns in this same manner.
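To make the idea concrete, here is a minimal Python sketch of a reinforcement-learning social-influence loop of the general kind described above. It is not Dr. Grzywacz's actual model; the agent count, learning rate, reward rule, and update rule are all illustrative assumptions. Each agent learns a value for interacting with every other agent, agreement feels rewarding, and interactions gradually cluster among the like-minded.

```python
import random

# Illustrative sketch only (not the model from Grzywacz, 2025).
# Agents hold opinions in [-1, 1] and learn a value q[i][j] for
# interacting with each partner. Agreement is rewarding, so each
# agent learns to seek out like-minded partners.

N_AGENTS = 20
LEARNING_RATE = 0.1   # assumed value, for illustration
N_STEPS = 5000

random.seed(0)
opinions = [random.uniform(-1, 1) for _ in range(N_AGENTS)]
q = [[0.0] * N_AGENTS for _ in range(N_AGENTS)]

def pick_partner(i):
    """Epsilon-greedy choice: usually exploit the highest-valued
    partner, occasionally explore a random one."""
    if random.random() < 0.1:
        j = random.randrange(N_AGENTS)
    else:
        j = max(range(N_AGENTS),
                key=lambda k: q[i][k] if k != i else float("-inf"))
    return j if j != i else (j + 1) % N_AGENTS

for _ in range(N_STEPS):
    i = random.randrange(N_AGENTS)
    j = pick_partner(i)
    # Positive feedback when opinions agree, negative when they clash
    reward = 1.0 - abs(opinions[i] - opinions[j])
    # Standard incremental RL update toward the received reward
    q[i][j] += LEARNING_RATE * (reward - q[i][j])
    # Mild social influence: drift toward rewarding partners
    if reward > 0.5:
        opinions[i] += 0.01 * (opinions[j] - opinions[i])

# After training, each agent's favorite partner tends to share its
# opinion: a small echo chamber emerging purely from reward-seeking.
for i in range(3):
    best = max(range(N_AGENTS),
               key=lambda k: q[i][k] if k != i else float("-inf"))
    print(f"agent {i}: opinion={opinions[i]:+.2f}, "
          f"favorite partner opinion={opinions[best]:+.2f}")
```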
An article by Julian Horsey offers more insight into how AI learns in this same way and how human input shapes that learning. Reinforcement learning from human feedback (RLHF) starts with an already-trained AI, one that has absorbed a large amount of data and understands the foundations of human language. First, the model is fine-tuned toward desired behaviors by giving it examples that guide it to correct responses. Second, humans help the AI learn: as the AI completes tasks, a person assigns each response a number indicating how desirable it was. This is reinforcement learning in the same way humans learn, since a higher number means a better response. Finally, the AI adjusts its behavior to maximize the numerical scores assigned earlier.
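That loop can be sketched in a few lines of Python. This is a toy illustration, not a real RLHF pipeline (which would fine-tune a large language model, typically with a policy-gradient method such as PPO); the responses, scores, and learning rate are made up for the example. Human-assigned numbers serve as rewards, and a REINFORCE-style update shifts the model's probability toward the highest-scored responses.

```python
import math
import random

# Toy sketch of the RLHF loop described above (illustrative only).
# Step 1 (pretraining and fine-tuning) is assumed done: the "model"
# already has candidate responses. Steps 2 and 3 are shown: humans
# score responses numerically, and the model raises the probability
# of the higher-scored ones.

responses = ["helpful answer", "vague answer", "rude answer"]
# Step 2: human-assigned scores; a higher number = a more desired response
human_scores = {"helpful answer": 1.0, "vague answer": 0.3, "rude answer": -1.0}

prefs = [0.0, 0.0, 0.0]   # model's learned preference per response
LEARNING_RATE = 0.1       # assumed value, for illustration

def sample_response():
    """Sample a response index with softmax probabilities over prefs."""
    exps = [math.exp(p) for p in prefs]
    r = random.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(prefs) - 1

random.seed(0)
for _ in range(2000):
    i = sample_response()
    reward = human_scores[responses[i]]
    # Step 3: REINFORCE-style update; raise the preference for
    # responses that earned high human scores, lower it otherwise
    total = sum(math.exp(p) for p in prefs)
    for j in range(len(prefs)):
        prob_j = math.exp(prefs[j]) / total
        grad = (1.0 if j == i else 0.0) - prob_j
        prefs[j] += LEARNING_RATE * reward * grad

# After training, the model almost always picks the highest-scored response
exps = [math.exp(p) for p in prefs]
for resp, e in zip(responses, exps):
    print(f"{resp}: p={e / sum(exps):.2f}")
```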
However, RLHF has drawbacks, the main ones being overfitting and bias, and these can lead to polarization within the machines themselves. Because the humans who assign the numerical scores are inherently biased, the models can begin to give overly narrow responses or inadvertently perpetuate stereotypes and harmful biases. RLHF does have real benefits: it greatly increases AI's ability and scope, letting it perform tasks in a more naturalistic manner with more nuanced communication. But RLHF must be done carefully. Just as in humans, it can lead to polarization and harmful outputs; done right, it can produce improved AI that maximizes its benefits to society as a whole.
Reinforcement learning is a fascinating window into how the brain works to belong within society and be "one of the pack." It also highlights how easily humans can fall into extremes. It is striking that AI can already emulate this behavior, acting more human-like than ever before, which means we must remain cognizant of what AI is becoming and of the implications it holds for our society, both beneficial and detrimental.
References:
Grzywacz, N. M. (2025). Comparison of distance and reinforcement-learning rules in social-influence models. Neurocomputing, 649, 130870. https://doi.org/10.1016/j.neucom.2025.130870
Horsey, J. (2024, August 8). AI reinforcement learning from human feedback (RLHF) explained. Geeky Gadgets. https://www.geeky-gadgets.com/reinforcement-learning-from-human-feedback/