"It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so."
-Mark Twain
Introduction
As an explorer of the latest in emerging tech, I find that maintaining a healthy dose of reality—or skepticism—is crucial for evaluating whether a product truly matches its marketing. In many areas of life, the gap between marketing and reality can be significant, and discerning that difference is essential. The advancements in Large Language Models (LLMs) such as ChatGPT and Claude, and in AI more broadly, over the past several years have been nothing short of astonishing. These advancements are not mere flukes; they represent intentionally scalable, compounding innovations that are now yielding significant returns on years of investment. One can almost visualize the exponential growth trajectory, building on the foundational work of pioneers in codebreaking, transistors, and expert systems, leading to the sophisticated LLMs we see today.
While the advancements in AI, particularly in Large Language Models, are undeniably remarkable, these capabilities are accompanied by a critical challenge that cannot be overlooked: the issue of trust. A recent article in The Verge by Alex Cranz calls out the issue of trust (i.e., hallucinations) and the limitations of current AI. For anyone who hasn't already experienced the wild fabrications of ChatGPT, hallucinations occur when LLMs "make stuff up": they either don't have the data, or statistically couldn't make a better prediction, so they provide "an answer" anyway. To be fair, LLMs have one job: always provide an answer. While this has gotten better in recent months, LLMs are still likely to eventually fabricate a response. So how does one work with a tool that lies every once in a while?
Let's start with Human Factors.
Trust
Trust is one of the foundational aspects of humanity. It is fundamental in that it directly influences our physiological and psychological responses. When trust is present, it helps mitigate the body's natural fight or flight response, allowing individuals to feel safe and secure, which enables them to truly listen, act, and engage. Without trust, the brain remains on high alert, ready to defend against perceived risks, which hinders open-mindedness and willingness to adopt new ideas. That open-mindedness and willingness to try new ideas is critical for innovation to "cross the chasm". The effect compounds within human systems and groups: trust helps potential adopters believe in the benefits and safety of new technologies, and it is shaped by early adopters and trusted figures (Rogers, 2003; Mayer, Davis, & Schoorman, 1995).
Resistance
Humans generally resist change, and much of the time they will create reasons not to change without even knowing it. This resistance to change is a well-documented phenomenon in the clinical literature, with various underlying reasons. One significant factor is cognitive inertia, the brain's tendency to stick with familiar routines and practices because of the mental effort required to adopt new behaviors. This inertia is partly due to our brains being hardwired to conserve energy and avoid the discomfort associated with change (BioMed Central; Welldoing). Change often introduces uncertainty, which the amygdala, the part of the brain responsible for fear responses, perceives as a threat. This perception can lead to a natural resistance to new initiatives.
Ego
"I know more than that thing, it's just predicting the next word. " - ego.
Ego plays a crucial role in the adoption of innovations like AI, as it influences how individuals perceive and react to new technologies. The ego, which encompasses our self-identity and self-esteem, often resists changes that threaten established roles, expertise, and status. When AI is introduced, it can challenge one's sense of competence and professional value, leading to defensive reactions. Neurologically, the brain's default mode network, involved in self-referential thought, may become more active, causing individuals to focus on potential threats to their self-worth (Short et al., 2020). This can result in cognitive biases, such as overestimating the risks of AI and underestimating its benefits. Ego-driven resistance can hinder openness to learning and collaboration, creating barriers to the effective adoption of AI and other innovations. Recognizing and addressing these ego-related responses is key to fostering a culture that embraces technological advancements.
Fear
Fear significantly impacts our openness to exploring innovation, especially regarding the concern that AI will take away jobs. When people fear losing their livelihoods to AI, the brain's amygdala triggers the fight or flight response, prioritizing safety and survival over embracing new technologies. This heightened state of alertness reduces cognitive flexibility, making individuals more resistant to change and less likely to engage with AI-driven solutions. Fear can also lead to increased skepticism and doubt, causing people to question the reliability and potential benefits of AI. Consequently, this emotional response can stifle curiosity, hinder creativity, and create barriers to the adoption and integration of AI technologies, ultimately slowing progress and development across industries.
Intermittent Lies
Now, LLMs don't lie all the time, just some of the time. According to a variety of online sources and articles, how often they hallucinate varies widely, but let's stick with a broad range of 3-27%. Looking again at the human response, intermittent lies damage not just relationships between humans, but also our relationships with tools, technology, and systems. Am I OK with this "intern of a chatbot" making up roughly 10% of the output, which I now have to validate? Will this save me time, or cost me more time checking sources? These intermittent lies trigger responses in us that are very human and difficult to understand in the moment, but well documented in the clinical literature.
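To make the time question concrete, here is a rough back-of-the-envelope sketch. Every number in it is an illustrative guess, not a measured figure; plug in your own estimates.

```python
# Rough back-of-the-envelope: does the assistant still save time once I
# have to validate its output? All numbers below are illustrative guesses.

hallucination_rate = 0.10   # assume ~10% of answers need rework
time_to_write_myself = 30   # minutes to produce the answer unaided
time_to_prompt = 2          # minutes to ask and read the response
time_to_verify = 10         # minutes to check sources for any answer
time_to_redo = 30           # minutes to redo a fabricated answer from scratch

expected_ai_time = (
    time_to_prompt
    + time_to_verify
    + hallucination_rate * time_to_redo
)

print(f"Unaided: {time_to_write_myself} min")
print(f"With AI (expected): {expected_ai_time:.1f} min")
```

With these made-up numbers the tool still comes out ahead (about 15 minutes versus 30), but the margin shrinks quickly as the verification cost or the hallucination rate climbs.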
Heuristic Processing
The danger lies not only in too little trust, but also in too much. A book published in 2017, "Emotions and Affect in Human Factors and Human-Computer Interaction", includes the excerpt: "Heuristic processing refers to constructive but truncated, low-effort processing, which is likely to be adopted when time and personal resources such as motivation, interest, attention, and working-memory capacity are scarce". The risk of using answers from AI is that people trust them because it's easy. One of the major reasons to use the tool is to reduce cognitive load; validating the results increases cognitive load, which some will avoid, letting inaccuracies through.
The "fail"
While some will simply "trust the oracle" (as has been well documented since the release of ChatGPT), many will get frustrated by wrong answers, become distrusting, and simply not use it. In my work integrating private data into LLMs, I have heard several folks say "it's cool, but it's just not there". The simple reason is that they cannot fully trust it. For example, RAG (Retrieval Augmented Generation) is an affordable way to incorporate private data into LLMs. Sometimes the results are quite impressive; sometimes it can't find the information you are asking about. Imagine uploading an annual report into a prompt and asking it to "write me a summary of the salient points in the annual report". The results are underwhelming: it misses the most important point of the annual financial results and gets a critical word wrong. This frustration, caused by our belief that it "should" be perfect, can end poorly.
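For readers curious what that RAG pipeline roughly looks like, here is a minimal sketch. The embed() and ask_llm() functions are placeholders standing in for whatever embedding model and LLM API a real system would use; they are not any specific product's API.

```python
# A minimal sketch of the RAG idea: retrieve the passages most similar to the
# question, then hand only those passages to the model as context.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a vector for the text (a real system would call an embedding model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))  # fake but deterministic
    return rng.random(384)

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to your LLM of choice."""
    return "(model response)"

def retrieve(question: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Rank passages by cosine similarity to the question and keep the top few."""
    q = embed(question)
    scored = []
    for p in passages:
        v = embed(p)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, p))
    scored.sort(reverse=True)
    return [p for _, p in scored[:top_k]]

def answer(question: str, passages: list[str]) -> str:
    """Build a prompt that grounds the model in the retrieved context only."""
    context = "\n\n".join(retrieve(question, passages))
    prompt = (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say so.\n\nContext:\n{context}\n\nQuestion: {question}"
    )
    return ask_llm(prompt)
```

The failure mode described above often traces back to the retrieval step: if the key passage never makes it into the context window, the model cannot summarize it, no matter how capable it is.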
Probabilistic System
The thing to remember, however, is that AI is a probabilistic system: it leverages probability scores to evaluate the degree of confidence in its predictions and decisions. By analyzing vast amounts of data, AI algorithms assign probability scores to potential outcomes, allowing them to determine the most likely results based on statistical patterns and learned information. It is technically never "certain"; it almost never operates at 100% confidence. Also cited in the Verge article is a study by researchers who "don't actually think hallucinations can be solved." Because LLMs are probabilistic in nature, one might assume that, statistically speaking, they will always be (at least in some part) wrong.
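A toy example of what "probabilistic" means in practice: the model scores candidate next tokens, converts the scores into a probability distribution, and samples one. The vocabulary and scores below are invented purely for illustration.

```python
# Toy illustration of next-token sampling. The candidate tokens and their raw
# scores are made up; a real model has tens of thousands of candidates.
import math, random

candidates = {"Paris": 6.1, "Lyon": 3.2, "London": 2.8, "banana": 0.4}

# Softmax: convert raw scores into a probability distribution.
total = sum(math.exp(s) for s in candidates.values())
probs = {tok: math.exp(s) / total for tok, s in candidates.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>8}: {p:.1%}")

# Sampling means even a low-probability token gets picked occasionally,
# which is one intuition for why some fabrication is always possible.
next_token = random.choices(list(probs), weights=probs.values())[0]
print("chosen:", next_token)
```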
Human vs. Machine
In the Verge article, Cranz points out that "Just as no person is 100 percent right all the time, neither are these computers." This raises an interesting comparison: not just evaluating LLMs in a vacuum, but comparing LLM responses and work product to human responses and work product. Many tend to evaluate technology against a gold standard, yet humans frequently fall short of that bar. By comparing LLM outputs to human responses, we can gauge the AI's effectiveness in real-world scenarios, assess its ability to meet or exceed human standards, and identify areas where it still falls short. This comparative approach ensures that we are not merely impressed by the technology but are critically analyzing its performance relative to the capabilities and expectations of human output. Such evaluations not only help in judging results, but also identify room for improvement in the models, or in the requests made by the human operator.
Ok, it lies, now what?
Some may stop right there and throw the baby out with the bathwater. This is a typical problem with innovation; it's a reason we have concepts like the hype cycle, the adoption curve, and other models of technology adoption. It is Amara's law in action: "We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run". The skeptic thinks, "I need it to do X, and if it can't do it, talk to me when it can." They then snuggle back into their comfortable groove along life's highway, in the "how it's always been done" lane, because it is known space.
However, the "wait for it to getter better" mentality is a mistake. Much of the time, things are not binary (it works or it doesn't), like much of life, Ai and LLMs consist of degrees. One need not look further than the last two years to see poor initial results of GenAi platforms that today are indiscernible from human produced work. The GenAi picture of the guy with 20 fingers 18 months ago now looks like it was taken by a photographer. ChatGPT 3.5 failed the bar exam miserably, while ChatGPT 4 aced it. Even these were not binary, they incrementally improved over time through training and reinforcement, one percentage at a time. It can be assumed, to a degree, that these systems will continue to improve incrementally and get better and better.
Openness
With that in mind, the importance lies not in staring at the present tech, but in the ability to, in the words of Wayne Gretzky, "skate to where the puck is going to be". This forward-looking approach requires openness, humility, and an understanding of our own human responses in order to navigate these trust issues. This kind of openness to new tools and exploration is how we got where we are today, and in most practical cases in history it has won the day over pessimism and fear. While humans are programmed to be skeptical (it's in our DNA and our amygdala), we also need to be open to possibilities (more on the human factors of adoption in a future article).
Trust but verify
The means to cautious adoption could be to "trust but verify", as Ronald Reagan put it. In the short term, these verifications will need to be done by us humans. In the long run, systems of checks and rules will increasingly be built into platforms, incrementally reassuring us that the appropriate verification is being conducted by the systems themselves, so we can trust a point or two more each year (or month). When asked, ChatGPT-4o will provide citations against which its responses can be validated. Users can easily create personal frameworks for when to trust or not trust results. Ultimately, the produced work remains the responsibility of the human; just as we check work produced by humans, we should validate findings from AI.
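As one example of such a personal framework, a simple, admittedly crude checklist can even be scripted. The rules below are illustrative placeholders rather than a recommended standard; a real checklist would be tuned to your own domain and risk tolerance.

```python
# A minimal sketch of a personal "trust but verify" pass over a model answer.
# The rules are illustrative; adapt them to what you actually need to check.

def verification_flags(answer: str, cited_sources: list[str]) -> list[str]:
    """Return reasons the answer should be double-checked by a human."""
    flags = []
    if not cited_sources:
        flags.append("No sources cited: verify every factual claim manually.")
    if any(ch.isdigit() for ch in answer):
        flags.append("Contains numbers or dates: confirm against the original source.")
    if "http" in answer and not cited_sources:
        flags.append("Mentions links without citations: check that the URLs are real.")
    return flags

answer = "Revenue grew 14% in 2023, driven by the services segment."
for flag in verification_flags(answer, cited_sources=[]):
    print("CHECK:", flag)
```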
Big Picture
Ultimately, we will need to address AI's trust issue head on and navigate it with perseverance, knowing that AI is a fundamental step in human evolution. We must evolve alongside AI, as it is increasingly omnipresent, scalable, and an inevitable factor in our daily lives. This matters not just in the here and now, as we navigate ways to trust an LLM to help us write a book, research an ailment, or design a strategy, but as part of a longer-term strategy. AI will increasingly be used in critical areas of our lives, including healthcare, transportation, biology, aviation, and much more (e.g., trust is pretty important when stepping into an autonomous vehicle).
As AI technologies, particularly Large Language Models, continue to evolve rapidly, it is important to approach these advancements with cautious optimism and vigilant verification. The transformative potential of AI is immense, but the inherent trust issues necessitate a balanced approach. By adopting a "trust but verify" mindset, we can harness AI's power while ensuring accuracy and reliability, especially in critical fields like healthcare and transportation, driving innovation forward while maintaining the necessary checks and balances. Ultimately, our openness and our ability to address trust issues head-on will determine how successfully these powerful technologies integrate into our society, balancing excitement with skepticism to realize AI's benefits and minimize its risks. Harnessing these capabilities means considering not just how AI works, but also how we humans work.