I wrote this article for the Globe & Mail.
In the 1970s, portable calculators transformed our ability to do math. Today, generative chatbots such as OpenAI’s ChatGPT and Google’s Gemini are driving similar changes in our professional and personal work. Unlike calculators, however, chatbots can produce incorrect or fabricated responses, known as hallucinations. When humans and organizations uncritically use this untruthful chatbot content for tasks, it becomes what my collaborators and I call “botshit.”
This content is a risk because the large language models that power chatbots generate responses by predicting patterns of words that seem plausible based on their training data. These predictions involve no understanding of what the responses mean. So, while the responses are sometimes coherent, useful and correct, they are often riddled with inaccurate content.
When an Air Canada chatbot gave a passenger incorrect advice about how to claim a bereavement fare for a flight to a family funeral, the airline argued in court that it should not be held liable because the chatbot was a separate legal entity responsible for the veracity of its own responses. The court rejected this argument and ordered Air Canada to pay the passenger damages and court fees. Similarly, a B.C. Supreme Court judge reprimanded a lawyer for submitting case documentation that contained ChatGPT hallucinations.
These examples highlight the hazards of uncritically using chatbots to generate content for different types of work. To use chatbots effectively, we need to understand and mitigate the epistemic risks associated with this untruthful content. That means users should consider two questions when applying the technology to work: How important is the veracity of the chatbot’s response to the task? And how easy is it to verify that veracity? The answers to these two questions lead to four different modes of chatbot work: authenticated, autonomous, automated and augmented.
When the veracity of a chatbot’s response is difficult to verify and relatively unimportant to the task, a user can work with the chatbot in an augmented way. This means chatbot responses should not be used as a final input or output for a task, but rather as a trigger to help generate or refine ideas. For example, when asking a chatbot to suggest a title for a report or to edit a speech, the response should not be used as is; it should be sifted through, questioned and edited so that it enhances the professional’s own work rather than generates and spreads false content.
When it is hard to verify the veracity of a chatbot response and that veracity is crucial, users should engage in the authenticated mode of chatbot work. Much legal, journalistic, academic and health care work should not rely blindly on chatbot-produced content because of the harmful consequences of producing and sharing hallucinations. Users need to know the abilities, scope and limits of the chatbot they are using, and do the verification necessary to ensure responses are factual and error-free.
When chatbot outputs must be accurate but are easy to verify, the work can be automated. Users assign simple, routine and relatively standardized tasks, whose results can be easily checked, to a chatbot. Translation, calculations, data analysis and scheduling are examples of automated chatbot work. Because the scope of the work and of the underlying large language models is limited and focused, the output can be routinely verified and trusted.
If a chatbot response can be easily verified, yet its veracity is relatively unimportant, the work falls into the autonomous mode, which can be selectively delegated to chatbots. This includes low-stakes, verifiable tasks such as routine customer inquiries or common administrative requests, where the likelihood of sharing false content and the resulting harm are both low.
In the 1970s, society did not have to worry about calculators producing incorrect responses, because their arithmetic logic and processing were unfailing. Chatbots do not have the same reliability, because they predict responses based on training data. For chatbots to be used effectively, their responses should be treated as provisional knowledge that needs to be questioned, checked and edited in line with the veracity expectations of the work. Mastering the use of this provisional knowledge thus requires practices tailored to the different modes of chatbot work and their associated hallucination risks.