How to Limit LLM Hallucinations

September 13, 2023

What is an LLM?

To establish a baseline: generative AI is a broad field within artificial intelligence, primarily concerned with natural language and text construction. Large Language Models (LLMs) are a specific subset of generative AI designed for completions, such as finishing the sentence “Humpty Dumpty sat on a wall, Humpty Dumpty had a great…”. The LLM’s function is to predict the next word, “fall”. The true value of LLMs lies in their linguistic understanding and the natural completion capability they acquired through training. Although LLMs hold extensive content and data from that training, we should not depend on its factual accuracy. The challenge is to ensure that LLMs do not present themselves as authoritative or factual. But it’s important to remember that LLMs are designed to hallucinate, and preventing them from doing so would be counterproductive. They are, essentially, narrative constructors rather than fact machines.

LLMs are Designed to Hallucinate

LLMs are designed to hallucinate, following a cognitive linguistic path or prompt set by a user or another system. They construct a plausible completion for a given prompt. The better the prompt, the better the completion. Therefore, you need a method for giving an LLM better prompts that carry more real-time information, so that its answers are more accurate. LLMs are not knowledge management systems and were not designed to be, despite containing a wealth of facts from their training. These facts should not be the primary source of information in most business contexts, because they are not curated according to an organization’s specific needs.

How to Limit LLM Hallucinations

To effectively limit LLM hallucinations, you need to treat LLMs like journalists rather than storytellers. Journalists build stories from factual, real-time information. Similarly, you need to feed your LLMs real-time information so that their generated content aligns more closely with reality. Storytellers, on the other hand, don’t require real-time information, since they are creating tales in a fictional world.

In our experience, LLMs are primarily used to distribute static content, but static content covers only about 20% of the answers users seek. The majority of queries require real-time information. For instance, a cash forecast requested at 10 a.m. on a given day will have a different answer hours, if not minutes, later. Another example is “Which sales opportunities have a chance of slipping into the next quarter?” Answers to these questions reside in your finance and accounting or customer relationship management systems. You can’t train an LLM on this fluid data, but you can integrate the LLM with your backend systems and prompt it with their data. This limits hallucinations, because the LLM generates its answer from your real-time data, in a form that makes sense in context to the person asking, provided they have permission to read the data.
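As a rough illustration of prompting with real-time data, the sketch below checks the user's permission, pulls a current figure, and places it in the prompt. Everything named here is an assumption made for the example: `User`, the `finance:read` permission string, `fetch_cash_forecast` (a stand-in for a call to your finance system), and the forecast value itself. It is not Krista's implementation or any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    name: str
    permissions: set[str]


def fetch_cash_forecast() -> dict:
    """Hypothetical stand-in for a live query against a finance system."""
    return {
        "as_of": datetime.now(timezone.utc).isoformat(),
        "forecast_usd": 1_250_000,  # made-up figure for the example
    }


def build_grounded_prompt(user: User, question: str) -> str:
    """Return a prompt carrying live data, or raise if the user may not read it."""
    # Check permission before any data is fetched or shown to the model.
    if "finance:read" not in user.permissions:
        raise PermissionError(f"{user.name} may not read finance data")
    record = fetch_cash_forecast()
    return (
        "Answer using only the data below. "
        "If the data does not contain the answer, say so.\n"
        f"Data (as of {record['as_of']}): "
        f"cash forecast = ${record['forecast_usd']:,}\n"
        f"Question: {question}"
    )
```

The resulting string would then be sent to whichever LLM you use. The point of the sketch is the ordering: permission check first, fresh data second, generation last.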

Moreover, when individuals pose questions like these, they often aim to initiate a whole workflow. For instance, the earlier question about which sales opportunities may slip cannot be resolved by referencing static content. Such requests involve triggering other systems or human workflows that fall outside an LLM’s purview. A sales manager or chief revenue officer will want to take action if deals are slipping so they can maintain the sales forecast. They may want to offer a discount to accelerate a deal or offer an incentive if excess inventory is available.

It’s essential to realize that while LLMs are an important part of the solution, they are not the entire solution. To handle queries related to static content, real-time information, and integrated systems or workflows, you need to marry LLM capability with other systems. With this integrated approach, you can limit LLM hallucinations, ensuring the AI system provides more accurate and beneficial responses.

LLM Deployments Require Agility

Generative AI and LLMs evolve quickly, so it is important to stay agile and flexible in implementing them. This is not akin to an ERP implementation, where you commit to a solution and then stick with it indefinitely. Because LLMs and AI are perpetually evolving, the implementation cannot be a one-time action; it requires regular revisiting and refining. This is not a simple decision. As a significant number of our customers have come to understand, it is a complex process.

The analogy we often use to illustrate this is the process large organizations undergo when deciding to choose an ERP from Oracle or SAP. After months of agonizing over this decision, organizations generally avoid even looking at the ‘other guy’s’ website or marketing materials for years. They stick with their choice because these changes can be challenging to manage. While some conglomerates can handle such changes, the vast majority of organizations prefer to stick with their original decision.

However, the situation with LLMs is entirely different. You cannot evaluate LLMs once and then stick with that decision indefinitely. There are multiple LLM providers. Each LLM provider offers several options, each with its own parameters. This situation already presents a complex matrix to navigate even before considering a second LLM provider.

Factors such as data security, performance, cost, and accuracy all need to be taken into account when selecting an LLM. Thus, the decision often comes down to specific use cases. For instance, we have a customer who conducts research on highly sensitive and specific information. As a science-based company, its goal is to provide its researchers with immediate, real-time access to a conversational set of content. For this purpose, an LLM is perfect.

However, the content needs to be real-time and self-hosted because their ongoing basic research cannot be moved to the public cloud. At the same time, the same researcher can converse with an LLM using curated content from a third-party vendor or a publicly hosted cloud-based model. In essence, it is perfectly acceptable for one solution to be powered by three different LLMs: one self-hosted, one from a third-party vendor specific to the industry, and one publicly hosted. Each of these LLMs has different security and accuracy characteristics, but combined, they can achieve the intended purpose.
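One way to picture the three-model arrangement is a small routing table keyed by where the content comes from. This is only a sketch under stated assumptions: the `Source` categories and the model identifiers are hypothetical names chosen for the example, and actually sending the request to a model is left out.

```python
from enum import Enum


class Source(Enum):
    INTERNAL_RESEARCH = "internal_research"  # must never leave the organization
    LICENSED_CONTENT = "licensed_content"    # curated by a third-party vendor
    PUBLIC = "public"                        # general knowledge


# Hypothetical model identifiers; substitute your own deployments.
ROUTES = {
    Source.INTERNAL_RESEARCH: "self-hosted-model",
    Source.LICENSED_CONTENT: "vendor-industry-model",
    Source.PUBLIC: "public-cloud-model",
}


def route(source: Source) -> str:
    """Return the model that is allowed to see content from this source."""
    return ROUTES[source]


def ask(source: Source, question: str) -> dict:
    """Package a request for the chosen model; sending it is out of scope here."""
    return {"model": route(source), "question": question}
```

Keeping the mapping in one table means the security decision is made in a single place, and a model can be swapped without touching the code that asks questions.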

How to Implement Multiple LLMs

Implementing multiple LLMs requires a strategic and comprehensive approach that takes into account a variety of factors, including cost, accuracy, agility, and most importantly, data security. For example, you might have several options that vary in cost and accuracy. One option may rank highest in cost, while another with a lower cost but comparable accuracy might be more suitable. Therefore, implementation involves optimizing across the entire framework instead of focusing on individual elements.
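The trade-off described above can be sketched as a two-step selection: first drop every option that fails a non-negotiable data-security requirement, then pick the cheapest option whose accuracy is close to the best one remaining. The `Candidate` fields, the 0.02 tolerance, and the sample numbers used with it are all assumptions for illustration; real accuracy scores would come from your own evaluation on your own use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1k_tokens: float  # illustrative unit of cost
    accuracy: float            # score on your own evaluation set, 0..1
    self_hosted: bool


def select_model(candidates, *, require_self_hosted, accuracy_tolerance=0.02):
    """Pick the cheapest candidate within tolerance of the best eligible accuracy."""
    # Step 1: data security is a hard filter, not something to trade off.
    eligible = [c for c in candidates if c.self_hosted or not require_self_hosted]
    if not eligible:
        raise ValueError("no candidate meets the data-security requirement")
    # Step 2: among near-equally accurate options, prefer the lowest cost.
    best = max(c.accuracy for c in eligible)
    close_enough = [c for c in eligible if best - c.accuracy <= accuracy_tolerance]
    return min(close_enough, key=lambda c: c.cost_per_1k_tokens)
```

Running the selection once per use case, with that use case's requirements, is what lets different questions land on different models.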

Krista offers an excellent framework to deploy LLMs to address your static content, real-time integrations, and process workflows. Krista provides a conversational user interface that people already comprehend and enables subject matter experts to build automations in a “nothing like code” studio. Most important is Krista’s “human in the loop” functions that can be incorporated into workflows bringing human judgment and expertise into a streamlined process as your AI journey progresses. Perhaps the most significant advantage of Krista is that she offers the simplest and most agile method to integrate any AI into your business. This agility is essential in the fast-paced, constantly evolving world of AI, where businesses need to adapt quickly to new developments.

Contact us today about how Krista can quickly deploy LLMs, AI, or conversational automation for your business.

Speakers

Scott King

Chief Marketer @ Krista

John Michelsen

Chief Geek @ Krista

Transcription

Scott King:

Hello, everyone. Thank you for joining the latest episode of the Union Podcast. I’m your host, Scott King, and with me is the Chief Geek of Krista Software, John Michelsen. How are you doing, John?

John Michelsen:

I’m doing well, Scott. Thank you. We appreciate the positive feedback that we receive from our podcast listeners and viewers.

Scott King:

Absolutely. We value the insightful comments we receive and are always open to suggestions on what you’d like us to cover. If you have any ideas or questions for John, Chris, Sam, or any of the other podcast contributors, please send them in. Today, we’re discussing Large Language Models (LLMs) and their tendency to generate inaccurate or hallucinated content. Various reports and articles state that LLMs are unreliable, and we aim to explore why this is the case and how companies can mitigate these issues.

John Michelsen:

Indeed.

Scott King:

So, John, I believe it’s crucial to increase the accuracy and comfort level of using these models, given the concerns surrounding inaccuracy and cybersecurity. Could you provide us with a brief overview of what a Large Language Model is and why they’ve become so popular?

John Michelsen:

To establish a baseline, generative AI is a vast field within AI related to natural language and text construction. Large Language Models (LLMs) are specifically designed for completions, such as completing the sentence, “Humpty Dumpty sat on a wall, Humpty Dumpty had a great…”. The next word is what an LLM is designed to generate. They predict the next word, a core capability that is the equivalent of a computer chip’s binary on-off, zero-one. In essence, they are designed to hallucinate, following a cognitive linguistic path or prompt set by a user or another system. They construct a reasonable completion for a given prompt. They are not knowledge management systems, and were not designed to be. LLMs contain a lot of facts from their training, but these facts should not be relied upon in most business contexts because they are not curated according to an organization’s specific needs. The value of LLMs lies not in the facts they contain, but in their linguistic understanding and the natural completion capability they developed through training. It’s important to note that while we appreciate the extensive content and data LLMs possess, we should not depend on the factual nature of this content and data. Now, we’re faced with the challenge of ensuring that LLMs do not present themselves as authoritative or factual. However, we should move away from the idea of preventing LLMs from hallucinating because that’s how they are designed. I like to analogize LLMs to storytellers, not hallucinators.

Scott King:

For instance, when I asked an LLM to tell me a story about Jerry, a rookie baseball player leading the league in stolen bases, it generated a story of about 150 to 200 words. However, the LLM added extra details I didn’t ask for, such as making Jerry play for the Yankees.

John Michelsen:

That’s the power of LLMs. They can add depth to a simple prompt, creating a rich and engaging narrative.

Scott King:

Indeed, while it was a great story, it contained unnecessary details.

John Michelsen:

Yet, that’s the essence of their power.

John Michelsen:

By the way, Scott, you didn’t need to prompt it to tell a story. It would have told a story regardless. If you had simply started with, “Jerry is a rookie baseball player leading in stolen bases,” it would have picked up from there. We are trying to transform it from just a storyteller to a journalist. We prefer the term ‘journalist’ over ‘hallucination’. When you surround this AI with an elegant orchestration of people, systems, and real-time information from a knowledge management system, it becomes a journalist. That’s our goal. A journalist tells the story based on actual, relevant facts. They don’t rely on cached results from years or even weeks ago. When we integrate the AI’s text completion capability with the journalist’s research skills, we create a journalist that tells stories based on the facts we need at that moment.

Scott King:

Even journalists update their articles when they learn new facts. So, how does that work? Journalists write interesting pieces based on the facts at hand. However, what’s accurate today might not be accurate tomorrow.

John Michelsen:

Yes, and the challenge here is that, in our experience, perhaps 20% of all queries people want to ask an AI-based system are reflected in static content that could be found if they had performed a search. Most solutions, which aren’t very sophisticated, are merely making this information available to an LLM, hoping this particular content gets distributed. However, it only covers about 20% of the answers. The classic, yet repeatable challenges we’ve faced, even though we’re only two and a half years into this journey, are that humans want real-time information. For instance, they may ask, “What’s our cash forecast?” Here, at 10 a.m. on any given day, it’s X, but it’s going to change within hours or days for sure. LLMs are primarily used to distribute static content, replacing the difficult internal search in the shared file system that nobody uses. There’s value in this, but it’s limited to about 20%.

Your point about what the journalist can’t typically cover relates to access to real-time information. Often, when a human asks a question, they are trying to initiate a whole workflow. For example, if I say, “I need next week off,” it’s not something to look up in a static piece of content. If there is mention of it in a static piece of content, it’s the process that needs to be followed for it to be carried out, which doesn’t happen merely by stating it on a screen.

What we’ve consistently seen is the potential for an LLM to act as a journalist if it’s properly prompted with good information and tells that story well. But even before that storyteller gets a chance to do so, a request might need to invoke several other systems, trigger human workflow, or do other things. It might be a request for real-time information, and it may not even involve an LLM. For instance, you, Scott, might want to know how many leads came to the website today or how many leads are active in our system right now. Those are really questions for natural language understanding (NLU), not for LLMs.

We are tending to frame everything as an LLM solution, when in fact we need this capability to be married with other abilities for a complete solution. This way, we can answer all three types of questions – those related to static content, real-time information, and integrated system or workflow. All three can be accomplished, and while the LLM is part of it, it’s not the whole story.

Scott King:

Indeed, it appears that what you’re discussing is different from what I see online while researching for the podcast. Many people seem to think that they need to train their own Large Language Model (LLM). They believe it should know the answer, but in reality, it needs to read the answer and generate the output for the user. So, what are some best practices? If I think I need to train an LLM and I need to shift my thinking, what should I really be doing?

John Michelsen:

Yes, I hope that those who are talking about training are using the phrase in a loose sense or referring to something similar to what we’re discussing because…

Scott King:

Or perhaps they mean tuning, right?

John Michelsen:

Yes.

Scott King:

Because tuning is different, we’ll talk about that in a moment.

John Michelsen:

Absolutely.

John Michelsen:

Yes, it might be more appropriate. But even there, you would be adding to the corpus of information in that LLM. This will make understanding whether it’s hallucinated or not much more difficult. Training an LLM from scratch is a massive undertaking that is geared towards large tech organizations and research groups, but it’s not for most in the commercial world. The capability of synthesizing text into relatively appropriate completions is proven through a dozen different LLMs that we are working with here at Krista, which we make available to our customers. We use these capabilities ourselves as we enhance our own product. So far, we haven’t found the need to build a model. In most cases, we don’t even need to fine-tune. Fine-tuning is often necessary when you want to structure outputs differently. But what you’re really looking for is real-time prompting with accurate information, Scott. If people mean training when they say that, they’re in for a challenge where the cost can surge into eight figures. Fine-tuning mashes your information against incorrect information, and then you try to figure out if it only gave you your information, which is actually more challenging.

I don’t know if I’m delving too much into the details here, but we have some great ways to help our customers who are using Krista with their LLMs. We ensure that the LLM’s storytelling ability didn’t stretch the truth. For example, if we prompted it with a set of actual facts and said, “Now construct an explanation of how Jerry did in his first year as a baseball player,” we would then immediately make the LLM prove to us that it didn’t fabricate things about a specific game or anything that didn’t actually happen. We have the ability to keep the LLM within its lane, to remain factual and avoid fabricating stories. If you fine-tune, you actually lose the ability to do that. And of course, if you’re training from scratch, then you’re in a whole different ballpark.

So that’s a long explanation, and I hope at least some folks who are watching or listening have understood where I’m going. To summarize, use off-the-shelf models. They could be on-prem, self-hosted, or cloud-based, but you’ll need to prompt these in real time. There’s a significant amount of engineering you’ll need to do to get it right, and that’s just 20% of it. The other 80% involves real-time information and invoking workflow or other types of integrations. All of these contribute to a great AI Q&A.

Scott King:

Yes, the point is that it’s possible. You just need to approach it correctly. If you’re attempting to train your own system, there won’t be enough AI engineers to handle all the tasks.

John Michelsen:

Indeed, if everyone is building one.

Scott King:

Exactly. So, where is this leading us in the future? I appreciate our discussion on real-time accuracy because what we assert today might not be correct tomorrow.

John Michelsen:

Absolutely, that’s right.

Scott King:

But considering the current state of Large Language Models (LLMs) and the rapid pace at which they’re evolving, they’re continually outperforming each other. They’re improving at an astonishing rate. You really don’t want to be confined to one model. So, where is this heading? And how do we address the concerns about hallucination and inaccuracies in the future? What does the next six months hold?

John Michelsen:

Yes, that’s a good question.

John Michelsen:

Many of our customers have come to understand that this is not a simple decision. The analogy I often use, and pardon me for those who don’t agree with these companies, is that large organizations often spend months deciding whether to choose an ERP from Oracle or SAP. They agonize over this decision and, once made, they avoid looking at the ‘other guy’s’ website or marketing materials for years. They stick with their choice because it’s challenging to manage such changes. Although some conglomerates can handle these changes, most organizations prefer to stick with their decision. However, the situation with LLMs is entirely the opposite.

We won’t be in a situation where we evaluate once, in say 2023, and then stick with that decision. Because with LLMs, there isn’t just one option. Any LLM provider has several options, and each of those requires parameters. So, even before you consider a second LLM provider, you already have a complex matrix to navigate. Factors such as data security, performance, cost, and accuracy all need to be considered when selecting an LLM. So, the decision often comes down to use cases.

For example, we have a customer that conducts research on highly sensitive and specific information. They are a science-based company, and their goal is to provide their researchers with immediate, real-time access to a conversational set of content. In this case, an LLM is perfect. However, the content needs to be in real-time and self-hosted because they cannot move their ongoing basic research to the public cloud. But at the same time, that same researcher can have a conversation with an LLM using curated content from a third-party vendor or a publicly hosted cloud-based model.

It’s perfectly fine for a solution to be powered by three different LLMs: one self-hosted, one from a third-party vendor specific to your industry, and a publicly hosted model. Each of these LLMs has different security and accuracy characteristics, but combined, they can achieve the intended purpose.

You might think this sounds like a million-dollar project, but it’s not necessarily the case. For this particular organization, it is a multi-million dollar project to deploy all the AI and LLMs they need. However, each individual use case is not that costly, provided it’s done correctly with the right thought and tools.

The capability to achieve impressive results is definitely there. We just need to approach it with a level of sophistication beyond the notion that simply feeding an LLM with lots of Steve Jobs quotes will make it sound like Steve Jobs. That’s an illusion. What is needed in this case are three different LLMs, each with its own unique security and accuracy characteristics, that together achieve the desired outcome.

Scott King:

Previously, I did an episode with Luther where we discussed the cost aspect. Because, John, you briefly touched upon costs. We delved into how with higher accuracy, the cost escalates and one might end up spending excessively for just a slight percentage increase in accuracy. So, for those of you listening, I’ll provide a link to that episode because it provides a better understanding of this topic.

John Michelsen:

Yes, we’re currently in the process of building what I would call the first generation of a comprehensive framework. For instance, you might consider costs. There might be options that rank highest, second highest, and third highest in terms of cost. Alternatively, you might have an option with a much lower cost, yet comparable accuracy. In this case, the latter may be your preference. Therefore, optimization involves considering the entire framework, rather than focusing on a single element. You need to integrate your specific sensitivities into the selection process. This is exactly why solutions like Krista are useful; they elegantly orchestrate people, systems and AI without the need for complex coding.

In our case, we apply an elimination method similar to the March Madness basketball tournament for LLMs. Out of the initial 30 options, only one will emerge as the ideal choice. The software does most of the heavy lifting here. We specify our preferences in terms of cost and accuracy, as well as non-negotiables related to data security for certain data sets. Then, based on a specific use case, the software chooses the most suitable option. For a different use case with different characteristics, the software will suggest a different choice. It’s quite fascinating to observe this process, especially in a market that’s just beginning to understand the concept.

In the next six months, I believe we will see more companies understanding and implementing this approach, especially those who are seriously deploying these systems. For those who are currently paying the maximum price for every single prompt to their LLM providers, they will soon face a bill shock, reminiscent of the early days of cloud computing when companies realized they were paying more than they would for running their own data center. This will necessitate the development of tools to manage LLM costs. By adopting a thoughtful approach to LLM selection from the start, we can prevent such problems.

It’s not a bad idea to use multiple providers, as this offers a level of insulation from individual providers. I always advise against hardwiring yourself to any of these APIs as you’ll need the portability to switch providers if necessary. Some providers may fail, or as we’ve seen, certain countries like Italy may impose restrictions or outright bans on them.

The point is, you don’t want to be in a place where you need velocity of change and you can’t change anything. When you wire this stuff together too tightly, that velocity of change just got taken away from you. So, I know we’ve sort of veered off the topic of hallucination here, but I absolutely and sincerely believe that those who best exploit the capability that LLMs provide them are going to massively transform major areas of their business. To do so is going to take some sophistication, some thought, and some new thinking. So, I hope, for all of your sakes, that’s exactly where you’re headed and exactly the kinds of things you’re doing.

Scott King:

That was outstanding, John. Yes, you did hallucinate a little bit, but it made sense, right? It was a great story, especially when we delved into Italy. But, contextually, you went there, but I think that just shows how the software is going to act in the same way. So, I really appreciate your time today. Until next time.

John Michelsen:

Sounds good.
