How AI is Improving Document Understanding

September 25, 2024

Today’s businesses face a massive challenge—managing and extracting insights from mountains of documents. From HR policies to invoices and contracts, manually sifting through this data wastes time and resources. AI accelerates your business and saves your people valuable time. By using AI to automate document understanding, businesses can answer questions faster, streamline workflows, and empower teams to focus on high-impact tasks. In this article, we will explore three key AI-driven document understanding use cases that can transform how your organization handles its data.

AI is More Than Generative AI

Stop thinking of AI as just generative models like ChatGPT. Generative AI may look like the solution to everything, but in reality it’s only the last 5% of the process. For years, businesses have used predictors and categorizers to automate workflows and drive real business value.

  • Predictors help your team forecast outcomes, such as predicting energy consumption based on temperatures.
  • Anomaly Detection flags unexpected patterns so your business can react faster to outliers.
  • Categorizers automatically sort and classify requests, such as customer support tickets or internal inquiries, to resolve them faster by routing them to the right process.
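
To make the categorizer pattern concrete, here is a minimal sketch of a support-ticket router built with scikit-learn; the tickets and queue labels are hypothetical, and a real deployment would train on far more history:

```python
# A minimal sketch of a ticket categorizer; tickets and queues are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: past tickets and the queue that resolved them.
tickets = [
    "I can't log in to the portal",
    "Please reset my VPN password",
    "Invoice 4411 was billed twice",
    "Refund still missing from my statement",
]
queues = ["it_support", "it_support", "billing", "billing"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(tickets, queues)

# Route a new request to the right process.
print(classifier.predict(["my card was charged twice for one invoice"]))
```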

Unlock the Three Key AI Use Cases for Document Understanding

To effectively handle document-driven processes, AI can be applied in three distinct ways. Each use case focuses on different business needs and improves how your organization manages its data.

1. You Don’t Know What Users Will Ask

AI uncovers answers to vague or poorly phrased questions by interpreting user intent. Users no longer need to be experts at phrasing queries or rely on exact keywords—AI understands the context and delivers answers based on the information available in your documents.

  • Example: When employees need answers from policy documents or HR guidelines, AI surfaces relevant information based on context, even when users don’t know the exact questions to ask.
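
As an illustration, here is a minimal sketch of context-based retrieval using the sentence-transformers package; the model choice and policy snippets are assumptions for the example:

```python
# A minimal sketch of context-based retrieval; model and snippets are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

policy_snippets = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Expense reports must be filed within 30 days of travel.",
    "Remote work requires written manager approval.",
]
snippet_vectors = model.encode(policy_snippets, convert_to_tensor=True)

# A vaguely phrased question still lands on the most relevant snippet.
question = "how much time off do I earn?"
question_vector = model.encode(question, convert_to_tensor=True)
scores = util.cos_sim(question_vector, snippet_vectors)
print(policy_snippets[int(scores.argmax())])
```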

2. Known Questions with Variable Answers

When your team knows the specific questions but the answers vary across different documents, AI retrieves the precise data needed. This use case saves hours of manual searching and ensures accuracy when extracting answers from various document formats.

  • Example: AI can help your team process invoices and contracts, quickly pulling out key data like shipping addresses, order numbers, and pricing from documents with different structures.
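
As a simplified illustration, here is a sketch of pulling known fields out of invoice text after OCR; the field names and patterns are assumptions, and production systems layer OCR, layout models, and validation rules on top:

```python
# A minimal sketch of extracting known fields from variably formatted invoices.
import re

INVOICE_TEXT = """
Order No: 88213
Ship To: 42 Harbor Street, Rotterdam
Total Due: EUR 1,284.50
"""

# Illustrative patterns; real invoices need far more robust handling.
FIELD_PATTERNS = {
    "order_number": r"Order\s*(?:No|Number)[.:]?\s*(\S+)",
    "shipping_address": r"Ship\s*To[.:]?\s*(.+)",
    "total": r"Total\s*Due[.:]?\s*([A-Z]{3}\s*[\d,.]+)",
}

def extract_fields(text: str) -> dict:
    """Return whichever known fields appear in this invoice's format."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group(1).strip()
    return fields

print(extract_fields(INVOICE_TEXT))
```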

Real-World Example: Speed Up Invoice Processing with AI

In a recent project, a large European healthcare organization used AI to process 10,000 invoices in seven languages. The AI extracted key data—even from invoices that lacked vendor names—and achieved 85-87% accuracy on its first run, compared with roughly 62% for the human team doing the same work. Use cases like this prove that AI doesn’t just reduce manual effort—it transforms operational efficiency and scales your capacity. This is an example of how AI handles known questions with variable answers, such as processing different invoice formats while maintaining high accuracy.

3. Automating Workflows with NLP

Natural Language Processing (NLP) allows your users to interact with systems naturally, cutting through rigid data formats and allowing faster, more intuitive access to information. It goes beyond chatbots by powering business operations and extracting valuable information from unstructured sources like emails, forms, and reports.

  • Example: NLP can pull shipping dates, order numbers, or customer details from emails without manual input, making operations more efficient and reducing errors.
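
A minimal sketch of that kind of extraction, assuming spaCy and its small English model (installed via `python -m spacy download en_core_web_sm`); the email text is hypothetical:

```python
# A minimal sketch of entity extraction from an email body using spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")

email_body = (
    "Hi team, order 77-3402 for Acme GmbH should ship on October 14, 2024 "
    "to the Berlin warehouse."
)

# Built-in entity types cover dates, organizations, and places out of the box.
for ent in nlp(email_body).ents:
    print(ent.label_, "->", ent.text)  # e.g. DATE -> October 14, 2024
```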

Implementing NLP and Precision Techniques for Smarter Automation

To achieve precise results, AI uses two key methods for processing text:

  • Lexical Analysis focuses on extracting specific data points like dates, product codes, or order numbers.
  • Semantic Analysis helps AI understand the meaning behind those data points and the context in which they appear, ensuring that the system interprets complex queries accurately.

By combining both approaches, you can ensure that your AI system delivers accurate insights and answers across a wide range of documents. This combination allows your organization to handle complex document structures like tables, contracts, or multi-page forms, making document management seamless.
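
One way to picture the combination is a hybrid ranker that blends a lexical overlap score with an embedding-based semantic score; in this minimal sketch, the tokenizer, the alpha weighting, and the model choice are all illustrative assumptions:

```python
# A sketch of hybrid ranking: lexical overlap plus embedding similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def lexical_score(query: str, passage: str) -> float:
    """Fraction of query tokens that appear verbatim in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)

def semantic_score(query: str, passage: str) -> float:
    vectors = model.encode([query, passage], convert_to_tensor=True)
    return float(util.cos_sim(vectors[0], vectors[1]))

def hybrid_score(query: str, passage: str, alpha: float = 0.5) -> float:
    # Exact tokens (order numbers, dates) anchor the lexical term;
    # the semantic term handles paraphrased questions.
    return alpha * lexical_score(query, passage) + (1 - alpha) * semantic_score(query, passage)

print(hybrid_score("status of order 4411", "Order 4411 status: shipped on May 2"))
```

The lexical term keeps precise values like order numbers from being drowned out, while the semantic term still matches questions that use none of the document’s exact words.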

Handling Complex Document Structures with AI

Your business deals with various document formats—tables, forms, free text, and more. AI handles these complexities effortlessly, ensuring that key information is pulled from the right sections. For instance, health plan comparison documents, which are often difficult to interpret, become easy to navigate when AI differentiates headers from data points and ensures the correct information is extracted. Whether managing contracts, invoices, or policy documents, AI gives you the flexibility to scale document processing while maintaining accuracy.

Prioritizing Security and Access Management

As you automate document understanding, ensure your AI system prioritizes data security and privacy. Role-based access control ensures that only the right people have access to sensitive information, and location-based policies help your business comply with regional regulations. AI must go beyond just answering questions—it must enforce security protocols and keep your business compliant.
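
As a minimal sketch of that idea, the following filters a document set by role and region before anything reaches the retrieval step; the tags, roles, and corpus are hypothetical:

```python
# A minimal sketch of role- and location-based filtering ahead of retrieval.
from dataclasses import dataclass

@dataclass
class Document:
    title: str
    allowed_roles: set
    region: str

CORPUS = [
    Document("US vacation policy", {"employee", "manager", "hr"}, "US"),
    Document("Salary bands FY24", {"hr"}, "US"),
    Document("Canada vacation policy", {"employee", "manager", "hr"}, "CA"),
]

def visible_documents(role: str, region: str) -> list:
    """Only documents the caller may see ever reach the search index."""
    return [d for d in CORPUS if role in d.allowed_roles and d.region == region]

# A US employee never sees salary data or Canadian policies.
print([d.title for d in visible_documents("employee", "US")])
```

Because the filtering happens before retrieval, restricted or irrelevant content never even competes for the answer.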

Transform Your Business with AI-Driven Document Understanding

AI isn’t just a tool—it’s the key to transforming how your organization manages documents and handles data. By automating document understanding, you eliminate inefficiencies, reduce manual workloads, and empower your team to deliver faster, better results. From invoice processing to automating complex workflows, AI helps your business scale, stay competitive, and work smarter. Invest in the right AI platform today, and unlock the full potential of your organization’s data.

In our next episode, we’ll explore how AI can answer unknown questions from your documents while addressing security and privacy challenges. Stay tuned!

Speakers

Scott King

Chief Marketer @ Krista

John Michelsen

Chief Geek @ Krista

Chris Kraus

VP Product @ Krista

Transcript

Scott King
Hey everyone, I’m Scott King. Thanks for joining us for this episode of the Union Podcast, where we talk about AI, technology, people, systems, and how to get more done with technology. Today, I’m joined by the usual suspects. Chris Kraus, our product manager. Hi Chris. And we’re privileged to have John Michelsen, our Chief Product Officer, or Chief Geek, joining us. Hey John.

Chris Kraus
How’s it going, Scott?

Scott King
Today we want to discuss common AI use cases with documents. How do I get AI to help disseminate knowledge and answer questions based on my content? John, generative AI has gained traction, and companies are buried under mountains of documents and data. The data resides in various places, but give us an idea of these three different use cases we’ll talk about, and why this is a big deal in helping people find intelligent answers.

John Michelsen
Big question, Scott, but the simple answer is we haven’t done very well with this historically. There are many knowledge management systems, and they’re often silos. Enterprise data is stored in systems of record, and that needs to be part of the solution. These flat files get stuck in folders, creating a big mess. The good news is we can use a combination of techniques and technologies, including generative AI, to create an elegant solution and raise the bar in people’s ability to gather knowledge or get answers quickly.

Scott King
What’s the real challenge? We have all these documents, but finding them is one issue. Even searching our own OneDrive was a nightmare. Is the challenge about search, or is it about asking the right question? Is it a prompt engineering issue? What’s the main challenge when the information exists, but I can’t get an answer?

John Michelsen
There’s a lot to unpack. You can think of this as finding one document at a time or searching for the right document. But that probably won’t give you the best answer to the general problem. Before you know what document you’re looking for, you have to ask a good question, and asking good questions is sometimes more challenging than we think. People imply context easily. For example, if I say “recently,” everyone might have a different span of time in mind. I might mean the last few days, while someone studying evolutionary geography might consider “recent” as eons ago.

It’s tough to ask good questions, but we need to be good at that. Technology can help by making vast amounts of information available for us to ask questions. We also have to solve the garbage-in, garbage-out problem because no one has perfectly clean data. Everyone has holes in their data. Fortunately, we have techniques to handle that. We can curate content, ask good questions, and get valuable answers. This is one of the three styles we’re talking about — making content ready to provide answers even when we don’t know the questions yet.

Another style is when we already know the questions we need to ask, like with contracts or invoices. In this case, AI’s role is less about curating content and more about ensuring we get correct answers, despite the unstructured nature of the documents. Even if they are structured, they come in various forms. So, there are different challenges to solve, particularly on the document understanding side versus the Q&A side for the first use case. And to complete the picture, there’s a third style related to natural language capabilities as we build out a full orchestration. When we need to make rich use of language in an automated scenario, we rely on several natural language processing capabilities like entity extraction and parsing of information. These three are the different types of tasks we’re talking about.

Chris Kraus
Yeah, John, if we think about it, these are things we’re seeing with customers, very reusable patterns. This isn’t something we dreamed up in a silo. We have multiple customers with these three types of problems. For instance, they know what answers they need and what questions will be asked. Chatbots have done a good job with that. In the past, people tried solving this with chatbots following traditional logic, like the call trees we learned about in the ’90s. If you answer this, change that, and so on. But call trees only work for very specific tasks, like checking the status of something. When you try applying that on a website, it gets frustrating. It offers three options, but none are what I need.

John Michelsen
You’re right. It’s a really interesting topic. In almost every customer scenario we’ve seen, whether it’s customer service, internal support, or others, there are a set number of tasks representing about 80% of the work. You can pre-build your system to handle those tasks, like call trees tried to do. But then there’s the long tail—an infinitely long 20% of edge cases.

Chris Kraus
Yes, that’s the challenge that exists.

John Michelsen
And how often do we find ourselves in that 20%, right? But there’s an opportunity to provide enough content to cover more than 80% of those answers, and then orchestrate real outcomes from them. You can leave humans to handle the final 20%, and continue to train and retrain AI to remain effective at covering the vast majority of cases.

Scott King
Yeah, the long tail 20% is definitely my experience. I’m always the guy asking the exception, or getting the feedback that no one’s asked that question before. How has no one asked that, right? It’s really frustrating. Another use case you mentioned, outside of chatbots and their problems, is the known questions. These are situations where I know the question but not the answer. Think of it like a form—I have data to collect, but I need to retrieve it. How are AI assistants or what type of AI is helping companies with that? I can think of examples like an RFI as a simple example.

John Michelsen
Yeah. Well, not particularly simple in the case of an RFI. We’ve had AI techniques for this for a long time—OCR, for instance. But surprisingly, OCR never got great. It’s continued to improve, and some models are better than others, but it never reached an exceptional level. Even today, we apply a variety of AI techniques to read unstructured or image data, tables, figures, and even multimedia like audio and video. It takes a combination of AI techniques and specialized applications to use that data effectively. But we can now tackle use cases that weren’t possible even 12 months ago. And by “we,” I mean the industry, not just Krista. We’re doing projects today that I’ve never been able to say yes to before, and that we’ve been asked to do for decades.

Scott King
So you mentioned an example the industry couldn’t do as little as a decade ago but can now. What were you thinking about?

John Michelsen
Here’s an actual case in point. A large healthcare organization in Europe asked us to handle their inbound invoices and accounts payable. What makes that interesting? Well, thousands of invoices. We were given 10,000 invoices and asked to show how effective Krista could be. They were in seven different languages, and VAT/GST rules varied by region and were coded differently. And, interestingly, many vendors didn’t even put their names on the invoices—just a graphic of their logo. Many logos don’t even include the company name. So, could we “magically” read all these invoices, enter the data into the AP system, and follow all the business rules about vendor restrictions, pricing, and so on? They had a lot of business rules, which you’d expect.

They needed us to be effective because this process is expensive and time-consuming, and it’s hard to keep staff willing to do the amount of grunt work required. Leveraging a combination of techniques, not just one, we achieved 85-87% accuracy on the first run. That’s the beginning of the journey, but it’s higher than I would have proposed as a target. We aim for the high 80s or 90s, but to hit that accuracy right from the start was impressive.

We spot-checked against the standard they provided, which allowed us to compare. We knew we hit 87%, but interestingly, when we compared our effectiveness to their staff, the humans were only accurate about 62% of the time. So Krista was more accurate and did the job a thousand times faster, without having to convince people to do tedious work that leads to high turnover—66% per year, in this case. That’s a project I could never have said yes to until the last six months. I’ve been asked to do it countless times over my career, and to do it now is absolutely fantastic.

Scott King
Yeah, that’s interesting. When you mentioned logos, it reminded me of web content where you have to include alt text for images. One, for accessibility, but also so search engines know what the image is. It doesn’t know the image itself, but it can read the text underneath it.

John Michelsen
Hmm, hmm.

Scott King
That’s interesting. So when the other team was 62% accurate with people reviewing the documents, and Krista was over 80%, how were they measuring the accuracy of both? Were they using more invoices and test data, or were they using Krista to measure that? How were they doing it?

John Michelsen
We used the source document. People took the original output from the team that was doing the work and checked it, realizing it was 62% accurate. Then we did the exact same thing with Krista—ran the invoices through Krista’s processing and spot-checked them. We weren’t going through all 10,000 invoices. Once they got those results, they said, “My gosh, let’s go back and do a whole bunch more.” I sent them a link to Statistics for Dummies instead.

Scott King
I’m sure they appreciated that. It’s fascinating how often people don’t believe the data they’re seeing. It happens a lot. Now, for the last use case, we kind of lumped it into natural language processing (NLP). What makes that different?

John Michelsen
Right.

Scott King
I’m interested in your answer because I know you’re going to say it’s just one point in an entire workflow, right? That’s the point, but people don’t always see it that way. They think if they just use a generative AI or some type of LLM to summarize a document, their job is done, but then they end up doing more work. Is that what you’re talking about?

John Michelsen
Right. You’re hitting on something we hear all the time. People say, “If these LLMs can do X, Y, and Z, then I can do this or that with it.” In fact, they’ll say, “I did this, I did that.” I tell them, “You just said ‘I’ five times. Do you realize you’re doing most of the work?” Yes, the model did one specific task, and that’s great. But that’s not an orchestrated outcome running at machine speed, accomplishing everything both you and the model were doing. That’s what your competitor will do to beat you in the market.

You can’t just rely on AI to do one thing a little faster or better—that’s incremental improvement, not transformation. You should aim for full exploitation of the AI’s capabilities. Once that’s done, you can refine things with small improvements. But that shouldn’t be the strategy from the start. There’s a real challenge here.

Now, regarding that third area—NLP—it’s about full orchestration. You might have a random question in a chatbot for use case one, or you’re reading invoices for use case two. For use case three, think of something like inbound emails, where you need to extract a shipping address but not mistake it for a mailing address or an address in the signature. You might be looking for specific information, and there’s a whole area of NLP related to entity extraction, understanding text, and comparing messages to see if they’re similar enough to be treated the same.

For example, if someone asks, “Do you blah, blah, blah?” and you have content that aligns with your policy, you want to match it up. But if it doesn’t, you want to explain the difference. There are many different tools needed to fully automate and orchestrate these processes. Instead of saying, “I did this,” you want AI to handle every part it can, and orchestrating that process should be someone else’s job. That’s the market we’re in, and it’s why we have this perspective.

Chris Kraus
Yep. So John, you’ve described that there are different techniques out there. I think a lot of people think, “I’ll just use a copilot, or my developers will grab some open-source tools, maybe something from Hugging Face, and one tool will solve everything.” But our experience is that there are many ways to search for things—similarity searches, semantic searches, fuzzy logic, and more. People don’t realize that doing this is actually complex and requires multiple approaches. Could you explain why it’s important to use multiple methods for natural language processing?

John Michelsen
Right. AI is software, but it’s a whole category with an enormous variety of types, styles, and approaches. The world is overwhelmed with discussions about generative AI models, and for good reason. We’re taking great advantage of those, and I’m thrilled with the progress. As an old NLP guy, this is like Nirvana for me—finally, we can achieve what we’ve been working toward for so long. But it’s odd how generative AI has overshadowed discussions about predicting dates, numbers, classifying activities, identifying objects, anomaly detection, and countless other valuable business applications.

The focus has shifted too much to generative AI, and while that’s exciting, we’ve got other valuable AI techniques that can help automate outcomes and enable businesses to operate at machine speed. Right now, we’re running around with the generative AI hammer, trying to solve everything with it, but it’s not always the best tool for the job. Even within generative AI, there’s a tendency to overuse semantic search.

You mentioned different search methods, and in content searching, there’s an overemphasis on semantic search. Most semantic searches take a single piece of text and turn it into 1,536 double-precision floating-point numbers.

Chris Kraus
Yeah.

John Michelsen
Then, you have to do a lot of math on that for every token or phrase in your content to make comparisons. No wonder GPU prices have exploded—there’s a lot of pressure on the GPU market. Semantic search is valuable, but you need two or three other approaches to complement it. Let me give you an example: If someone asks a very precise question like, “What is Chris Kraus’s birth date?” you don’t want a semantic analysis of a large body of content, especially if there’s a table with names and a column titled “DOB” or “Birthdate.” You want a lexical search on that precise content.

You need to de-emphasize other semantically relevant data, like parties or holidays, because that’s just noise. You need a different type of search for precise content. And you don’t need to prompt an LLM for that answer. As we’ve said multiple times on this podcast, we use LLMs to deliver answers—we don’t ask them for the answers.

Our approach includes multiple techniques baked in, and our customers benefit from that without needing to become NLP experts themselves. The simple idea of grabbing LangChain, a Python developer, and some React guy sounds easy, but it’s actually like assembling a team of half a dozen people with a bunch of tools and software. Even for a simplistic use case, it becomes laborious, and that approach quickly falls apart when dealing with complex or multimodal documents like images and videos.

Also, not all questions are just surrogates for a Google search. Two-thirds of the time, you need context about the person asking the question, which usually requires access to live systems. And then there’s the final third, where the person isn’t just asking for information—they want something to happen. For example, if I’m asking HR about vacation, I don’t want them to tell me the vacation policy—I’m trying to request time off.

If you’ve built something that only gives content that’s semantically close, you’re not actually answering the question.
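
To make the routing John describes concrete, here is a minimal sketch in which precise, entity-style questions are answered by an exact lookup over structured data, and everything else falls through to semantic search; the table row, pattern, and search stub are illustrative assumptions:

```python
# A minimal sketch of lexical-vs-semantic routing; all data here is hypothetical.
import re

EMPLOYEE_TABLE = {"Chris Kraus": {"birthdate": "1970-01-01"}}  # hypothetical row

def semantic_search(question: str) -> str:
    # Stand-in for an embedding search over curated, access-filtered content.
    return f"(embedding search for: {question!r})"

def answer(question: str) -> str:
    match = re.search(r"what is (.+?)'s birth\s?date", question, re.IGNORECASE)
    if match and match.group(1) in EMPLOYEE_TABLE:
        # Precise question, structured source: no LLM prompt required.
        return EMPLOYEE_TABLE[match.group(1)]["birthdate"]
    return semantic_search(question)

print(answer("What is Chris Kraus's birth date?"))   # exact lexical lookup
print(answer("How do we handle vendor disputes?"))   # semantic fallback
```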

Chris Kraus
Yep. One thing we’ve learned from customers is they want to start small and expand, which is great. But there’s also fear around how security and privacy work. If I’m giving data, it could be something simple, like an HR policy that every employee has access to. But if it’s birth dates, hire dates, salaries, or review dates, that data is very specific—only managers or HR would see it. Can you talk about how we manage that? We don’t want to make it free data for everyone, but we also don’t want to restrict it so much that people can’t access what they need. How do we find that balance?

John Michelsen
It’s tricky because it’s tempting to just dump all your content into an AI tool or pump it into a single session. But context windows aren’t that large, and even at a few thousand tokens, things start to get forgotten. You have to think, “This is enterprise data.” So you need to apply enterprise data thinking. You should tag the content by role or attribute. For example, maybe by location—Canadian employees might have different policies than U.S. employees, and you don’t want U.S. employees seeing the Canadian policies. That could create a nightmare.

Scott King
Yeah, you don’t want U.S. employees reading the Europe vacation schedule.

John Michelsen
Exactly. We’re being cheeky here, but that’s a real issue. It might not be a major enterprise data management problem, but it can definitely confuse an AI model. If you have two documents with different vacation dates for the U.S. and Europe, the model might give you no answer or an over-answered response like, “It depends on where you live.” By the time you’re at paragraph three, you were hoping for a simple yes or no, like, “Do we get Memorial Day off?” But you get three paragraphs of explanation instead.

If you use the right techniques, though, you don’t even consider irrelevant content in the search—documents tagged for Europe shouldn’t be considered if you’re not in Europe. Or if you’re not an executive, you won’t see the salary spreadsheet. By narrowing the search to relevant content, you get a much more consistent response from the AI. When the AI has a good set of consistent content, it delivers a better answer.

But you also need to think about coverage and comprehensiveness. For instance, if you ask, “What is Chris Kraus’s birth date?” and there’s a table with that information, that’s easy—that’s 100% confidence. But if you ask a broad question and only have some related content, you won’t get a high confidence score, and Krista will tell you, “I don’t have enough information to answer this fully.” In that case, the system can say, “I’ll get additional feedback and come back with an answer.”

Every one of our customers has some form of this because no one’s data is perfect. Everyone has to continuously curate their content. Your perspective on topics today is different from 24 months ago, and it will change 24 months from now. So constant curation is required, and that’s something we’ve been doing for a long time—since before 2023, when everyone suddenly became interested in these use cases. We’ve been doing it for years, and it continues to mature.
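
A minimal sketch of that confidence gating, assuming retrieval returns scored passages; the threshold and fallback wording are assumptions:

```python
# A minimal sketch of confidence gating over retrieval results.
def respond(scored_passages, threshold=0.75):
    best_passage, best_score = max(scored_passages, key=lambda pair: pair[1])
    if best_score < threshold:
        # Low coverage: defer instead of guessing.
        return "I don't have enough information to answer this fully."
    return best_passage

print(respond([("Vacation accrues at 1.5 days per month.", 0.91)]))
print(respond([("Holiday schedule draft, region unclear.", 0.42)]))
```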

Scott King
All right, sounds good, John, thanks. I can think of a hundred different examples where people are doing all this extra work, thinking it’s automated, but it’s not. We explored the different ways organizations can ask and answer questions from their data and documents. Clearly, you can’t just put employees in front of a tool like Gemini or ChatGPT and hope it works. There are too many privacy and security concerns.
We’ll go into greater detail in our next episode, where we explore unknown questions. We’ll cover security requirements, privacy concerns, prompt engineering, and everything you need to provide an interface for employees to find real answers from both structured and unstructured data, whether it’s in documents, systems, or wherever. I hope you all join us next time. John, thanks so much for your time. Chris, thank you for co-hosting this episode. Until next time.
