Adopting the right technology is crucial for a business’s success. Today, a key challenge is extracting valuable insights from the massive amounts of information buried in documents like invoices, purchase orders, and sales orders. Many companies still turn to intelligent document processing (IDP) and optical character recognition (OCR) to tackle this task. While these methods served their purpose in the past, they now fall short of meeting modern needs. Advances in AI are now allowing businesses to overcome the limitations of traditional approaches and unlock new growth opportunities.
The Problem with Traditional IDP and OCR
IDP and OCR technologies have existed for decades. OCR emerged in the 1960s-1970s, originally designed to read zip codes on envelopes. IDP later evolved to extract data from standard forms and documents, offering a solution for businesses needing to process predictable, structured data.
IDP performs well on standard documents like tax forms where the structure remains constant. Fixed field positions and consistent layouts allow these systems to locate and extract data with a reasonable degree of accuracy. However, this approach breaks down when businesses encounter variability. Programming IDP to interpret hundreds of different forms requires time-consuming work from skilled developers. The effort scales poorly when layouts become more complex, such as multi-column formats, tables, or mixed text and images.
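To make the template problem concrete, here is a minimal sketch of position-based extraction. The template format, field names, and sample invoices are hypothetical and only illustrate the idea; no real IDP product is modeled here.

```python
# Hypothetical fixed-position template: field -> (line index, start column, end column).
TEMPLATE = {
    "invoice_number": (0, 9, 17),
    "total": (2, 7, 14),
}

def extract_with_template(text, template):
    """Read each field from a fixed position, the way a rigid template would."""
    lines = text.splitlines()
    result = {}
    for field, (line_no, start, end) in template.items():
        # A missing line yields an empty value rather than an error.
        value = lines[line_no][start:end].strip() if line_no < len(lines) else ""
        result[field] = value
    return result

# The layout the template was drawn for.
vendor_a = "Invoice: INV-1001\nDate: 2024-01-05\nTotal: 250.00"
# The same information from another vendor, arranged differently.
vendor_b = "ACME Corp\nTotal due 250.00\nRef INV-1001"

print(extract_with_template(vendor_a, TEMPLATE))
print(extract_with_template(vendor_b, TEMPLATE))
```

The first call returns the right invoice number and total. The second returns an empty invoice number and a fragment of the reference as the total, even though both values are present in the document. Each new layout needs its own template.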
Even with predictable layouts, IDP and OCR struggle to understand context. They often fail to differentiate between similar characters—like “I” and “1”—or recognize the roles of different text sections, such as headers versus body content. This lack of deeper understanding limits their usefulness in real-world scenarios.
Why Traditional Approaches Fail in Real Business Scenarios
Many businesses face the challenge of processing documents that contain similar information but come in varying formats. Traditional IDP struggles in these situations because it relies on rigid templates that cannot adapt to diverse document layouts. This becomes especially problematic in industries like healthcare, where different insurance providers use varying formats to convey similar information. Companies working with multiple vendors encounter a similar issue, receiving invoices in different structures, each requiring manual review.
This reliance on manual processing creates significant burdens. Employees spend time cross-referencing information across documents, which leads to fatigue and declining accuracy over time. As motivation drops, error rates increase, causing frustration and inefficiency. In contrast, AI systems do not tire, and their models can improve each time a person corrects a mistake. To stay competitive, businesses need a solution that surpasses the limitations of traditional IDP and OCR.
A European healthcare organization illustrates this challenge. Its staff manually processed about 10,000 invoices across seven languages and achieved only 65% accuracy. High employee turnover and declining performance compounded the problem. After a few days of analysis, an AI-driven solution reached 82.5% initial accuracy on the same work. This example highlights AI’s potential to transform how companies handle unstructured content, making processes more efficient and less reliant on manual intervention.
AI-Powered Document Understanding: A New Approach
AI, combined with real-time data, offers a superior solution for document processing. Businesses no longer need to categorize each document manually or build complex templates. By treating all documents as unstructured data, AI can interpret content flexibly, adapting to different formats without the need for rigid mapping.
This versatility allows AI to analyze a wide range of content, from structured forms to free-form text. For example, AI can extract information from complex Excel spreadsheets with merged cells or identify key details within Word forms containing checkboxes. It reads beyond the surface, understanding the relationships between data points and recognizing context within the documents.
This adaptive approach is especially valuable when processing documents with changing details, like sales orders. While the core questions—product ID, quantity, shipping address—remain consistent, the answers vary with each order. AI adapts to these variations, accessing real-time data to ensure both accuracy and speed. It understands what businesses need to extract from each document, even as formats evolve, making it an ideal solution for modern document processing challenges.
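One way to picture "known questions, unknown answers" is as a fixed record that every sales order must fill, whatever format it arrives in. The sketch below is an assumption for illustration: the class name, field names, and sample values are hypothetical, and the extraction step itself (the AI part) is left out.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class SalesOrderAnswers:
    """The known questions. Each field starts unanswered (None)."""
    product_id: Optional[str] = None
    quantity: Optional[int] = None
    shipping_address: Optional[str] = None

def missing_answers(answers: SalesOrderAnswers) -> List[str]:
    """List the questions that extraction has not answered yet."""
    return [f.name for f in fields(answers) if getattr(answers, f.name) is None]

# Two orders that arrived in different formats end up in the same shape.
from_email = SalesOrderAnswers(product_id="SKU-42", quantity=3, shipping_address="12 Main St")
from_spreadsheet = SalesOrderAnswers(product_id="SKU-42", quantity=3)

print(missing_answers(from_email))        # []
print(missing_answers(from_spreadsheet))  # ['shipping_address']
```

Keeping the questions in one place means a new document format changes only how answers are found, not what downstream systems receive.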
Steps to Implement AI-Driven Solutions
Implementing AI for document processing is an ideal starting point to accelerate workflows and demonstrate the value of AI within your organization. It’s easy to get started, and it will build trust in AI’s capabilities. Here are the key steps to follow:
- Identify Bottlenecks: Assess areas where manual data entry or document review slows your operations. Look at processes like accounts payable, where employees handle various invoices, each with a different layout.
- Focus on Quick Wins: Prioritize tasks where AI can make an immediate impact. Automating invoice processing can yield quick gains and build momentum for broader adoption.
- Automate Repetitive Tasks: Use AI to extract standard information—like purchase order numbers, costs, or shipping details—across different documents. Free your employees to focus on higher-value tasks, like managing vendor relationships or resolving exceptions.
- Adopt a Learning Mindset: AI implementation is not a one-time fix. It’s an evolving process where the system learns and improves over time. Businesses must adapt, refine, and continuously enhance their approach to stay ahead.
- Prepare for Long-term Gains: By starting now, you position yourself to scale more efficiently and build a significant lead over competitors. AI transforms your ability to handle more documents without adding headcount, driving growth without sacrificing quality.
Seize the Opportunity with AI
Document understanding has come a long way. Five years ago, businesses couldn’t solve these problems with the technology available. But today, AI-powered solutions offer a clear path forward. Faster processors, improved AI models, and real-time data capabilities make it possible to handle unstructured documents efficiently.
Krista enables businesses to automate tasks that traditional IDP couldn’t manage, allowing companies to do more with the same resources. This isn’t just about cost savings; it’s about using your talent where it matters most and staying competitive in an evolving market. Take a hard look at your document processing workflows, embrace AI, and unlock a new level of efficiency and insight. Now is the time to transform how you do business—before your competitors catch up.
Transcription
Scott King
Hey everyone, thanks for joining this episode of the Union Podcast. I’m Scott King, joined by my usual co-host, Chris Kraus. Hi Chris, how are you?
Chris Kraus
Hey Scott.
Scott King
And we have our usual special guest, Chief Geek at Krista Software, John Michelsen. Hi, John.
John Michelsen
Hi, title well deserved.
Scott King
Yeah, well deserved. Chief Geek—you might be the only one. Anyway, today we’re continuing our document understanding series. We provided an overview, and we talked about unknown questions—questions you think a chatbot or some type of AI virtual assistant would answer.
Today we’re talking about known questions but unknown answers. These are questions you routinely ask your vendors or customers, and they submit documents with the answers in different formats. It’s not a standard format. We have some examples to discuss. Think about checks, tax forms, things like that. Chris, explain the OCR technology, how it started, how it evolved, where IDP is today, and why it works. Some of the advancements are very recent, and people may not be aware they exist.
Chris Kraus
Yeah, Scott, you’re right. Many people don’t realize this problem has a new solution. Traditionally, OCR came out years ago in the sixties and seventies. It started with reading zip codes when sending mail. It was a short number, handwritten, and didn’t require much processing, so even the mainframes of the time could handle it.
People have implemented IDP for years, handling things like tax forms. Millions of people fill out the same form, and systems use geographic location and bounding boxes to process them. The challenge comes with handling lots of different types of forms. Organizations often automate the top five forms they see but not all 100. People think that’s how the problem is solved—part of it is manual, while the rest is handled by OCR and IDP when the forms are well-structured.
The issue arises with complex forms—tables within tables, merged headers, and such. OCR and IDP couldn’t handle everything, so people saw it as useful for some tasks but not others. We needed new approaches. OCR doesn’t struggle with differentiating between ones, Ls, and Is anymore because we can use spell checks to resolve those issues. Newer technologies have emerged that address these different problems.
For example, businesses receive many invoices with the same information in different formats. That’s where IDP struggled; it excelled with standardized documents but faced challenges with dynamic ones. You’re still asking the same questions—like name, social security number, address—but the document structures vary. John, when we talk to customers, we hear from HR professionals who say, “We know that Aetna, Blue Cross Blue Shield, and all these insurance plans have the same information on the page, but they’re structured differently.”
They might all have details like a deductible for a single person or a family, but they look different. Traditional IDP shied away from those variations. Even though they’re adding AI capabilities now, it’s a big shift in tech for them. What are the kinds of problems customers want to solve where we need a different approach?
John Michelsen
That’s right. Yeah, Chris, you laid it out well. IDP is intelligent document processing, but let’s just say it was semi-intelligent. The intelligence mostly came from a person looking at a scanned document, drawing boxes, and identifying elements like the account number or date. More often than customers wanted, those people had to correct the systems. We’ve heard it’s common for these systems to process documents straight through about 30% to 50% of the time, with a high watermark around 60%. And that’s with stable document content, where you can predict the structure is consistent.
So, how do you improve that percentage? Obviously, being in the 40%, 50%, or 60% range isn’t ideal. It might be a starting point, but it’s not the goal. The other question is, what do we do with the types of documents you just described, Chris? Document understanding, as Scott outlined, is when you know the questions you want to ask, but you don’t know where in the document or which document holds the answer, or even the shape or type of document. You assume the answer is in there, but the challenge is extracting it from less structured content.
The techniques we can apply today, using multiple styles of AI combined, produce phenomenal results. I’ve been an NLP enthusiast for a long time, and I’ve never seen the kind of projects we’ve delivered in the last few weeks. It’s thrilling because it’s gone from questions like, “Can you handle my inbound order management problem?” or “Can you manage my inbound invoicing?” It’s usually about inbound issues because you typically don’t send out unstructured content. You might establish a few formats and stick to those.
But when you’re dealing with external parties sending content to you, you can’t always expect them to structure it the way you want. That’s the other side of the challenge. In most departments, you might need a few people to handle accounts receivable outbound, but you’ll need many more to manage accounts payable inbound. IDP never really tackled those kinds of projects effectively, but today we can.
Scott King
So John, you mentioned multiple styles of AI earlier. Can you explain which styles you’re referring to?
John Michelsen
We’re in a time when many people think of AI as generative AI. Generative AI is a game changer and does incredible things, but I often remind people that there are many styles of AI. I’m using “styles” to avoid getting too technical, but we’re talking about classifying activities, classifying objects, predicting numbers and dates, identifying anomalies, and dozens of other AI capabilities that aren’t about generating prose. Some natural language capabilities might be better suited outside of large language models due to performance costs or accuracy. When you combine these abilities, you get more than the sum of the parts.
A lot of people focus on using large language models for everything, but that’s not the right approach. There’s a great toolbox of AI capabilities, and while LLMs are impressive, they don’t replace everything that came before them or the advancements happening now in other areas.
Scott King
I always find it interesting when people say LLMs aren’t good at math. It’s a different kind of AI. You talked about the inbound process—receiving invoices, extracting information, and putting it into other systems. You mentioned 60% accuracy at best. That’s good, but there’s room for improvement. What’s the business impact of this? If companies have always handled it as a cost, what changes with improvement? Do you have an example of a project that makes it more concrete?
John Michelsen
Sure. One of my favorites was a recent proof of concept. Our customer wasn’t ready to cut over completely and have the 22 people working on this process stop immediately. This organization in Europe handles thousands of inbound invoices monthly, and the challenge is managing seven different languages and varying GST requirements. During our discovery, we found that some vendors include a logo on invoices without printing their company name, using the logo as identification. It makes sense for a person, but if the goal was to make the document machine-readable, they wouldn’t have done that.
The project was simple. We ran it for a month alongside the existing process to see if Krista’s AI techniques could match what the current team was doing. Even at 40% accuracy, there’s significant ROI because if the machine handles 40% of the work, that’s already 40% in savings. And it’s more valuable than just the labor cost savings. The project showed a theme I’ve emphasized many times.
Our initial accuracy was 82.5% on extracting answers from about 10,000 invoices, where we knew the questions. The people doing the same activity were 65% accurate. That’s concerning if you’re one of the vendors because they’re only 65% accurate at getting information into the payable system. And it’s more complicated than just identifying a number and entering it. It involves business rules for each vendor and their variations. For example, I might buy certain products from a company but not others, or I have a contractual price that they didn’t honor. These details make the business process more complex.
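The vendor-specific business rules mentioned above (approved products, contractual prices) could be checked after extraction along these lines. This is a rough sketch under stated assumptions: the contract table, vendor and product names, and the function are all hypothetical and do not describe how Krista implements such rules.

```python
from decimal import Decimal

# Hypothetical contract data: vendor -> {approved product: agreed unit price}.
CONTRACTS = {
    "acme": {"widget": Decimal("4.50"), "gasket": Decimal("1.20")},
}

def check_invoice_line(vendor, product, unit_price):
    """Return a list of rule violations for one extracted invoice line."""
    agreed = CONTRACTS.get(vendor)
    if agreed is None:
        return ["unknown vendor"]
    issues = []
    if product not in agreed:
        issues.append("product not approved for this vendor")
    elif unit_price != agreed[product]:
        issues.append(
            f"price {unit_price} differs from contract price {agreed[product]}"
        )
    return issues

print(check_invoice_line("acme", "widget", Decimal("4.50")))  # []
print(check_invoice_line("acme", "widget", Decimal("4.75")))
print(check_invoice_line("acme", "bolt", Decimal("0.10")))
```

An empty list means the line can go straight through. Anything else is an exception for a person to resolve.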
The results were phenomenal, especially compared to the manual process. It’s no fault of the workers, but humans struggle with repetitive, tedious tasks. This leads to high turnover—typically 75% to 125%—in these roles.
Look at a call center, where workers constantly deal with the same issues and face repetitive, complicated situations. It’s exhausting. That’s why we want AI to handle jobs people don’t want to do. I don’t see the need for AI to become a singer because people enjoy that creative work. I value artists and musicians for their unique contributions. I want AI to handle tasks like reading through documents, extracting information, and applying complex business rules. If something doesn’t work, the AI can alert me, and I’ll provide guidance. We need AI to focus on practical tasks, not just creative endeavors.
Scott King
Yeah, you want the machine to handle the tedious work. When people worry that AI will take jobs, it’s really about those roles with high turnover, where it’s tough to find people. AI should take those jobs because it’s better at repetitive tasks. With the generative AI phenomenon, people see creative outputs, like art and music, and don’t realize AI is also doing things like fraud detection. Yesterday, I got a fraud alert on my credit card because I left a 100% tip at a restaurant. It flagged that as unusual.
I’m not a cheapskate at all; the 100% tip was easily identifiable for a machine. A person would just accept the tip, but maybe it’s situations like that where people think LLMs are easy to use and improve through prompts, making it easy to see their potential.
So when you talked about the IDP project, what kind of advances are involved? Clearly, this is new. Why does it work now when it didn’t work years ago?
John Michelsen
One factor is GPU speeds. There were models we couldn’t build or maintain without a complex process involving a lot of human labor. Now, we have Krista building those models, and we don’t need to build custom models by hand for each customer. Every customer has a few AI models tailored to their use case, built and maintained automatically. That change in productivity means we can introduce AI where the compute power, labor cost, and time once made it impossible.
For example, with that healthcare organization in Europe, just a few days before we started the project, we were told they wanted to proceed, but they needed a verification step. Normally, we don’t do verification steps, but I wanted this project to see how we could handle it. A few days of analysis were all it took before executing the project.
When you think about those few days, it involved someone who understood various AI styles and how to orchestrate them during document processing. For instance, starting with a classifier to handle multiple document types, then using entity extraction from NLP that doesn’t necessarily rely on LLMs but on other AI techniques. If the system lacks confidence in what it reads, it uses lookups and verification systems.
We maintain an image database of vendors that don’t include their name on invoices, asking a person to identify the vendor once, then using image recognition to classify that vendor automatically. Our team didn’t just apply one solution; they used a range of tools. We approach it step by step: how to start, what the middle looks like, and how to improve accuracy.
Krista also naturally improves over time. When it misclassifies and a person corrects it, the models get better. Half of the solution requires our creativity, and half is self-improving. If you take anything from our podcasts, it’s that we’re trying to weave human capabilities, system capabilities, and AI capabilities together to create outcomes. I could have saved you two minutes by just saying that—that’s really what we do.
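The orchestration described in this answer (trust a confident model result, fall back to a lookup, otherwise ask a person, and remember what the person said) can be sketched as follows. Everything here is an illustrative assumption rather than Krista's implementation: the threshold value, the names, and especially the logo matcher, which uses an exact SHA-256 hash of the image bytes as a simple stand-in for real image recognition.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off, not a figure from the discussion

@dataclass
class FieldResult:
    value: Optional[str]
    confidence: float
    source: str  # "model", "lookup", or "human_review"

def resolve_field(
    extracted: Tuple[Optional[str], float],
    lookup: Callable[[], Optional[str]],
) -> FieldResult:
    """Accept a confident model answer, else try a lookup, else ask a person."""
    value, confidence = extracted
    if value is not None and confidence >= CONFIDENCE_THRESHOLD:
        return FieldResult(value, confidence, "model")
    looked_up = lookup()
    if looked_up is not None:
        # Treating a lookup hit as fully trusted is itself an assumption.
        return FieldResult(looked_up, 1.0, "lookup")
    return FieldResult(value, confidence, "human_review")

# Vendors a person has identified once, keyed by a fingerprint of their logo.
known_logos: Dict[str, str] = {}

def logo_key(image_bytes: bytes) -> str:
    return hashlib.sha256(image_bytes).hexdigest()

def remember_vendor(image_bytes: bytes, vendor: str) -> None:
    known_logos[logo_key(image_bytes)] = vendor

def vendor_from_logo(image_bytes: bytes) -> Optional[str]:
    return known_logos.get(logo_key(image_bytes))

logo = b"placeholder-logo-bytes"
# First invoice: no printed vendor name and an unknown logo, so a person decides.
first = resolve_field((None, 0.0), lambda: vendor_from_logo(logo))
remember_vendor(logo, "Acme GmbH")  # the person's answer is stored once
# Next invoice with the same logo is resolved without a person.
second = resolve_field((None, 0.0), lambda: vendor_from_logo(logo))
print(first.source, second.source, second.value)
```

The point of the design is that the human step shrinks over time: each correction becomes a lookup entry, or in a real system a training example, that handles the next occurrence automatically.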
Scott King
Yeah, that’s really cool. The GPU market gets a lot of attention, with new chip makers all competing with Nvidia. It’ll be interesting to see how it evolves in the coming years. Chris, I’ll hand it back to you. John talked a lot about this project and the tech. I’m curious—how does someone recognize that they have this problem? And what are the action steps to address it? How do I know I have this problem if I’m already dealing with it, with people in accounts payable reading invoices, and I want to automate it with tech? What do I do?
Chris Kraus
First, recognize that there’s a solution now that didn’t exist four years ago. Technology has improved, and computers are faster. If you look at your teams and see that people are reading different documents manually, like when you’re procuring and receiving various invoices, shipping advice, or delivery notices, that’s a sign.
There are often multiple steps in the process where someone has to correlate different documents—physical sheets of paper—and match them to pay an invoice. These tend to be paper-based processes, with people manually sorting through filing cabinets. Yes, you could scan them all and put them into software, but that still means going through images in a file folder and correlating them.
Look for situations where you receive many different formats but seek the same information. For example, you’re looking for the purchase order number, the cost, the shipper, or the date of remittance across documents that are likely very paper-based. Those tasks are ideal for automation—having a computer match that data while still involving people for exceptions, like when you receive a delivery without remittance advice or a partial shipment. Humans are still needed to make decisions in those cases, so automation doesn’t eliminate all roles but streamlines paper-heavy processes.
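As a rough illustration of that matching step, the sketch below groups extracted documents by purchase order number and routes incomplete sets to people. The document types, dictionary keys, and PO numbers are hypothetical, and the sketch assumes extraction has already produced a PO number for every document.

```python
from collections import defaultdict

# Assumed set of documents that must be present before an invoice is paid.
REQUIRED = {"invoice", "delivery_note", "remittance_advice"}

def match_documents(docs):
    """Group documents by PO number and return (complete POs, exceptions)."""
    by_po = defaultdict(set)
    for doc in docs:
        by_po[doc["po_number"]].add(doc["type"])
    complete, exceptions = [], {}
    for po, types in by_po.items():
        missing = REQUIRED - types
        if missing:
            exceptions[po] = sorted(missing)  # left for a person to decide
        else:
            complete.append(po)
    return complete, exceptions

docs = [
    {"po_number": "PO-1", "type": "invoice"},
    {"po_number": "PO-1", "type": "delivery_note"},
    {"po_number": "PO-1", "type": "remittance_advice"},
    {"po_number": "PO-2", "type": "delivery_note"},
]
complete, exceptions = match_documents(docs)
print(complete)    # ['PO-1']
print(exceptions)  # {'PO-2': ['invoice', 'remittance_advice']}
```

Complete sets can be processed automatically, while cases such as a delivery without remittance advice land in the exceptions list for human review.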
Scott King
Yeah, and if you free up people, they can focus on more important tasks, and they can probably handle more work, right? You’re achieving more with the same team. That’s always beneficial. John, anything else from an action-oriented perspective on this type of IDP?
John Michelsen
Yep, so far. You want to become one of the organizations that has a core competency in adopting technology faster than your competitors. The action is identifying where the velocity of your business could accelerate the most, where you can achieve a win quickly to teach the organization the pattern to recognize and apply elsewhere. How quickly can you declare victory to start that journey? The faster you start, the bigger your lead over others, and vice versa. It’s not about labor cost savings.
I dare say I don’t think Krista has cost a single job. Krista has allowed businesses to do more with the same team. It has elevated people from jobs they might have quit to roles they now enjoy. One example is call center employees who have become Krista authors, transitioning to true knowledge management roles or moving into sales. They are experts in the business, and you don’t want to lose that tribal knowledge. Now, they aren’t stuck doing tedious tasks. Every company has opportunities like this. But it’s really about accelerating revenue now.
Scaling at a lower cost is also faster because we’re all bottlenecked by human capital management—hiring new people and retaining those we have. There’s an old saying that’s stuck with me: the biggest unspecified cost of operating a business is employee churn. It’s baked into many line items in your income statement, but it doesn’t have its own line. If you saw it, it would shock you. It’s one of the top costs of running a business. We need to address this to ensure people stay longer than nine months.
Back to action: take action. This isn’t something where you can wait a couple of years and then buy a ready-made solution. This is a world where your organization needs to learn and transform, and you have to start that process to get better and faster at it.
A great example, even from our own business: AI is now generating code and, for the first time, changing code. CASE tools have created code for decades, and I got burned by them before. Code generators aren’t helpful, but code changers are a game changer. This isn’t going to replace developers; it will make individual developers more valuable and increase the demand for more software. It’s going to accelerate innovation, which is what Krista aims to do, making the world better. Developers won’t lose their jobs; they’ll accomplish more with the same effort and become more valuable. It will also increase the expectations for developers to do even more. We won’t see computer science PhDs out of work.
We’re in the middle of an incredible transformation. You can either be proactive about it or try to catch up after it happens.
Scott King
Yeah, I love it. But there’s always inaction, right? People want to wait and see what happens, but the timeframes are getting shorter and shorter. I appreciate the discussion on IDP—I learned a lot. This is new to me too, so thanks for the examples, and we’ll talk to you next time.