Why Companies Struggle to Capture Value from GenAI
Generative AI is everywhere, from boardrooms to headlines. Yet the deployment reality is sobering. A recent MIT report, The GenAI Divide: State of AI in Business 2025, found that only 5 percent of pilots deliver measurable ROI. The remaining 95 percent fail to scale or make a dent in profit and loss. The report calls this gap the “GenAI Divide,” and it is leaving many leaders wondering how to capture value without falling into the majority that struggle.
For mid-sized companies, the challenge is even more acute. Budgets are tighter, resources are leaner, and executives cannot afford to waste time on experiments that do not translate into business outcomes. The good news is that mid-sized firms also have an advantage: they are nimble, less encumbered by politics, and can move faster than global enterprises. The question is how to use that agility to cross the divide.
The Issues the Report Surfaces
The MIT findings highlight a simple but critical point: generative AI may boost personal productivity, but personal use does not translate into enterprise transformation. Tools like ChatGPT and Copilot feel powerful at an individual level, but they fail to impact complex business workflows. The scope is too narrow, and the tools cannot scale across departments or processes.
The report also notes that pilots often collapse under the weight of enterprise requirements. A chatbot that feels novel in a contained setting breaks down when asked to integrate with enterprise systems, handle sensitive data, or adapt to edge cases. It is no surprise that only 5 percent of pilots succeed.
Another key signal is the danger of building your own AI platform. Internal platforms succeed only 33 percent of the time, while externally sourced solutions succeed twice as often, about 66 percent of the time. The reason is straightforward: vendors serving hundreds of customers can invest more heavily in development, integration, and security than any single company can match.
Why Most AI Efforts Fail
The first and most obvious reason is memory. Generative AI tools are not designed to retain enterprise context. Each interaction starts from scratch, requiring prompts to restate information and wasting valuable time. Without the ability to learn from prior interactions, these tools cannot sustain enterprise-wide workflows.
This limitation becomes clear when looking at the rise of shadow IT. Employees are already using personal accounts on ChatGPT, Claude, and Copilot to make their jobs easier. Many find them helpful for drafting emails, summarizing notes, or generating ideas. But these services quickly reveal their limitations. Because they do not remember past interactions, employees spend time re-entering context with large, repetitive prompts. What feels powerful at a personal level does not scale when applied to enterprise use cases that require continuity and consistency.
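To make the memory gap concrete, here is a minimal sketch of the kind of session memory these tools lack out of the box. It is an illustration under simplifying assumptions, not any vendor’s implementation: the MemoryStore class and its keyword-overlap retrieval stand in for what production systems would do with embeddings and vector search.

```python
# Minimal sketch: retain prior interactions so users don't re-paste context.
# MemoryStore and its keyword-overlap scoring are illustrative, not a real API.

class MemoryStore:
    def __init__(self):
        self.interactions = []  # (question, answer) pairs from past sessions

    def remember(self, question: str, answer: str) -> None:
        self.interactions.append((question, answer))

    def relevant(self, question: str, top_k: int = 3) -> list[str]:
        # Naive relevance: count shared words; real systems use embeddings.
        words = set(question.lower().split())
        scored = sorted(
            self.interactions,
            key=lambda qa: len(words & set(qa[0].lower().split())),
            reverse=True,
        )
        return [f"Q: {q}\nA: {a}" for q, a in scored[:top_k]]


def build_prompt(memory: MemoryStore, question: str) -> str:
    # Prepend relevant history so the model starts with context, not a blank slate.
    history = "\n\n".join(memory.relevant(question))
    return f"Prior context:\n{history}\n\nCurrent question: {question}"
```

Without a layer like this, every prompt carries the full burden of context, which is exactly the repetitive re-entry employees complain about.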
Beyond memory, most AI deployments lack what we call “enterprise understanding.” A useful system must connect across departments, documents, and applications. A sales record in the CRM, a policy in SharePoint, and a customer support transcript in Zendesk are not valuable in isolation. AI must be able to read across those silos and deliver a unified answer in the context of a workflow. Generative AI alone is not enough.
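As a rough sketch of what reading across silos looks like in code, the connector functions below are hypothetical placeholders; a real deployment would call each system’s API and pass the merged context to a model.

```python
# Illustrative sketch of cross-silo retrieval. The fetch_* connectors are
# hypothetical placeholders for real CRM, SharePoint, and Zendesk API calls.

def fetch_crm_record(customer_id: str) -> str:
    return f"CRM: account history for {customer_id}"

def fetch_policy_docs(topic: str) -> str:
    return f"SharePoint: policy excerpts about {topic}"

def fetch_support_transcripts(customer_id: str) -> str:
    return f"Zendesk: recent tickets for {customer_id}"

def unified_context(customer_id: str, topic: str) -> str:
    # Merge evidence from each silo into one block the model can answer from,
    # instead of making the user copy and paste pieces by hand.
    return "\n---\n".join([
        fetch_crm_record(customer_id),
        fetch_policy_docs(topic),
        fetch_support_transcripts(customer_id),
    ])

# Usage with invented identifiers:
prompt = f"Context:\n{unified_context('ACME-042', 'refunds')}\n\nQuestion: ..."
```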
Complexity compounds the problem. Technology is rarely the limiting factor. The true barriers are integration, governance, and adoption. Executives are not failing because their model has too few tokens. They fail because the system cannot connect into the workflows where employees already work, because security officers will not sign off, or because staff abandon tools that require unnatural behavior.
Finally, many companies fail because they start in the wrong place. Too often pilots are “science projects.” They are interesting but disconnected from real business priorities. When a pilot is not tied to revenue, cost savings, or compliance, it will be the first to die when budgets tighten.
What It Takes to Succeed with AI
Crossing the GenAI Divide requires reframing how companies think about AI. Generative AI may get the headlines, but it is only a small part of the solution. Other approaches, such as classifiers and predictors, are critical for scaling. Classifiers can be retrained, audited, and trusted. They can adapt to new inputs and provide clear indicators of accuracy. When combined with generative models, they deliver reliable outcomes that enterprises can depend on.
Equally important is continuous learning. AI that cannot evolve with use is not an enterprise solution. To be effective, a system must retain knowledge from past interactions, learn from subject matter experts, and improve accuracy over time. Without that capability, memory gaps remain and scaling fails.
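A minimal sketch of this combination, assuming scikit-learn and invented categories, shows how a classifier exposes a confidence score, gates when a generative model is allowed to draft a response, and folds expert corrections back into its training data.

```python
# Sketch of a retrainable classifier with confidence reporting, using
# scikit-learn. The training data and categories are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["invoice overdue", "reset my password", "cancel subscription"]
labels = ["billing", "it_support", "retention"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

def classify(text: str) -> tuple[str, float]:
    # predict_proba gives a confidence the caller can audit or gate on,
    # something a raw generative prompt does not provide.
    probs = model.predict_proba([text])[0]
    best = probs.argmax()
    return model.classes_[best], float(probs[best])

def draft_reply(text: str) -> str:
    label, confidence = classify(text)
    if confidence < 0.7:
        return "route to human"  # gate: don't let a generative model guess
    return f"generate reply from the {label} template"  # LLM call goes here

def retrain_with_correction(text: str, correct_label: str) -> None:
    # Continuous learning: fold an expert's correction back into the
    # training set and refit, so accuracy improves with use.
    texts.append(text)
    labels.append(correct_label)
    model.fit(texts, labels)
```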
Leaders must also focus on outcomes rather than demonstrations. Too many pilots are designed to impress in a demo but never connect to measurable business value. Real progress comes from targeting workflows that affect revenue, cost, or compliance. The right pilot should be a step toward measurable ROI, not a science experiment.
Finally, companies need to avoid projects that are interesting but meaningless. A pilot that does not connect to the organization’s most pressing problems wastes time and resources. Even if it works, it will never scale. Success begins with choosing projects that matter to the business and have a clear path to impact.
Where Companies Find Success
While the MIT report highlights widespread struggles, it also shows where organizations are gaining traction. Mid-market companies are often more successful than global enterprises because they can move faster and avoid political gridlock. They tend to pilot smaller, more operational use cases that are easier to prove and scale.
Back-office automation is one of the most productive areas to start. While many firms direct AI budgets toward marketing or customer-facing projects, the highest returns often come from operational workflows. Compliance, finance, testing, and document processing may not generate headlines, but they generate real savings and measurable impact.
Krista customer Doc Prep 911 illustrates this clearly. The company manages thousands of real estate title commitments and lien documents each month. Manual processing slowed turnaround times and introduced unnecessary risk. By implementing Krista, Doc Prep 911 automated intake from email, document understanding, template generation, and uploads to Zoho and SharePoint. A human-in-the-loop process handled exceptions when confidence was low, ensuring accuracy while scaling efficiency.
The results were transformative. Doc Prep 911 now operates with up to 95% less labor for key processes, achieves an 80% reduction in overall costs, and processes work 80% faster. These gains enabled the company to cut prices by 50%, creating a significant competitive advantage. More importantly, staff were redeployed from manual tasks to higher-value activities such as increasing sales, capturing market share, and developing new services. The outcome is more net income at a lower customer price, fueling a cycle of sustainable growth.
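The human-in-the-loop pattern at the heart of this deployment can be sketched as a simple confidence gate. The 0.85 threshold and the helper functions below are illustrative assumptions, not Krista’s actual implementation.

```python
# Illustrative confidence gate; threshold and helpers are assumptions.
CONFIDENCE_THRESHOLD = 0.85

def extract_fields(doc: dict) -> dict:
    # Stand-in for document understanding: returns fields plus a confidence.
    return {"fields": doc.get("raw", ""), "confidence": doc.get("score", 0.0)}

def ask_expert(doc: dict, extraction: dict) -> dict:
    # Stand-in for escalating to a person with full context attached.
    extraction["verified_by_human"] = True
    return extraction

def process_document(doc: dict) -> str:
    extraction = extract_fields(doc)
    if extraction["confidence"] >= CONFIDENCE_THRESHOLD:
        return "automated"  # straight-through: template filled, files uploaded
    corrected = ask_expert(doc, extraction)  # low confidence: a human decides
    return "escalated" if corrected["verified_by_human"] else "failed"

# Usage: a clean document flows straight through; a messy one is escalated.
print(process_document({"raw": "title commitment", "score": 0.93}))   # automated
print(process_document({"raw": "smudged lien scan", "score": 0.41}))  # escalated
```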
This operational focus demonstrates how mid-market firms can use AI to deliver value quickly while building momentum for broader transformation.
The Role of Agentic Platforms
The MIT report points toward the future of AI as an “agentic web,” where systems work together intelligently to deliver outcomes across workflows. Traditional generative AI tools cannot meet this requirement on their own. They lack integration, memory, and governance. What organizations need is an agentic platform that combines generative AI with other forms of intelligence and orchestrates them across the enterprise.
An agentic platform is different from a chatbot or point solution. It manages workflows end to end, coordinating people, systems, and AI to achieve outcomes. It connects securely to enterprise systems, enforces governance and compliance, and scales across functions. It supports human-in-the-loop interaction so that low-confidence cases escalate to the right person with context intact. Most importantly, it improves with use, learning continuously from enterprise knowledge and human feedback.
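In outline, such orchestration can be sketched as a workflow of steps, each handled by a system, a model, or a person. This is a simplified illustration, not a description of any product’s internals; all names are invented.

```python
# Simplified sketch of agentic orchestration: a workflow as ordered steps,
# each run by a system, a model, or a person. All names are illustrative.
from dataclasses import dataclass
from typing import Callable

def route_to_person(step_name: str, state: dict) -> dict:
    # Placeholder for a real task queue: escalate with context intact.
    state[f"{step_name}_reviewed_by_human"] = True
    return state

def never_escalate(state: dict) -> bool:
    return False

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    needs_human: Callable[[dict], bool] = never_escalate

def orchestrate(steps: list[Step], state: dict) -> dict:
    # Coordinate people, systems, and AI toward one outcome, step by step.
    for step in steps:
        state = step.run(state)
        if step.needs_human(state):
            state = route_to_person(step.name, state)
    return state

# Usage: intake -> extraction -> upload, escalating when confidence is low.
workflow = [
    Step("intake", lambda s: {**s, "doc": "email attachment"}),
    Step("extract", lambda s: {**s, "confidence": 0.6},
         needs_human=lambda s: s["confidence"] < 0.85),
    Step("upload", lambda s: {**s, "uploaded": True}),
]
print(orchestrate(workflow, {}))
```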
Generative AI is powerful, but it represents only about 5 percent of the total solution. The other 95 percent comes from integration, orchestration, and continuous learning. Agentic platforms close that gap, delivering not just conversations but outcomes, the direction MIT’s report points toward.
Actions Leaders Should Take
Executives cannot afford to wait on AI strategy. The winners will be those who act decisively now. That begins with choosing the right projects. Start with workflows that clearly connect to ROI, such as compliance reporting, document preparation, or back-office operations.
Avoid the temptation to build platforms internally. Externally sourced solutions succeed twice as often because they are battle-tested across hundreds of customers. Focus your resources on the workflows that differentiate your business, not on reinventing infrastructure.
Empower business leaders to own adoption. The people closest to the work know where inefficiencies exist and what improvements matter. Give them the tools and support to lead pilots that prove measurable value.
Keep pilots tightly scoped, with outcomes that can be measured within a quarter. Quick wins build momentum and help executives secure buy-in for broader deployment. At the same time, demand that your vendor’s platform supports continuous learning and enterprise integration so you can build on each success rather than starting over.
Finally, acknowledge that employees are already using AI tools on their own. Rather than fighting shadow IT, provide governed solutions that channel that energy into secure, enterprise-ready workflows.
How You Can Lead with AI
The GenAI Divide is real. Ninety-five percent of pilots fail, but the five percent that succeed show a clear path forward. Mid-market leaders have the agility to move faster than global enterprises, but success depends on choosing the right strategy. Generative AI on its own will not deliver transformation. Companies must adopt platforms that integrate across systems, retain memory, and learn continuously.
The organizations that focus on ROI-driven use cases, adopt agentic platforms, and align AI projects to business priorities will lead in the years ahead. The rest will be left experimenting while competitors pull away. For mid-market executives, the time to act is now.
Links and Resources
- The GenAI Divide: State of AI in Business 2025, MIT Project NANDA
- Generative AI is Only 5% of the Solution, Krista Software
- What is an Agentic Platform and its Essential Capabilities, Krista Software
- Doc Prep 911 Automates Document Preparation with Krista, Krista Software
Speakers
Scott King, Chris Kraus, and John Michelsen
Transcription
Scott King
Hey guys, thanks for joining this episode of the Union Podcast. I’ve got my usual cast of characters, John and Chris. How are you guys?
Chris Kraus
How’s it going, Scott?
Scott King
It’s going well. I’m excited to talk about the Gen-AI Divide. We found this MIT report recently—if you’ve seen it online, give it a read. We wanted to provide some feedback because it validates a lot of what we’ve been saying. I’ll put a link in the post so you can grab the report.
The report is about the state of AI in business in 2025. MIT surveyed 350 enterprise executives from across industries and asked a standard set of questions, which are in the back of the report. We wanted to share our view on how people are trying to use AI, the problems they’re having, where the ROI is, and where the opportunities exist.
One of the big themes I noticed was around memory. In the report, they interviewed people using public Gen-AI services like Gemini, ChatGPT, and Copilot. People are having success with them, but they pointed out a major limitation—these tools don’t remember what happened yesterday. There’s a huge opportunity if services could retain and build on past interactions. Did you get the same takeaway, Chris?
Chris Kraus
Yes. I thought it was interesting because they first talked about memory—not enough memory. Usually, articles like this are strictly technical. People get into details about token limits and buffer sizes, comparing Gemini, Claude, and OpenAI. But that misses the point of what businesses actually need.
This report took a different angle—what problems is the business trying to solve? They don’t care about tokens and buffers. For example, if I’m using Copilot or Claude directly, I’m giving them information and then asking a question about it. What businesses want is enterprise understanding—what we call document understanding.
But what’s actually happening is people take one row of data from their ERP—like information about a customer or opportunity—and then prompt the tool: “Write me an email about this” or “Answer this question.” They’re forcing themselves to find the right information and feed it into the buffer.
Of course, there’s also the fear: is the LLM training on my data? Is it learning company secrets? That’s where shadow IT concerns come in. But the positive is they’re starting to align with what we’d call document understanding.
You don’t want memory to just be about technical limits. You want it to cover all the knowledge—meeting notes, sales call summaries, pricing sheets, feature documents. When you ask a question, the responsibility of the software—the layer wrapping the LLM—is to find the answers across multiple sources and return them.
We’ve seen this work well for things like HR documents, because they don’t change often. But at the enterprise level, you’ve always got another sales meeting and constantly evolving strategies.
You need to continuously provide more information so memory includes what happened today, yesterday, and the last three weeks across the enterprise. John?
John Michelsen
You’re laying out an interesting framework. Personal use of a Gen-AI tool on a website is small in scope and tends to be successful. But at the enterprise level, it requires document understanding and what we call Krista Enterprise Understanding. That’s what it takes to make AI memory work at scale.
At a personal level, it often feels successful. At an enterprise level, it usually fails because a single prompt can’t accomplish the same thing across such a broad scope. That may explain the disconnect: why people are nearly 100% satisfied using ChatGPT personally, yet only 5% of enterprise projects succeed.
The gap comes from scope and scale. The technology, as currently implemented, isn’t sufficient. Enterprises are left with wrappers and experiments, which the report references heavily. That contrast—why it feels so good personally but fails organizationally—is one of the most important debates right now.
Chris Kraus
Exactly. Through our experience with customers, we’ve found they don’t have all the answers or all the documents. That’s why we deploy a human-in-the-loop. If a question comes back with low confidence, it’s sent to a subject matter expert. The expert can decide: yes, this belongs in enterprise knowledge, or no, this is too specific, like a sensitive HR issue.
Enterprise memory grows not only from data stored in systems like SharePoint, but also from experts contributing directly. I liked that this report took a business view: what does memory mean for business? What does it mean to get an answer? Not just the technical debates about tokens and buffer sizes.
Scott King
Personal use also varies. Some people are better at prompting or using tools than others, but that doesn’t translate across a workflow. Individual success doesn’t scale consistently.
The report highlighted this. People said, “This is great, but everything else is hard.” Documents, integrations—that’s where the real difficulty lies, exactly as you mentioned, Chris.
Chris Kraus
For sure.
John Michelsen
There’s a lot to it. Many internally built platforms are just a collection of tools wrapped with a Gen-AI layer. The report even calls this out. If you don’t elevate your level of workflow integration and scale in meaningful ways, you won’t improve your pace. Adding AI to a process that’s already broken or slow doesn’t make it faster.
The report gave solid advice: you must think differently. These comments came from executives deploying AI inside their organizations—not vendors. Most of the feedback challenges vendors, which is encouraging for us. But many vendors would try to dodge those points.
One of the strongest findings was that internally built platforms fail twice as often as those sourced externally. If you think you can build your own AI platform and then deliver projects on top of it, your success rate is cut in half. That makes sense. Without the scale of hundreds or thousands of customers sharing the investment and work, you’re dividing the cost by one—just your company. You can never invest at the same level as a third party.
This applies across technology. You don’t build your own database—you use Mongo, Oracle, or Postgres. You don’t build middleware—you use RESTful services. Don’t build your own AI platform.
Leverage one, use one, and make it aware of your business knowledge. That becomes an accelerator. The one-off approach—every system with its own tech stack—only makes things harder. A platform for AI is valuable. Trying to cobble one together from internal and external parts is not. We’ve seen that approach fail for decades. This is another version of the same mistake.
Chris Kraus
I also liked their section on shutting down shadow IT. That’s the wrong approach. It happens because companies put the responsibility on technical teams who don’t understand the business. The better path is lowering the technical bar and giving business users a platform. Then, IT can connect to email, SharePoint, and other systems easily.
By Gartner’s definition, shadow IT is the business users themselves. They know what they need. You should enable and empower them, not shut them down.
John Michelsen
Exactly.
Scott King
Ninety percent of the people in the report said they use Gen-AI services. I guarantee IT didn’t provision those.
John Michelsen
There’s an interesting dichotomy with shadow IT. On one hand, we’re opposed to the idea of non-IT people directly connecting to enterprise systems. They may not understand InfoSec requirements or enterprise data access principles, and suddenly they’re about to connect to the CRM and make changes. That’s a bad idea.
But shadow IT as self-service capability provided by IT—that’s powerful. That’s really what Chris was describing. Giving business users safe, sanctioned access is a fantastic model.
We hope everyone reads the report because it highlights clear themes we’ve discussed often. It’s not just validation for Krista and our approach. It’s an early indication of what it takes to succeed with AI.
One of the most important themes is that AI doesn’t get smarter on its own. It doesn’t learn simply by being used. I don’t want to criticize other vendors, but most wrapped Gen-AI platforms don’t have an intrinsic learning capability. They rely on crude mechanisms that won’t scale. At a small level, they may appear to work. But as soon as you try to scale, they fail.
We have to remember AI wasn’t invented in 2023. There isn’t just one style of AI. For example, you can prompt a Gen-AI model to classify emails into categories, but it won’t be very accurate and it won’t improve with corrections. Once you try to classify hundreds of categories or thousands of emails per day, it becomes expensive, slow, and unreliable.
Classifier technology, on the other hand, has been around for decades and has advanced alongside Gen-AI with GPU improvements. Classifiers can retrain, repair their models, and even report where they are likely to be less accurate. Gen-AI models can’t do that.
Nearly every use case where we see real ROI involves classifier technology. It can be as simple as yes/no or multi-category classification—like the email example I gave. Yet many organizations approach AI with only one hammer: Gen-AI. They apply it to everything they can’t handle with standard code, and it doesn’t work.
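A minimal sketch of the per-category reporting John describes, assuming scikit-learn, with invented email categories and training data:

```python
# Sketch of how a classifier can report where it is weak; the email
# categories and training data are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

emails = ["refund please", "server is down", "update my address",
          "charge was wrong", "cannot log in", "new shipping address"]
labels = ["billing", "outage", "account",
          "billing", "outage", "account"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())

# Cross-validated predictions expose per-category precision and recall,
# so you know which categories need more training data before trusting them.
predicted = cross_val_predict(model, emails, labels, cv=2)
print(classification_report(labels, predicted))
```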
When these projects are demoed—whether by vendors or internal IT groups—they often show a small proof of concept that works sometimes, then claim it will “learn.” But with Gen-AI alone, it won’t.
The next generation of models—GPT-6 or others—won’t suddenly become smarter about you. They may excel at exams, but they won’t understand your business jargon, policies, or customer sensitivities. That knowledge has to be learned. That’s why, as Chris said earlier, AI memory must be an enterprise-wide understanding of your organization.
A siloed strategy—giving one tool one set of data, another tool a different set, and relying on 16 different vendors—doesn’t expand AI memory. It fragments it, which is a major challenge. Memory, workflow integration, and the partnership between IT and business teams all have to come together.
When you embrace all of AI’s capabilities, you see why we’re so excited about the report. It highlights clear symptoms of going the wrong way—and by contrast, the opposite becomes the right path forward.
Scott King
In the second half of the report, where they discuss workflow integration barriers, it directly calls out what you’re describing: it breaks in edge cases, it can’t be customized to workflows, and it requires too much manual context. You have to craft a big prompt with all the data before you even start, and then it doesn’t learn from the interaction.
So the reaction often becomes, “I’ll just give this to a person to do.” That’s completely opposite of the intent. They’re using the hammer for the wrong problems.
I laugh when I see people online saying, “I tried this math problem in ChatGPT and it got it wrong.” Of course it did—it’s not designed for that.
I also thought the survey results were interesting. Enterprise pilot-to-production success looked very different from mid-market businesses. The smaller, more nimble companies were twice as successful. Probably because they tackled practical, operational use cases rather than sprawling enterprise projects that take forever.
Shorter pilot windows and the ability to try many things led to more success. And mid-market companies aren’t trying to build their own platforms, which helps too. Those two factors together may explain the difference.
I’d love to know exactly who they surveyed. A sales-minded CEO may focus on top-line ROI, while an operationally minded CEO may look at bottom-line impact. That perspective makes a big difference.
John Michelsen
Exactly. The point is you don’t want AI for AI’s sake—you want it for ROI. If a pilot doesn’t move you toward proving ROI, then it’s just a science project. And those are hard to fund beyond a small group, sometimes just part-time staff. Even if the project works, what next? How much utilization do you get? How do you justify maintaining it?
If that’s your proof point for AI in your business, what kind of proof is it? At best, it shows a distraction. You need to be on a clear path where, six months in—even if the timeline slipped a bit—you’re still tracking toward measurable value.
Chris, you’re not supposed to literally laugh at that—but it is laughable. How many projects have you seen that were actually ahead of schedule? The point is, when it’s six months in and things are tough, you need to remember: we’re transforming the business with this. If we can achieve these milestones, we’ll reach real outcomes.
But if after six months all you have is a slightly funnier chatbot for your website, what does that accomplish? That’s why the report showed only about 5% of pilots succeed and make it to production. No surprise.
Some of those projects were tied to business value but ran into other issues. But I’d bet many weren’t tied to value at all. We see this even with Krista customers. They’re excited and say, “This is amazing—let’s do something random with it.” And we have to ask: isn’t your business about this? Wouldn’t a significant revenue increase happen if you focused here? Where are your biggest costs? Where’s the bottleneck in your business velocity?
We’ve learned we can’t be prescriptive—we don’t know a customer’s business, and Krista has to learn it. But we do need customers who start with something meaningful. If it works, it should deliver real business outcomes. Otherwise, even if the project “went well,” it wouldn’t move them forward.
That’s also in the MIT paper. One of the myths they call out is this idea that organizations are using Gen-AI to transform. In reality, they’re running science projects—random experiments that, even if successful, don’t transform the business.
The bigger problem isn’t just the money spent—it’s the time wasted. While you’re distracted with random projects, competitors are being more thoughtful and pulling ahead.
That’s what would make me nervous. We’re a software company, and AI is already disrupting how software gets built. Today, we have no manual testers running tests by hand. Our testers have AI execute the entire manual test plan across many screens. By the time they show up in the morning, all the tests are already run—they just review the results. On top of that, we still run fully automated headless tests.
That’s a completely different capability than we had even four months ago.
Chris Kraus
Yeah, right.
The article calls out the value of back office versus website and marketing use cases. That’s us—we focus on the back office. Where are we spending time? Where are our resources? Testing and QA of software is as back office as it gets.
John Michelsen
Totally. And yet it’s the most fundamental part of a software company’s velocity: how fast can you test? Of course, we’re putting AI assistants in our IDEs, making sure developers understand how to use them, and improving quality. That will drive a bump in productivity. But the real constraint is testing speed.
Our DevOps team has great tools, but the real breakthrough comes from saying: “Do this ten times faster than you can today.” That’s how we’ll deliver more software, faster and at higher quality, by fully embracing AI. If we don’t, we’ll be slower than competitors. And in two or three years, if we’re systemically slower, we can’t possibly win.
It’s happening to us too. I reminded one of our senior leaders of the adage: AI isn’t going to take your job—someone using AI will. That’s the reality.
Some functions may need fewer people over time. That’s fine—the world always creates new roles faster than it eliminates old ones. The job handbook has never gotten smaller. So the real question isn’t about lost jobs—it’s whether we’re ready for the new jobs that are about to appear.
As leaders, we must disrupt how we work ourselves. We have to adopt new ways of operating or risk becoming examples of the wrong way. That’s not where anyone wants to be.
Of course, I’m speaking personally about how we operate at Krista. But the point is, we apply internally the same lessons we share externally. And we’re already seeing incredible results.
Scott King
Perfect. I’ll include a link to the report. Thanks, guys, for the review.
I’ll also link to case studies showing real ROI. They’re focused on operational workflows and address everything we discussed today—memory, context, integration, and human-in-the-loop steps. They’re great stories. Appreciate you both joining, and until next time.