GenAI: The Illusion of Simplicity

April 17, 2024

How Hard Can It Be?

Generative AI (GenAI) and Large Language Models (LLMs) are transforming how every business operates. While building custom solutions might seem appealing at first, there’s a hidden reality behind the allure of “free” and “easy” AI tools. These seemingly simple solutions often lead to unforeseen complexities and ongoing costs. I talked with Chris about the hidden pitfalls of assembling or building your own GenAI platform and how it requires a different approach and skills than your previous software development projects. Our conversation summarizes our recent paper, “The Illusion of Simplicity in Building Your Own Generative AI Systems.”

The Opportunity Costs of Assembling Your Own AI Platform

Building infrastructure and integrating GenAI with your systems initially seems cost-effective but hides a web of escalating expenses and long-term limitations. Chris and I break down the key areas where costs spiral out of control:

  • The Token Cost Fallacy: Public AI tools use a “per-token” pricing model, promising affordability. However, as your usage scales up (and it will if your AI solution is successful), those small token costs quickly become unmanageable and unpredictable.
  • Massive Hardware Investments: If you choose to escape spiraling usage fees by running your own models, be prepared for massive upfront costs. The specialized servers necessary to power LLMs can easily reach hundreds of thousands of dollars.
  • Expensive and Scarce AI Talent: Building and maintaining your in-house GenAI solution demands a rare breed of engineers who understand LLMs, prompting, data management, and more. These highly skilled individuals are expensive, difficult to find, and easily lured away by competitors.
  • Innovation Lock-In: Tying yourself to a single LLM is a dangerous gamble. What happens if the model you’ve built everything around suddenly declines in quality or becomes obsolete? You’ll be left scrambling to adapt your entire system, wasting precious time and resources.
  • Overlooked Complexities:
      • Finding (and Keeping) AI Talent: The specialized resources you need are in short supply and high demand, making hiring and retention a major challenge.
      • Content Ingestion: Feeding the right data to an LLM involves complex handling of different file types, structures, and the nuance of your company’s information.
      • Data Privacy and Security: Implementing strict role-based access and securing sensitive data is essential and adds yet another layer of complexity.
      • AI Must Do Something: Providing mere answers isn’t enough. To gain value, your GenAI system needs automation capabilities to drive actions and outcomes based on the user’s intent.
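To make the token cost fallacy concrete, here is a back-of-envelope projection of how a monthly API bill scales with usage. The per-token price and per-request token counts below are assumptions for illustration, not any vendor’s actual rates:

```python
# Illustrative only: the per-token price and usage figures below are
# assumptions, not quotes from any vendor's price list.
PRICE_PER_1K_TOKENS = 0.03   # assumed blended input/output price, USD
TOKENS_PER_REQUEST = 3_000   # prompt + retrieved context + response

def monthly_cost(requests_per_day: int) -> float:
    """Project a monthly API bill from daily request volume."""
    tokens = requests_per_day * 30 * TOKENS_PER_REQUEST
    return tokens / 1_000 * PRICE_PER_1K_TOKENS

# A pilot looks cheap; a successful rollout does not.
for daily in (100, 5_000, 50_000):
    print(f"{daily:>6} requests/day -> ${monthly_cost(daily):>12,.2f}/month")
# -> roughly $270, $13,500, and $135,000 per month respectively
```

At pilot volume the bill looks trivial; at production volume the same arithmetic produces a six-figure monthly line item, which is exactly the unpredictability described above.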

Constant Chaos: The Price of In-House Maintenance

Think back to the years spent upgrading web frameworks or migrating databases – now imagine that cycle compressed into weeks, sometimes days. This is the reality of in-house GenAI. LLMs evolve at a breakneck pace, and each update has the potential to break your entire AI-powered solution. Instead of driving innovation, your IT team will be trapped in an endless cycle of updates and debugging, leaving no time for strategic improvements. Alongside the technical chaos, security becomes a nightmare. As new vulnerabilities emerge in these rapidly evolving tools, your team must constantly patch and monitor to protect sensitive data. Every buggy AI output or unexpected downtime erodes user confidence. When your solution is unreliable, its potential will never be realized, leaving your transformation initiative stalled with a frustrated user base.

Real-World Cautionary Tales

The theoretical risks of building your GenAI solution become alarmingly real in practice. One company, excited to unlock its internal knowledge, built its own GPT on top of company policies. Thinking this was cutting-edge, they asked a policy question of the custom GPT. Instead of a clear answer, the AI bombarded them with clarifying questions to understand roles, security, intent, and context. It turns out powerful AI doesn’t automatically equal useful AI.

There’s a Better Way: The Power of an AI Platform

If you thought building your own GenAI platform would give you control and save you money… you’re not alone. But the truth is, it often leads to exploding costs, innovation bottlenecks, and frustrated users. There’s a smarter way. A platform like Krista lets you focus on your AI strategy instead of maintaining the infrastructure to deliver it. Krista enables you to:

  • Focus on Results: Krista helps you define measurable outcomes for your AI initiative. Whether it’s speeding up customer service, streamlining a complex process, or automating rote tasks, we provide the tools to achieve quantifiable improvement.
  • Speed to Value: Krista handles the infrastructure to deliver AI more quickly. The platform’s pre-built integrations, security infrastructure, and user-friendly interfaces enable you to deploy AI solutions in record time.
  • Adaptability: Krista lets you stay ahead of the curve. You can easily swap LLMs, manage model updates, and ingest data from various sources, including images and videos, as the technology matures.
  • Empowers Teams: Krista puts the power in the hands of business users. She enables subject matter experts to shape the AI-powered transformation without needing to learn complex coding or prompting techniques.

Let Krista handle the unpredictable technical complexities of AI so you can focus on the transformative possibilities.

Speakers

Scott King

Chief Marketer @ Krista

Chris Kraus

VP Product @ Krista

Transcription

Scott King: Well, hey, everyone, this is Scott, and that’s Chris. Hi Chris, welcome to this episode of the Union Podcast. We’re going to talk about building your own generative AI system, your own generative AI app, and why that may or may not be the greatest idea for your resources. So, Chris, I want to start off quickly. How is this different? Let’s level-set everyone that already builds apps, right? I have an IT team. I’ve got resources that build and maintain my custom applications or modify my packaged applications. How is this different?

Chris Kraus: Okay, so you’re actually adding a lot of complexity because this is something you’re starting over with. And I think it’s important to think back to when you started in IT. Years ago, we had client-server apps and terminal apps, and then we started saying, “Okay, we’re going to go to web apps and mobile.” So we actually had to learn a whole new set of technologies to implement that.

The cool thing was that IT loves that because we get to learn a lot of new things, and they emerge over time. So we started out writing HTML in the CGI bin. And we realized, “Okay, how do we get scalable? How do we get secure?” We changed the way we wrote. We don’t write HTML anymore. We use frameworks like React, Angular, and JavaScript to render the HTML.

Well, realize those changes happened over years. You probably switched between one of those frameworks because there were new advantages. Like, it would render on mobile or wouldn’t render on mobile, things like that. It was responsive web. You know, you might have started with Xamarin or Ionic and swapped to the other one because you said, “I want this to actually power my mobile app.” No one ever thinks about, “Oh, we did Spring and Struts and all this nonsense to actually persist the data.” But realize the way we build an app today is because it’s years of experience of what worked and what didn’t work. And we found the frameworks that work for us as a company.

Those technologies changed over years. New frameworks, new upgrades, we swapped. And the point is that was done over years, right? I mean, literally 10 or 15 years. But generative AI and LLMs, those things are changing rapidly. I mean, those changes are on steroids, if you will. I mean, we heard about the initial… like there was OpenAI’s GPT-3, 3.5, 4… you know, there’s Gemini, which just came out, and it’s got advantages in buffer sizes and how it handles things. This technology is evolving very, very rapidly. So it’s not like every three years we’re going to get a monumental change in generative AI. It’s a nascent market. It’s a new market. We have never seen shifts in technology and advancements like this… we’ve talked about that fight of who’s better this month versus next month.

Scott King: Yeah, I mean, it will most likely change inside your project, right? As you mentioned, people will judge the efficacy of the different models, and they leapfrog each other. And you can’t wait around to pick the best one because the best one today is not going to be the best one tomorrow, right?

Chris Kraus: Yes.

Scott King: As I was putting the paper together that we’re reviewing, I counted the major updates for OpenAI’s ChatGPT-4. And it was like 29 updates in 14 months. I mean, imagine the regression for that type of thing. If you did already have a solution that was just answering questions, the answers are going to differ based on those updates, right? The summarizations shrink, the context windows expand, all that type of thing.

But that’s the real use case that people talk to us about is “I just want to ask it questions. I need to serve my employees or customers better with better information.” But that’s just one step. So, if I’m going to use AI to do this, what is the difference between really… just Q&A, and what does it mean if I want to improve a process? What comes out at the end? Because if I ask a question, I’m in the middle of doing something else. That’s not the end result that I’m looking for.

Chris Kraus: Yeah, so what you’re looking at is there’s a big difference in, “I’m just putting a chat UI in front of the LLM.” And that’s what ChatGPT did in reality. It said, “I’m just going to create a new interface to ask it questions.” But you have to know how to prompt; you have to know how to stage your data, and you have to know how to manage the context window itself.

And in reality, what we want is the ability for a user to actually get answers as part of an automated multi-step process. You want to use it as part of an automation to solve a problem. So that means all those things like, who are you? What’s the context of the data? Am I getting the right data? Am I getting hallucinations? All those things actually need to be inside the AI module, not exposed to the user itself.

Scott King: Yeah, imagine trying to teach a user all the prompt engineering techniques. You would never get ahead of that, right?

Chris Kraus: No, they change so often. We were looking at something the other day, and the way you could ask questions and get answers back actually changed from the way it worked the week before because of all those updates you mentioned happening in the background. So we will have to be much more adaptive in how we interact with that technology, get answers, and help scope it.

Scott King: Yeah. So what is… obviously we have a platform to do this… what’s the advantage of using a platform versus assembling all this together, right? Because it looks easy. I go to a web interface; I ask a question, and I get an answer, right? And even if I use an open-source model, right, which is free, what’s the advantage of using a platform?

Chris Kraus: Yeah, so if we look at this, there’s a lot of tooling around helping you provision the virtual machines, set up a model, make that open-source model available – that just gives you APIs. Databricks has really operationalized this – how do you get the APIs available? And there are things in Amazon and Azure to help you spin them up. But at that point, you just have an API. It’s not going to be part of a business process. It’s not solving the problem.

And if it’s generative AI, you’ve got questions back and forth. Maybe if it’s categorizations or predictors, you’ve got learning, structured learning, things like that, and training of the models—all those things have to happen. So you realize a lot of what we figured out through, say, dealing with LLMs is that we actually want retrieval-augmented generation to get the right data and then use that.

So the LLM becomes an editor and says, “Given these facts, give me an answer.” Because a lot of times, the LLMs are going from billions to trillions of parameters. They know more things but may not know about your process. And so that whole concept of, “Okay, just because I have a model, I spun it up, I can now use a tool to help me do some prompts…” That’s a long way from actually integrating it into a business process that has the context of your specific data, solves your problem, and comes to an outcome. Like, did I get an answer and update it back in the system? Based on my answer, did I create a six-step workflow to get this done? So, there’s a big difference between raw code and actually the solution.
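The retrieve-then-generate pattern Chris describes can be sketched as follows. Everything here is a hypothetical illustration: `search_index` and `llm_complete` stand in for a vector store and a model API, and are not Krista’s internals or any real library:

```python
# A minimal retrieval-augmented generation (RAG) loop. All names here are
# hypothetical placeholders, not real library calls.

def search_index(question: str, k: int = 3) -> list[str]:
    # Placeholder retriever: a real system would query a vector store
    # over your company's documents and return the top-k passages.
    corpus = {
        "pto": "Employees accrue 1.5 days of PTO per month.",
        "vpn": "VPN access requires manager approval and MFA enrollment.",
    }
    return [text for key, text in corpus.items() if key in question.lower()][:k]

def llm_complete(prompt: str) -> str:
    # Placeholder for the model call; a real system would hit an LLM API here.
    return f"[model answer grounded in {prompt.count('FACT:')} fact(s)]"

def answer(question: str) -> str:
    """Retrieve facts first, then ask the LLM to act as an 'editor' over them."""
    facts = search_index(question)
    context = "\n".join(f"FACT: {f}" for f in facts)
    prompt = (
        "Answer using ONLY the facts below. If they are insufficient, say so.\n"
        f"{context}\nQUESTION: {question}"
    )
    return llm_complete(prompt)

print(answer("How much PTO do I get?"))
```

The point of the pattern is that freshness comes from retrieval, not retraining: when a policy document changes, the next query picks up the new text with no model update.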

Scott King: Yeah, I think that is the biggest leap. For the people I talk with, the biggest leap is “I have to train a model on my data.” Versus what you’re explaining: no, just retrieve the data, feed that to the model, have the model generate or summarize the data for you. Because you can’t teach it, right? For one, your data is moving every second. And if you’re training it, which you probably don’t have the resources to do anyway (and that is deceptively expensive, right?), and you’re bringing this in-house, your AWS or your data center or your self-hosting costs are going to be incredible. Because they look cheap, right? Like, what is a token? Tens of… tiny fractions of a cent. But then it’s like, “Okay, if I want to do this, how many tokens is that?” Well, it varies every day, right?

Chris Kraus: Yeah, point-zero-zero-zero-three cents or eight cents.

Scott King: Yeah, yeah, you’re just trying to save money. So we had, well, John had, what was it? A LLaMA model on a server in his closet, right? And he just had the smallest server that would basically serve his testing needs, and it was $70,000, right? I’ve read online, and I’ll link to the article, where somebody did some AWS cost estimates just for the initial setup; it was like $5,000 a month to host a “free” LLM in AWS, right? Just all the compute costs. And we took it a step further. We got a quote from a hardware vendor. I said, okay, if I wanted to buy a server and put this into production, what kind of server would I need? And John got a quote, I think it was like $209,000 for one, right?

So you’d need another for redundancy and then you’d need another for pre-prod work, for development and integration. I mean, that doesn’t sound free, right? That sounds pretty expensive, and companies probably didn’t budget that much this year for that type of thing.

Chris Kraus: Well, and realize you need several. You need two in production for backup. Then you need one in development and probably one in test. So, you know, when you’re talking about hundreds of thousands of dollars for a server and you need three or four of them, that’s a big cost, right?

Scott King: Yeah, that’s a million bucks of hardware just right there, just for four of those servers.

Chris Kraus: Yeah. Yeah. And then trust me, it has four power supplies and a million fans.
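The arithmetic behind this exchange is worth spelling out. The sketch below uses the server figures quoted in the conversation plus an assumed monthly API bill for comparison; it ignores power, cooling, and staffing, which only make self-hosting look worse:

```python
# Back-of-envelope math using the hardware figures quoted above.
# The monthly API bill is an assumption for illustration only.
SERVER_PRICE = 209_000        # quoted price for one production-grade server
SERVERS_NEEDED = 4            # 2 prod (redundancy) + dev + test
HARDWARE_TOTAL = SERVER_PRICE * SERVERS_NEEDED

API_COST_PER_MONTH = 13_500   # assumed API bill at moderate usage

months_to_break_even = HARDWARE_TOTAL / API_COST_PER_MONTH
print(f"Hardware outlay: ${HARDWARE_TOTAL:,}")              # $836,000
print(f"Break-even vs. API: {months_to_break_even:.0f} months")  # ~62 months
```

Five-plus years to break even, in a market where the models themselves turn over in months, is the core of the “free LLM” illusion.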

Scott King: Yeah, yeah. But, you know, even doing that… and we talked about how the generative AI services and the LLMs are leapfrogging each other… you really are at risk of basically locking yourself into a position that is going to be very difficult to get out of, right? Like, what if you bet on one LLM and it just tanked? What would you do?

Chris Kraus: Oh yeah, like we did testing of LLaMA versus Mistral, and we’ve done testing of Mistral versus Google Flan-T5 – different open-source ones – and then the actual commercial ones. And those, you know, as they say, your mileage will vary. But in reality, in your projects, you may have a standard set of questions you need to answer or types of things you need to retrieve. It’s table data, entity data, doing your retrieval-augmented generation… that retrieval and generation stays the same.
But when you handshake with the actual LLM – I mean, obviously, we know that OpenAI’s 3.5 is way faster, while 4.0 gives us a better answer, but it takes us like 10 seconds, right? And we can get timeouts because of load. So you’re not going to be able to know which one you’re going to use until maybe three weeks before you go live, because you may need to do testing and change. And then, three weeks after you go live, you’re going to reevaluate because we know that Gemini has come out. Maybe that’s the one you want to go to, or there will be another version of OpenAI. There will be more versions of Gemini, right?

So we know that, like, during the development lifecycle, it could change. So you need to make sure that you use them, but you’re not tightly coupled where you can’t swap them. But realize that churn isn’t going to stop once you go live… because after you go live, you see how the actual users are asking questions… their questions may not be as well-directed as ours. I think about a question when I ask it because I know what the data looks like. The users are going to be all over the place. Sometimes, you’re not going to know until you’re in production that you need a different model to help with that.

Scott King: Yeah, that’s a good point, right, that we really didn’t prepare for today… the way the user will ask a question. Because you’re right, if you know what the API provides, and you know what the data looks like, like a developer, a product manager, or some business analyst… they’re going to ask a very different question than some other knowledge worker, right? Well, I’m kind of a tweener, right?

Chris Kraus: Yeah, well, like when you take the stuff I write, you always ask the questions differently because you don’t know the data model. But you’re in the middle because, like you said, not only is your hair tall, you’re a tweener: you know how to prompt… you’ve become an expert in LLM prompting, but you may not know the data underneath it. So I’d be like, is that an employee or is that a team member?

Scott King: Yeah, imagine when all the models can read images better. I mean, no one’s pasting pictures into it right now, but when they do, imagine what’s going to happen. That’s going to be crazy.

Chris Kraus: Oh, that’s multimodal. It’s changing everything. Yeah. Well, there’s an API change. How are you giving the data? Right now, we’re giving text, so there’s an API change in how we give it an image. What type of image? And then when you get a response back, are you getting text, or are you getting text with a picture to describe it, right? So those are all plumbing changes, if you will, because your APIs are changing because we’re dealing with new data types. It’s going to change for pictures and stills, then it changes for videos, then it changes for audio snippets. It’s cool, but realize those things need a protective logical layer. We have to decouple our application from those changes very gracefully.

Scott King: So Chris, obviously, people are going to have to do this themselves, right? They have their own knowledge of their own proprietary processes. That’s their advantage or their competitive IP advantage. But what they don’t need to be doing is building the infrastructure for this type of generative AI, like an interactive solution or app or automation. Walk us through some of the things that they may not be considering. We talked about skilled resources, the prompt engineering, the hardware costs, and the token costs… like all of those they’re probably thinking about. But explain why you should build this with a platform like Krista versus assembling this on your own.

Chris Kraus: We now use J2EE app servers; we use C# .NET to get that accelerator because those developers have solved a lot of the problems like multi-threading, database access, authentication, state management, things like that. Using Krista as a platform is going to do the same thing. I have a hundred developers working on the platform. Some are focused on user interface and conversational design. Some work on connecting and adapting to systems and having automated workflows. And some are focused on how we deal with connecting to multiple LLMs and seamlessly changing them.

So what happens is you have APIs – right now they’re text in, text out. We know how to change the way we prompt to get to that data. But in reality, we’re about to see a big change with multimodal. So people are going to say, “Here are pictures, and I want images to come back. Here’s an image, and I want to understand the data on that image itself.” So those are all API changes.

Instead of you having a team of developers, one that never goes away, worrying about those things, that’s my team’s problem. We give you the platform that we’re constantly upgrading. So, if you want to swap from OpenAI’s 3.5 or 4.0 to Gemini, you just switch. We already know how to connect to them, and we know how to modify the prompts and the responses that come back, because the JSON is always different, right? There’s no standard for that.

And then when those LLMs are ready to accept, say, images, where they can read the words on a picture and help derive something, those capabilities will be there. Our developers are focused on that part so you can focus on the business app: what am I trying to accomplish? That way, you can swap and always be up-to-date. It’s not like, “Oh, I need six months to refactor everything” – which in developer land, refactor means, “I’m going to rewrite it all,” let’s face it – because I need to change to handle images, right? Or movies, things like that.

Krista handles that stuff the same way you would expect an app platform to handle database threading and session management. It gives you the ability to swap the type of LLM in the background, the same way we can swap the database behind a J2EE app server today. We probably wouldn’t know the difference, right? Because the frameworks handle that. So that’s where we want to be. It’s not that you’re not going to build the apps. You’ll focus on solving the business problem and interacting with the users, not all the math of connecting those things.
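The decoupling Chris describes resembles a thin adapter layer: business logic targets one interface, and each vendor’s request/response quirks live behind an adapter. The class and method names below are illustrative, not Krista’s actual architecture:

```python
# A sketch of the decoupling idea: business logic talks to one interface,
# and each provider hides its own request/response shape behind an adapter.
# All names are illustrative, not Krista's internals or a real vendor SDK.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class OpenAIAdapter(LLMProvider):
    def complete(self, prompt: str) -> str:
        # Real code would call the OpenAI API and unpack its JSON here.
        return f"openai:{prompt}"

class GeminiAdapter(LLMProvider):
    def complete(self, prompt: str) -> str:
        # Gemini's payloads differ; only this adapter needs to know how.
        return f"gemini:{prompt}"

def run_workflow(llm: LLMProvider, question: str) -> str:
    """Business logic never names a vendor, so swapping is a one-line change."""
    return llm.complete(f"Answer concisely: {question}")

# Swapping providers is a constructor change, not a rewrite:
print(run_workflow(OpenAIAdapter(), "What is our PTO policy?"))
print(run_workflow(GeminiAdapter(), "What is our PTO policy?"))
```

The same idea extends to multimodal inputs later: a new payload shape lands in one adapter, and the workflows above it stay untouched.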

Scott King: Yeah, and you want to be able to do that quickly, right? Because, like you were saying, there are 100 developers working on Krista. Having those AI-skilled resources is a supply and demand problem. There’s just not enough supply for the demand, and it’s going to get more and more expensive. And those developers may come and go, right? As they get better offers, just like the LLMs update. I think you’re just putting yourself in a bad spot, right? You should be working on the strategy instead.

We do explain all of this in the new paper that Chris and I have worked on, “The Illusion of Simplicity.” So please give that a read and ask us questions about how you would automate something or apply AI to it. There are a lot of cool examples in the paper. Chris, anything that you want to close with?

Chris Kraus: I’m excited! If you want to comment about this or reach out, I’m happy to explain it to you and show you the advantage you’ll get in building your app using Krista to handle those parts versus yourself.

Scott King: Perfect, perfect. Thanks, Chris, and thanks, everyone. Until next time.
