Rise of the Prompt Engineer

September 20, 2023

The Prompt Engineer

Do you need a prompt engineer? Can you train your people to generate prompts? Can this be automated? Well, it depends.

LLMs, like Bard, OpenAI’s GPT-3.5 and GPT-4, Llama, or FLAN, require a diligent and persistent approach to testing and refining prompts. Their variable performance across use cases reveals the critical need for consistent testing and evaluation to achieve accurate results. Prompt engineering, therefore, isn’t a static task, and it isn’t an exact engineering discipline. It’s part science and part art, and it requires constant attention and refinement. Whether you’re a seasoned professional in the field or just starting out, this article and corresponding podcast offer examples and valuable insights into the challenges and solutions in prompt engineering.

What is a prompt?

A prompt, in the context of large language models (LLMs) like ChatGPT from OpenAI, Bard, and others, refers to the set of instructions and queries provided to the model to elicit a specific response. Essentially, it’s the way users communicate their needs and expectations to the model. The goal is to get information or answers based on the information fed into the model.

Rather than using standard computer-based query methods, like SQL (Structured Query Language) or XML (Extensible Markup Language, a standard for exchanging data between systems), prompting is rooted in natural language. This means users describe to the model, using everyday language, what they want. For example, they might provide context by specifying the right words, phrases, formatting, and configuration, or even use images to guide the model’s response.
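
To make the contrast concrete, here is a minimal sketch in Python. The table, columns, and prompt wording are hypothetical, chosen only to show the two styles side by side:

```python
# A deterministic SQL query: the same input always returns the same rows.
# (Hypothetical table and columns, for illustration only.)
sql = """
SELECT account_number, billing_month
FROM invoices
WHERE customer_id = 42
ORDER BY billing_month DESC;
"""

# The same request expressed as a natural-language prompt to an LLM:
prompt = (
    "List the account number and billing month for customer 42's invoices, "
    "most recent first. Respond as a two-column table with no extra text."
)
```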

Using prompts, one can make the model read an email and craft a personalized response, ask it to answer using only table data without any textual description, or even get it to respond from various perspectives like that of an employee, a customer, or an expert.

However, the use of natural language makes prompt engineering a unique challenge. In the computer world, professionals are accustomed to deterministic systems, where certain inputs guarantee specific outputs. But when using prompts with LLMs, there’s an element of non-determinism. Interpretations can vary widely, and outcomes might not always align perfectly with user intentions. That’s where the art and expertise of prompt engineering play a pivotal role, ensuring that users get the results they’re seeking from these models.
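
One common mitigation is to pin down the model’s sampling behavior. Below is a minimal sketch, assuming the pre-1.0 OpenAI Python client that was current when this article was written; the model name, API key, and prompt are placeholders. Even at temperature zero, the prompt is still interpreted rather than executed, so testing remains essential.

```python
import openai  # pip install openai (pre-1.0 interface assumed here)

openai.api_key = "YOUR_API_KEY"  # placeholder

# Temperature 0 makes the model favor its most likely tokens, reducing
# (but not eliminating) run-to-run variability for the same prompt.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[{"role": "user",
               "content": "Summarize this email in one sentence: <email text>"}],
)
print(response.choices[0].message.content)
```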

What is a Prompt Engineer?

The role of a prompt engineer revolves around providing specific context and constraints to a prompt, ensuring the LLM delivers the desired output. Just like in more traditional fields, specific techniques have been developed for this purpose, such as:

Few-shot prompting: Giving the model several worked examples so it understands the kind of results you expect.

Chain of thought prompting: Directing the model to reason through intermediate steps before delivering its final answer. (Both techniques are sketched below.)
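
As a hedged illustration, here is what each technique can look like in practice; the classification task and wording are invented for this sketch:

```python
# Few-shot prompting: show the model worked examples of the mapping you want.
few_shot_prompt = """Classify the sentiment of each message as Positive or Negative.

Message: "My order arrived early, thank you!" -> Positive
Message: "I've been on hold for an hour." -> Negative
Message: "The invoice total looks wrong." ->"""

# Chain of thought prompting: ask the model to reason through intermediate
# steps before committing to a final answer.
chain_of_thought_prompt = (
    "A customer was billed $120 in June and $90 in July. How much were they "
    "billed in total? Work through the steps first, then give the final "
    "answer on its own line."
)
```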

However, prompt engineering is not a domain of traditional engineering. The truth is, it combines elements of art and science. Given the intricacies of each LLM and its unique response patterns, a deep understanding is often required to get the most out of them, and companies need help doing so. This demand has led to the rise of the prompt engineer. Today there are 64 jobs in the US listed on LinkedIn for the exact phrase “prompt engineer” and 49 on Indeed. But while engineering prompts is crucial, especially to provide specific context for the LLM in use, this task doesn’t necessarily have to be executed by a human.

Since the main goal is to provide the LLM with context and constraints while feeding it any additional facts or relevant information, automation platforms like Krista can integrate with your systems to find that data and set the required context for any LLM you choose. If you aim to integrate an LLM and get substantial value from it, an automation platform can be a more efficient route.

Better Prompts Produce Better Results

Better LLM prompts invariably produce better results. However, generating better prompts is difficult. You either have to hire a team of prompt engineers, train your employees on how to write prompts, or connect your systems with an LLM to feed your data, facts, and context.

Let’s break this down a bit further. The more explicitly and contextually you prompt an LLM, the more precise the output. Chris, Jeff, and I talked about several examples of how we are creating prompts for different purposes.

Customer Service

LLMs serve a pivotal role in redefining customer service interactions. Through natural language processing, they can read and understand the intent and context of customer queries, responding at a speed that far exceeds human capabilities. This was made evident in Jeff’s experiment, where meticulous prompts led to the extraction of precise customer data—such as account numbers and billing months—from service messages. Customer service interactions must use precise data to ensure that the LLM doesn’t “hallucinate” or provide irrelevant information. By setting explicit format expectations and instructing the model not to invent information, we can maintain a high level of accuracy in responses. This attention to detail is vital in maintaining a superior customer experience, as it allows for the delivery of personalized, accurate, and quick customer service.
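
To illustrate the kind of prompt Jeff describes, here is a minimal sketch; the field names, format rules, and exact wording are hypothetical, not the prompt from his experiment:

```python
# A sketch of an extraction prompt with explicit format rules and
# anti-hallucination instructions. Field names and wording are illustrative.
def build_extraction_prompt(customer_message: str, today: str) -> str:
    return f"""Assume today's date is {today}.

Extract these fields from the customer message below:
- account_number: digits only
- billing_month: formatted as YYYY-MM

Rules:
- If a field is not present in the message, return null for that field.
- Do not guess or invent values.
- Respond with only a JSON object containing those two keys, and no other text.

Customer message:
{customer_message}"""

print(build_extraction_prompt(
    "Hi, I have a question about my June bill on account 48213.",
    today="2023-09-20",
))
```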

Marketing Documentation

Many times I will use an LLM to help me clean up transcriptions of The Union podcast. A transcript is machine-generated by our audio system, and then I clean it up using different prompts tailored to the LLM in use. I’ll ask the LLM to remove filler and comfort words and correct grammar (I did not for the transcription below). Often the LLM modifies the transcription so much that it changes the meaning of what we are talking about, and I have to revert to the original or edit it manually. This process emphasizes how dynamic the world of prompting is. The two LLMs I use require different prompts for the same task, necessitating a level of finesse and expertise.
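
For example, a cleanup prompt along these lines works for me; the exact wording below is illustrative, and in practice each LLM needs its own variant:

```python
# A sketch of a transcript-cleanup prompt. Each LLM needs a slightly
# different variant of these instructions.
cleanup_prompt = (
    "The text below is a machine-generated podcast transcript. Correct the "
    "grammar and punctuation and remove filler words such as 'you know', "
    "'like', and 'um'. Do not summarize, do not omit any content, and do "
    "not change the meaning of what was said.\n\n"
    "Transcript:\n{transcript}"
)

# Usage: cleanup_prompt.format(transcript=raw_transcript_text)
```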

Process Automation

LLMs are great for generating content to assist an employee in a single step, but they are even more valuable when incorporated into a business process. When employees are seeking information to do part of their job or to take advantage of a benefit, the LLM is often not the final destination. LLMs don’t inherently know the answers to questions; they generate answers from facts and information in natural language that employees understand. Chris illuminated how automation can enhance the context for a prompt by adding vital information from another system. Doing this by hand, copying customer orders, invoices, or data residing in separate systems into a prompt, is slow and error-prone. Having a machine dynamically add that context is critical as more processes are automated, especially when prompt requests come in large volumes, a scenario where a human simply cannot keep up.
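
A minimal sketch of that idea follows; get_recent_orders and get_open_invoices are hypothetical stand-ins for real CRM and ERP integrations such as Salesforce or SAP:

```python
# A sketch of machine-built prompt context. The helper functions are
# hypothetical stand-ins for real system integrations.
def answer_with_context(question: str, customer_id: str, llm) -> str:
    orders = get_recent_orders(customer_id, limit=10)  # hypothetical helper
    invoices = get_open_invoices(customer_id)          # hypothetical helper

    prompt = (
        "Answer the customer's question using only the facts below. "
        "If the facts do not contain the answer, say you don't know.\n\n"
        f"Recent orders:\n{orders}\n\n"
        f"Open invoices:\n{invoices}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)  # llm is any callable that sends a prompt to a model
```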

Prompt Engineering is Part Art and Part Science

Prompts, in essence, sit at the intersection of art and science. While there’s a structured, scientific approach to crafting them, they also require a touch of artistry to get them just right.

LLMs innovate rapidly, and a prompt that works well today might not have the same efficacy in a few months. It’s a dynamic field, where the act of crafting the perfect prompt is constantly evolving. To combat this, companies will need to implement a form of regression testing, ensuring that responses remain consistent even as the underlying LLM changes. Evidence of this drift includes a Stanford report indicating that some LLMs might actually get “dumber” over time, as well as our own bake-off test among several different LLMs. The key takeaway? Prompt engineering isn’t a set-it-and-forget-it task. Continual monitoring and testing are crucial to ensure desired outcomes. This might sound like a challenge, but it’s also a safeguard. As LLMs evolve, regular checks ensure you’re always getting the most out of your model.
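
A minimal sketch of such a regression suite, with invented test cases, might look like this:

```python
# A sketch of prompt regression testing: run a fixed suite of prompts on a
# schedule and flag drift. The cases and expected answers are invented.
REGRESSION_SUITE = [
    {
        "prompt": "Extract the account number from: 'My account is 48213.' "
                  "Reply with digits only.",
        "expected": "48213",
    },
    {
        "prompt": "Extract the account number from: 'I'd like to subscribe.' "
                  "Reply with the word null if none is present.",
        "expected": "null",
    },
]

def run_regression(llm) -> list:
    """Return a list of failure descriptions; an empty list means no drift."""
    failures = []
    for case in REGRESSION_SUITE:
        answer = llm(case["prompt"]).strip()
        if answer != case["expected"]:
            failures.append(
                f"{case['prompt']!r}: got {answer!r}, expected {case['expected']!r}"
            )
    return failures
```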

Automate Prompts

Creating effective prompts is part art and part science, so there is no single method for successfully deploying and utilizing LLMs. Whether you hire a prompt engineer, train your staff to prompt correctly without leaking your data, or integrate your systems to provide the best context, we understand how critical it is to remain at the forefront. Don’t let the evolving landscape of LLMs leave you behind. At Krista, we’re ready to equip you with the expertise and automation needed to revolutionize your approach to prompt engineering, ensuring your LLM delivers consistent, precise, and relevant outputs. The future of prompt engineering is here today, and we are ready to help you generate better prompts. Prompt a conversation at Krista.ai.

Links and Resources

Speakers

Scott King

Chief Marketer @ Krista

Chris Kraus

VP Product @ Krista

Jeff Bradley

Solution Architect @ Krista

Transcription

Scott King:

Alright, well hi everyone. I am Scott King and thanks for joining us in the latest episode of The Union Podcast. And I’m joined by my usual co-host, Chris Kraus. Chris, why don’t you explain a little bit about who you are?

Chris Kraus:

Hey Scott, as you know, I’m a dog dad. So I have Blue, who’s a black lab, who’s 13 years old, and she’s really sweet, but going deaf. And then I have Max, who’s a pug, who’s nine, but no one knows he’s nine, because he’s so high energy, he’s got the gray hair like me. And obviously, having a black lab and a black pug, that means my hobby is vacuuming dog hair every day around the house.

Scott King:

Okay, that is accurate, but a slight hallucination. So, Jeff, I’m gonna prompt you a little bit differently.

Jeff Bradley:

Alright.

Scott King:

Who are you and can you explain to me about what you do here at Krista and then how you apply that knowledge to provide value back to our customers?

Jeff Bradley:

Okay, that is much more explicit, and so I think I can craft an answer for that. So my name is Jeff Bradley. I am what we call a Senior Solution Architect at Krista, but what that really means is that I work with the product and I work with our customers to help them use Krista to achieve the business value that they’re trying to get out of our product. Where that applies in things like using artificial intelligence, and where now specifically we’re talking about prompts, is I help them, and us internally as well, determine how we can best use AI to get the proper types of results we’re looking for to achieve those business results. So maybe that’s a little high level, but that’s basically what I do, and I suppose we’ll get into some of the more detailed stuff.

Scott King:

I like that you gave a couple of caveats because even when I play with the LLMs, sometimes there’s a caveat. So I’ll prompt it. And obviously, we’re talking about prompting today, but we wanted to provide a couple of examples to begin with. But have you ever seen at the bottom of like ChatGPT that says, hey, you know, to the best of my knowledge, you may want to consult your lawyer or you may want to consult your employer? So I like that you gave some caveats with that. Well, that’s perfect. So today we’re talking about the rise of the prompt engineer because people ask us about this. Like how can I better prompt it? Or how can I train my model? Or how can I fine-tune? Essentially what they want, they want better results. Like what Jeff was explaining. Like how do I get the best result? So why don’t we talk about, we provided the example, let’s define what a prompt is. So Jeff, can you give us a better idea and don’t do your LLM hallucination this time. So Jeff, what is a prompt?

Jeff Bradley:

All right, I’ll do my best, but I will say that all the information that I know of ends in 2021, so for the past two years I have a limited amount of knowledge. But that said, very simply stated, when we’re interacting with a large language model like ChatGPT from OpenAI, Bard, or any of the other ones that are becoming more and more popular, a prompt is, very simply, the instructions that we’re giving to the model and the question or the information that we want the model to return to us. So that’s basically what a prompt is. What do I want from the model? What am I asking it? What do I expect back?

Scott King:

Okay, Chris, you want to add anything to what is a prompt?

Chris Kraus:

Yeah, so it is based on natural language, which makes it interesting because it’s different than things we’re used to in the computer industry. We’re used to things like SQL, Structured Query Language, or the original, SEQUEL, Structured English Query Language. We’re used to learning, here’s how to do something, here’s the rules of how it works, like select, from, table, where, order by. So it’s something I could learn, and I actually learned, in my case, in college, because it was new back then. Now they’re probably learning it in high school. Or things like XML. XML was purpose-built so we have a common way to describe data to pass between different systems and enterprises. So prompt engineering is natural language based. It’s not a standard that’s adopted by multiple people. And so it causes problems because everybody can interpret a sentence differently. Everybody can interpret the intent and things differently. The advantage is it’s natural language based; the problem is you don’t know how the model is interpreting it. So that’s why it’s interesting to us. And do we need a prompt engineer to help do this thing called working with natural language? Because you think everybody speaks. We’ve all got natural language. But as you just proved in the intro, you asked me a question. It was a very accurate answer. It just wasn’t in the context of what you were looking for.

Jeff Bradley:

If I can, I’ll add something really quick about the difference between the things that Chris talked about, SQL and those other things, as in any kind of programming language. When we talk about IT and we talk about automation, we’re talking about deterministic things, right? I want this record from a database table. I want, you know, a particular item where certain constraints apply. When we’re talking to these large language models now, it’s non-deterministic, meaning that it is, just like Chris said, subject to interpretation. So that’s where the trick, I suppose, or the art of prompt engineering comes in. How do we get the types of results that we’re looking for?

Scott King:

So, I mean, you guys were talking about SQL and XML databases. These seem like engineering terms, but then you’re telling me that the prompt is natural language. So why are people asking if they need a prompt engineer if it’s not an actual engineering discipline?

Chris Kraus:

Because you actually have to learn how each engine is gonna respond to a prompt. So what you would prompt, say, FLAN versus Bard or OpenAI may be different, because the way each is gonna interpret your prompt is different. So a human needs to, in their brain, become an expert in one of them, or maybe an expert in two of them, because it is understanding some undocumented, unknown things at this point.

Scott King:

Okay, so do I need a prompt engineer? If I’m in an enterprise, every single one of them is gonna install one of these LLMs, and maybe they’re looking for prompt engineers. I’ve seen job ads, hey, prompt engineer, this salary band, must have these qualifications. You know, must have been working with ChatGPT longer than it has existed, things like that. I mean, do I actually need one of these?

Chris Kraus:

So I would say, do you need a human being as a prompt engineer? Not necessarily. Do you need prompts to be engineered for the LLM you’re working with to describe the context? Yes. But it doesn’t necessarily have to be a person. Actually, software can do that for you. So I think there’s a subtlety. Do you need prompt engineering? Yes. Do you need an army of people to do it manually? My assertion would be no.

Scott King:

Jeff, agree, disagree?

Jeff Bradley:

I agree. Obviously, if you’re going to use an intelligent automation platform like Krista, you can embed the context. Really, what you’re trying to do with a large language model is provide it context. What is the problem? What information do I want, as well as constraints? What are the limitations that I want you, as the model, to come back with, as well as providing any additional training material, other documents, or information? That can all be done through an automation platform just like Krista. Obviously, if you’re just going to use ChatGPT in its web form, and you’re gonna try to get some value out of that or try to implement that, then you might need at least to start with a person. But again, the important thing, if you need people to do it, is the business context that you’re trying to solve. You know, once we understand how those LLMs work and how we get the best results out of them, that can be embedded and brought into Krista. And then Krista can manage, you know, the interactions with those LLMs.

Scott King:

Well, couldn’t I just do a training class on how to prompt these things, right? Do I teach my employees how to prompt correctly? I’ve seen examples online, somebody will say, okay, here’s my prompt for this, or here’s my prompt for that, you know, copy this and modify it. Can’t you just do that?

Jeff Bradley:

Yeah, sure. But it depends on what you’re trying to do and how you’re trying to get the business value. There’s a difference between when you and I are using this at home, or maybe you’re gonna use it to try to generate some marketing documents or something like that. You know, here’s all the information, give me a marketing document, but you’re still gonna take that document and you’re gonna edit it and tweak it and refine it, right? For that kind of thing, you know, I don’t even know about training classes. I would just say go online and try to figure that stuff out. I think when you’re trying to get real business value and trying to incorporate these into the enterprise, it makes sense to use some kind of platform that’s going to manage those for you. I think that’s how you’ll get the faster value, how you’re going to get more bang for your buck, without necessarily having your employees have to learn, you know, watsonx, ChatGPT 3.5 and 4, Llama, FLAN, and all these others.

Chris Kraus:

So Jeff, we were talking before, I think you’ve actually run samples of this, right? Inside the engines to see how with and without prompts things come back differently. So what do those look like?

Jeff Bradley:

Yeah, so, you know, one interesting example that we’ve done is we take customer service input messages, and these can come over a chatbot or they can come through email, and what we do is we take that information and we’re extracting data from it. And when we first just said, okay, we’re gonna use a large language model, OpenAI, let’s just say, okay, please give me the account number and the billing month that this customer is requesting. First of all, that may not even be in there. And one of the things that we found, if somebody just said, hey, I want information about subscribing to your service, and then we pass it in and say, can you find an account number and a billing month? There isn’t one. So first of all, what the model will do is in fact hallucinate, and it will come back with some account number and some billing month, even though one doesn’t exist. So when you just say, give me the account number and billing month, then you have to go and start saying things like, and if there isn’t one, please do not respond with any data, and please only give me the information in its raw form, the account number as a number, the billing month as month and year. In fact, another thing that we had to do, because we’re using OpenAI, we had to say, assume the current year is 2023. And actually, we just grabbed the calendar and pushed the current date in there. Because if someone said, I need some information for my bill in June, it would come back and say, June 2021. So what we had to do was say, okay, let’s give it a lot of information about what we want, very specifically, and how we want that data to be formatted. And if you don’t have the data, please don’t make it up. And so then we had to go through a wide variety of different customer sample data in order to hone in those prompts. But once we did that, we could put that into the Krista tool and now it just runs automatically for that customer. And it’s great.

Chris Kraus:

So it sounds like it’s actually really important. Like better prompts do better things.

Jeff Bradley:

Exactly. You want to be, you want to be explicit.

Chris Kraus:

Scott, I know you actually do this yourself with our podcasts to clean up the language in our podcasts and create summaries, right? So you do a different type of prompting, which is very different from what Jeff was talking about of extracting data. So what do your prompts look like?

Scott King:

To clean up the podcast transcript. So yeah, the transcript is machine-generated. And I have a couple of different prompts because I use two different tools, so that’s kind of interesting that you guys were talking about that. But I’ll tell it, hey, clean up the podcast transcript. First I have to ask it, do you know what a podcast transcription is, right? To give it context. And then it’ll say, yes, yada yada, and I’m like, okay, I want you to clean this up for grammar and punctuation and remove all the comfort words, words like “you know.” And it does a pretty good job. I’ve played with a couple of different ones. I’ve played with the OpenAI ones, and then I’ve paid for one called Jasper. It actually does a little bit better job because it’s more of a writing tool than just an everything tool. So yeah, and then the prompts for each are completely different, and I learned by trial and error, because a lot of times it would summarize too much. So then I had to tell it, hey, don’t omit any content, use all the content, because it would take 300 words and squeeze it into 150. And then it didn’t sound like what we were talking about at all.

Chris Kraus:

Yeah, so my experience has been different than the two of yours. When I do prompting, a lot of times I need the machine or the software to help me do the prompt. So when I’m writing information about an employee, I will use an API call to extract who the employee is, what country they work in, whether they’re a manager, and which subsidiary company they’re in, to get the context of the person. And then when I ask, “What’s the vacation policy?” it’s not a general vacation policy. It’s for a specific person with a role in the organization. So I use prompting a lot of times to add information to things. You guys are talking about removing information; I actually do the opposite. I take systems of record and I’ll say, go get the last 10 orders for a company, go get the last 10 invoices and their statuses for a company, and provide that information so that when it answers a question, it’s in the context of my orders and my statuses, and I’ve dynamically done that. So obviously that’s something the machine has to do. A human can’t go into Salesforce and cut and paste into the prompts and go into SAP and get the invoices; you need a machine to do that. You need prompting to be done by a machine at machine speed. Because if 400 customers ask that, a human can’t do that manually. You need to automate that process so it happens at machine speed. So I think it’s interesting, we’ve talked about different profiles. One is narrowing scope, yours is how do you help it summarize better and get rid of these safety words and such, and the other is how do you actually give it the information so that it gives you a better answer. So it’s kind of interesting, three different things.

Scott King:

Yeah, and both have the same goal, and that was kind of what was in Jeff’s intro. He was talking about business value, right? So a lot of times people ask us about this and they’re thinking one step at a time, and they need to understand that that’s actually a step in a larger process, right? If I wanna ask a question about orders, that means I’m probably after something else. Like, is the average order of this customer above or below the mean, right? Is this a high-value customer, or can this customer wait? But if you have a machine do this, then, you know, obviously that happens at machine speed. We didn’t even talk about mean time to resolution. Like, what could machine speed do to all of these processes? Right. I mean, everything just goes faster and it’s more intelligent. But you have to have a way to get to that speed, right, Chris? So you were talking about integrating with Salesforce, a lot of people ask us about that too. But that’s important, right? Because if you can’t go grab the data quickly, the answer takes just as long.

Scott King:

All right, so we’re talking about the prompt engineer, and we’re actually talking a lot about engineering principles, but from what I understand from the two of you, there’s actually no standard way to do this.

Chris Kraus:

It’s more art than science.

Scott King:

Yeah, it is. So how do we balance the art and the science? The science is easy, right? We can talk about structured languages, but here it’s more art. So how do you guys advise people on doing more of the artful prompts?

Chris Kraus:

So I think there are two things we do. I’ll start out and I’ll pass to Jeff because we’ve both done a lot of this. But the first thing is realizing that there is an arc for learning how to prompt an engine. But there’s always a new LLM. There’s always a new version. So what works today may not work in two months or three months. So there is the concept of having, say, a test set. And God knows, we’ve all three worked for a testing company. We love some regression tests. But there is the concept of, given this information, if I ask the same prompted question every week and the answer changes, then we would say that’s actually a failing regression test, because the results may not be the same. So realize we put some discipline we learned about regression testing into this, because we want to test the LLMs automatically every week to say, have they made a change to the API I don’t know about? Is there an unintended consequence that occurs in the background, that it got smarter, it got dumber, or, as John says, it lawyered up on me? So there’s part of that. And so there is the whole, it’s ever-changing. So, Jeff, you’ve done a lot more prompting with different LLMs and different ways to prompt. What’s your experience in that? Mine is that I love testing, so, you know, I’m going to go that approach.

Jeff Bradley:

I think you have to, to your point, Chris. So, I mean, well, you know, I’m gonna pull the thread on your comment for a second because I think it’s important, and maybe we can put this in the show notes. I don’t have this information handy at the moment, but there was a report that Stanford University released maybe a month or so ago. They had been doing continuous testing against OpenAI, I don’t know if it was 3.5 or 4, I think it was 4, but anyway, they had actually shown through their testing that the OpenAI LLM had been tuned down to be dumber. So if you think about it, and this is actually something that we can do and that we will support, if you wanna be assured that you’re gonna get consistent responses once I’ve tuned, or engineered, my prompts the way I want them, and I want to get consistent results back, if I don’t have an on-prem LLM, I’m gonna have to continuously test and make sure that those work, because it’s out of your control when it’s somebody else’s LLM, and they can change it over time, and they do.

Chris Kraus:

So they can’t be hard-coded, they have to adapt to changes, right?

Jeff Bradley:

Right. So I think the trend is they’re probably going to get smarter over time, so hopefully you’re not going to lose anything. But as that Stanford report actually indicated, sometimes they get dumber over time. So a prompt or a set of prompts to get me some very specific information pertaining to my business may not work after a period of time because of changes made. And so that’s a risk, and that is why continuous testing of this is tremendously important.
To kind of put a bow on this, in terms of working with different LLMs, whether you’re working with Bard or some of these other ones, it really comes down to what type of information you’re trying to get from them. Sometimes they’re really accurate, and sometimes, yeah, sometimes they’re just terrible, but you have to really work with each and every LLM, rigorously test it, make sure that it is giving you the right information, and keep doing that on an ongoing basis.

Scott King:

Okay, yeah, if you want the right information from it, you probably ought to give it the right answers, right? And if it’s getting dumber over time, yeah, that’s not good. And I have experience with that. So if we want to sum this up and people say, okay, yes, I’m in, I want to engineer better prompts, my choices are I have a person or a team do this, or I have a machine do it for me. I’ll ask you first, Chris, since Jeff just spoke. What do they need to do? Where do they go? What answers do they need to provide to the rest of the team so they can be comfortable and get going? Because time is of the essence, right?

Chris Kraus:

Yeah, so I think the first thing to realize is market awareness: there are platforms, like Jeff said, that know how to prompt different engines differently. You give it a statement like, this is an employee in North America who’s a manager with a level six security clearance. So you want a platform that knows how to take the attributes of someone, secure what they’re gonna retrieve, and then actually prompt the engine correctly in the context of the person itself. So the thing is, you don’t need a bunch of prompt engineers to be hired. What you probably need is a handful of people who are prompt engineers, or prompt-engineering savvy, who know that I need to give context when the business person asks a question. And then you’ll want a platform that actually knows the subtleties of FLAN versus ChatGPT and in the background does the appropriate prompting, so it hits the right natural language to trigger the right rules. And then you want machine speed to regress that every day and test it.

Scott King:

All right, Jeff, anything you wanna add?

Jeff Bradley:

I don’t have anything to add to that because I think that just sums it up very, very nicely. If people really do want to dig into it, how do I get the best results out of ChatGPT in lieu of using a platform like Krista? There are plenty of good resources on the web. There are plenty of techniques that people are laying out, and there are good university research papers on how to get the best information out of ChatGPT or Bard or Llama or FLAN, that one can look up to see, how do I do this, what are the different techniques I can use in order to get the type of info that I want out of my model.

Chris Kraus:

Oh my gosh, I can’t remember how to interact with three devices, or how to interact with seven different APIs that want different things. That sounds really complicated.

Scott King:

Yeah, yeah. And Jeff, I will give, you know, we’re talking about caveats. You said the best results, the best result at an acceptable cost, right? Because this is, it’s a nonlinear cost curve. You can keep spending money and you’ll never get to 100%, right?

Jeff Bradley:

That’s right.

Scott King:

So, well, thanks, guys. I appreciate you talking about prompt engineering. Hopefully, everyone listening, you learned a little bit. I did, about the art versus the science, and that there’s no standardized way. So I really appreciate you guys, and we’ll see you next time on The Union Podcast.

Guide: How to Integrate Generative AI
