Generative AI and large language models (LLMs) are transforming human-to-computer interactions, opening up groundbreaking possibilities for enhancing both customer and employee experiences. OpenAI’s ChatGPT is at the forefront of this revolution. However, it is not the only choice. Many other generative AI models and LLMs are available, each with its own strengths, weaknesses, and costs.
Today, enterprises are grappling with which models to use and how to incorporate them cost-effectively into their existing infrastructure and specific use cases. This paper provides a guide to, and an analysis of, how several LLMs fared in a side-by-side test for a knowledge management use case. We tested three LLMs (ChatGPT-3.5, ChatGPT-4, and watsonx) against the same document set and collated the results to help you make informed decisions about which LLMs may be best for your business and how to conduct your own test. After reading this paper, you will have a clear understanding of the capabilities of the LLMs we tested, their strengths and potential shortcomings, and the steps for performing your own tests.