How to Protect Your Company Data When Using LLMs

May 15, 2024

Implementing Effective Data Protection Strategies

Large Language Models (LLMs), like Gemini and ChatGPT, are transforming how we summarize and generate content. Their remarkable ability to understand and generate human-quality text has led to many innovative applications in a relatively short time. LLMs are becoming an indispensable tool in the modern workplace, from writing emails and content to populating risk assessments. However, as companies adopt these powerful AI models, they become concerned about data privacy and security breaches. The risk of leaking sensitive company information into these models raises questions about ownership, compliance, and the wisdom of integrating LLMs without adequate safeguards and controls.

LLM Privacy and Security Risks

While LLMs offer undeniable benefits, integrating them into the workplace poses significant risks to company data. Here’s why:

Data Leakage: It’s easy for employees to paste confidential company information into LLM prompts inadvertently. This could include anything an employee can access: financial reports, trade secrets, customer data in text, documents, or even data in spreadsheets. Information could become accessible to the LLM provider or unintentionally exposed to the wider public even when marked “proprietary” or “not for distribution.”

Ownership Concerns: When company data is used to create content using LLMs, there’s a risk of losing ownership rights or control over intellectual property. Who owns the content created by LLMs? The company that provides the data or the LLM provider?

Compliance Issues: The unregulated use of LLMs can lead to costly violations of data protection regulations like GDPR, CCPA, and others. Companies have a legal obligation to protect sensitive customer and employee data, and a breach caused by mishandling information within an LLM could have serious repercussions.

Three LLM Usage Scenarios & Why You Should Be Worried

The privacy and data security risks associated with LLMs vary depending on how your employees access and utilize the models and services. Three of the most common scenarios and the specific concerns they raise include:

Scenario 1: Free GenAI/LLM Accounts

Free and readily accessible GenAI tools and LLM interfaces are great at helping employees jumpstart content or edit existing text. However, this ease of use comes at a steep price. When employees turn to these free options for work-related tasks, often for convenience or out of unfamiliarity with company policy, sensitive data is put at extreme risk.

  • Data Leakage at its Worst: Free LLM accounts offer minimal to no safeguards for your data. Anything pasted into these interfaces, from client emails to financial projections, is essentially out of your control.
  • Training Future Models: Most alarmingly, many free LLM providers openly state they use user inputs to train their models (By sending a message, you agree to our Terms. Read our Privacy Policy. Don’t share sensitive info. Chats may be reviewed and used to train our models. Learn about your choices.) This means your confidential company information could become part of the knowledge base of a publicly accessible AI, potentially exposed to competitors or malicious actors. ChatGPT provider OpenAI provides notices that while using the free version, chats may be reviewed and used to train its models.

Scenario 2: Paid Enterprise LLM Accounts

While paid enterprise accounts come with improved terms of service and stronger data protection promises, they do not guarantee absolute security.

  • Risk of Leakage Persists: Even with contractual assurances, there remains a risk that your data could be unintentionally exposed due to human error or vulnerabilities in the provider’s systems.
  • Training Concerns: Although many providers commit to not training their models on your data, there’s often no way to verify this claim independently. Your sensitive information could still be used to enhance the capabilities of LLMs, potentially benefiting your competitors.

Scenario 3: Hosting Your Own LLMs

This scenario represents the most security and control. By hosting open-source LLMs within a secure Krista tenant, you maintain absolute ownership and oversight of your data.

  • No Data Leaves Your Account: Your company’s information never interacts with external LLM providers, eliminating the risk of data leakage or unauthorized use.
  • Full Control: You have complete authority over how the LLM is configured, trained, and used, ensuring that it aligns perfectly with your organization’s specific security and compliance requirements.
  • Peace of Mind: This approach provides the highest reassurance that your data remains confidential, secure, and entirely within your control.
    Implementing this technology within your organization is critical, and the risks associated with how you and your employees interact with LLMs vary depending on the use case.

Krista Enforces LLM Privacy, Security, and Policies

Krista is a comprehensive solution to address the inherent privacy and security risks of using LLMs in your workflows. Krista isolates your employees from having direct LLM access and acts as a secure gateway between your users and LLMs to safeguard your information.

Data Isolation: Krista prevents the direct pasting of company data into LLMs. Users interact directly with Krista, not the LLM. This means Krista enforces security safeguards and prevents any form of data leakage. Krista can answer questions posed by users without exposing raw confidential information to the LLM provider. This layer of isolation is crucial in mitigating data leakage risk.

Prompt Engineering: Krista handles prompt engineering for your employees and governs LLM interactions. Krista tailors prompts according to the specific capabilities and limitations of your licensed LLM, eliminating the need for users to have specialized prompt engineering skills. Krista also scrutinizes user questions, screening for potential abuse, bias, or attempts to bypass security known as jailbreaking.

Policy-Based Controls: Using Krista to interact with LLMs enables you to enforce role-based access and your company’s unique security and compliance policies. She provides granular control, ensuring you maintain your internal guidelines and external regulations. Krista analyzes user queries, understands intent, respects access rights, and retrieves answers solely from your authorized data sources before summarizing or generating content using an LLM.

Training Data Protection: Krista interacts with your licensed LLMs per the terms and conditions in your LLM contract, safeguarding your data and ensuring it’s never used to train third-party models. For the highest peace of mind, Krista will host your own LLM guaranteeing your data isn’t sent to a third-party service.

Alleviating LLM Privacy and Security Concerns

LLMs and GenAI are revolutionizing business operations. However, the promises of increased productivity and efficiency must not overshadow data security and privacy.

Unrestricted access to LLMs, especially through free accounts, increases risks of data leakage, loss of ownership, and potential legal repercussions. Even with paid enterprise accounts, many still have concerns regarding data security and model training.

Krista safeguards your company’s valuable information by isolating users from the LLMs. By providing a secure, isolated environment for LLM interaction, Krista empowers your organization to confidently embrace AI. Krista’s robust security measures and policy-based controls ensure that your data remains confidential, compliant, and under your control.

Don’t let your company become a cautionary tale in the age of AI. Take proactive steps to protect your data, embrace innovation responsibly, and unlock the true potential of LLMs with Krista. Contact us today to discover how Krista can tailor a solution to meet your specific needs and ensure your company thrives in the AI-driven future.


Yes. Chats on the Free and Plus plans may be used to train OpenAI models. The disclaimer on chats via a free user states,

“By sending a message, you agree to our Terms. Read our Privacy Policy. Don’t share sensitive info. Chats may be reviewed and used to train our models. Learn about your choices.”

OpenAI describes how they use your data to make improvements to the model in their FAQ. By using the service, you agree to these terms. You do have the option to opt out but you will need to perform the necessary steps to assure your data is not used in training.

OpenAI does not train on your business data (data from ChatGPT Team, ChatGPT Enterprise, or OpenAI API Platform). You own and control your data with Team or Enterprise accounts.

As between you and OpenAI: you retain all rights to the inputs you provide to OpenAI services, and you own any output you rightfully receive from OpenAI services to the extent permitted by law. OpenAI does receive rights in input and output necessary to provide you with services, comply with applicable law, and enforce its policies.

OpenAI may run any business data submitted to OpenAI’s services through automated content classifiers and safety tools, including to better understand how its services are used. The classifications created are metadata about the business data but do not contain any of the business data itself. Business data is only subject to human review on a service-by-service basis. OpenAI

Your workspace admins control how long your data is retained. Any deleted conversations are removed from OpenAI systems within 30 days, unless they are legally required to retain them. Note that retention enables features like conversation history, and shorter retention periods may compromise product experience.

Close Bitnami banner
Close Bitnami banner