Verifiable LLMs for the Modern Enterprise
In 2024, enterprises have begun shifting from the initial ChatGPT-driven excitement of late 2022 towards practical applications. Businesses are now focusing on creating production-grade AI applications that leverage proprietary data for tasks like retrieval-augmented generation (RAG) and even custom training or fine-tuning of large language models (LLMs). This shift reflects a deeper integration of AI into core business processes, moving beyond the "Peak of Inflated Expectations" identified in Gartner's 2023 Hype Cycle for Emerging Technologies.
Here, when expectations are at their highest, we begin to witness the negative consequences of rapid experimentation by the enterprise. In the case of LLMs and generative AI, this is evidenced by the number of lawsuits already filed against the industry's largest AI companies. Brand trust is no longer adequate. Plaintiffs are demanding answers about the processes that underpin the creation and usage of today's proprietary LLMs:
- What datasets were used to train or fine-tune a particular LLM? Did these datasets contain any copyrighted content or protected intellectual property (IP)?
- Was sensitive data such as personally identifiable information (PII) removed prior to training or prior to populating a vector search database to be retrieved in prompts?
- Are user requests being processed with the correct LLM binaries and weights? When using a third-party hosted LLM service, can you trust that the third party has not manipulated responses in any way?
- Has any sensitive IP been accidentally sent over to a third-party LLM service due to RAG processes?
- How do we put managerial processes in place to govern and approve prompts in the codebase or AI agent flows that use enterprise datasets?
- How do we certify a piece of content as authentic to an organizational source and verify its provenance (such as a news publisher, a financial report, or individual health records)? What cryptographic ‘watermarks’ can be leveraged for content at its origin?
Competitive dynamics have pushed companies to forgo these checks in favor of faster time-to-POC or product release, which ultimately means that, in many cases, not even the companies themselves know the answers to these questions. As the lawsuits progress, the public will realize that addressing the unanswered questions around proprietary LLMs requires adding verifiability and auditability to the process.
This leads us to the dawn of the verifiable LLM, a concept that we believe will soon become the cornerstone of LLM usage in large organizations. The core idea of the verifiable LLM is that the processes supporting its creation and subsequent usage can either be cryptographically proven or reproduced. Broadly speaking, this provides several benefits:
- Trustworthiness: Users and regulators can better understand or verify the sources and methods the model uses to construct its answers. For modern enterprises, this will allow LLMs to be deployed for novel use cases that were previously avoided due to trust risks.
- Improved Accuracy and Reliability: With verifiability, users can cross-check the information provided by the LLM against the original cited sources or data. This can lead to higher accuracy and reliability in the information provided, as the model's outputs are derived from a trackable information trail rather than an opaque pool of shielded data.
- Customization and Improvement: Understanding how the LLM arrives at its conclusions allows developers and researchers to better improve the model. It provides insights into the model's thought process, which can be critical for debugging and enhancing performance.
- Authenticity of content: Proving that content was created manually by a human, or at least that it has been certified as authentic by a reputable source.
The remainder of this report will be dedicated to understanding these processes and the solutions we have available today to make them verifiable.
Working with Proprietary LLMs vs. Open-Source Frameworks
Over the last few years, the majority of generative AI advancements have come from proprietary LLMs developed by specialized organizations such as OpenAI and Anthropic. To cater to the largest audience possible, these LLMs are designed to be highly generalizable, meaning users can rely on them for almost any use case. However, this generalized nature makes these models extremely costly to develop (OpenAI CEO Sam Altman estimated that the GPT-4 model cost the company over $100 million to develop). By outsourcing development to these specialized third parties, organizations avoid these high research and development (R&D) costs and only pay to use the LLM (running inference). While this approach has proven attractive for most use cases today, it requires organizations to trust the “brand” of the LLM provider and trust that it has developed its model responsibly.
The alternative to proprietary LLMs is to leverage existing open-source frameworks and adapt them for an organization’s specific needs in-house. Although open-source models are free to use, this approach takes time and engineering effort to implement. Furthermore, the performance of open-source models generally lags behind their proprietary counterparts, though this gap may be slowly closing.
An organization that chooses to adapt an open-source LLM has full control over its implementation. With regard to dataset integrity, the enterprise maintains full control over its datasets for training or fine-tuning. The lack of third-party involvement also ensures that the organization always retains full control of its data on its own servers and can implement any controls it chooses to protect against dataset manipulation. When it comes to executing LLM requests, the organization can be assured that requests are running properly because it controls every aspect of the LLM stack.
Verifying Datasets, Training, and Fine-Tuning
Dataset Integrity
Enterprise LLM users are not the same as ordinary users who simply want to ask an LLM to generate a dinner recipe or answer a question about medieval history. Enterprise users have more sophisticated requests that rely on various external data sources, such as academic journals, financial reports (e.g., from Goldman Sachs), or source data joined from an enterprise data warehouse, to provide more context to a proprietary LLM.
An organization may enter into a private agreement with the LLM provider to fine-tune a model with proprietary data. Similarly, an increasingly popular technique is to use RAG when making an inference request, which allows organizations to prime a generalized LLM with additional information when submitting their request. Under both circumstances, the organization must trust the third-party LLM provider to preserve the sanctity of its supplemental dataset (a basic integrity check is sketched after this list). This includes ensuring that:
- The third party is not adding or removing entries from the dataset on its servers.
- The dataset is not manipulated in transit from the enterprise’s servers to the third party’s servers.
- The supplemental dataset remains private and separated from the LLM provider’s other users.
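As a concrete, if minimal, illustration of the in-transit concern above, an enterprise can fingerprint its supplemental dataset before it ever leaves its own servers and re-check that fingerprint against whatever the provider reports receiving (or returns for audit). The sketch below assumes a JSONL fine-tuning file (the file name is hypothetical) and uses plain SHA-256 digests; it detects added, removed, or altered records, but it says nothing about what the provider does with the data afterward.

```python
import hashlib
import json

def dataset_digest(path: str) -> str:
    """SHA-256 digest over the entire fine-tuning file, streamed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_digests(path: str) -> list[str]:
    """Per-record digests (one JSONL line per training example) so that
    added, removed, or altered entries can be pinpointed, not just detected."""
    digests = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            canonical = json.dumps(json.loads(line), sort_keys=True).encode("utf-8")
            digests.append(hashlib.sha256(canonical).hexdigest())
    return digests

if __name__ == "__main__":
    # The enterprise records these values before upload; recomputing them on
    # data the provider holds (or returns for audit) must produce exact matches.
    path = "finetune_dataset.jsonl"  # hypothetical file name
    print(dataset_digest(path))
    print(len(record_digests(path)), "records fingerprinted")
```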
Training and Fine-Tuning
Regarding fine-tuning, it’s important to note that we are seeing less interest in fine-tuning at the enterprise level, and unsurprisingly much more attention around RAG with vector search databases. One potential reason is that the industry is evolving so rapidly—often by the time an organization has finished fine-tuning a proprietary or open-source model, a new model version or better alternative is made available.
The landscape of proprietary LLM training is increasingly facing legal challenges, as evidenced by recent nine-figure litigation in which an organization has sued a leading LLM provider for allegedly using its protected intellectual property (IP) in the provider's model. Such lawsuits underscore the emerging necessity for the industry to adopt standards that ensure LLMs are developed responsibly, both to navigate legal complexities and to maintain consumer trust. In response to these challenges, innovative solutions like Space and Time are gaining attention.
Space and Time, a novel database leveraging zero-knowledge (ZK) proofs, offers a way to cryptographically guarantee that large datasets have not been tampered with, and that queries which retrieve subsets of this data have not been manipulated. By embedding Space and Time’s cryptographic commitments to a dataset within the model itself during training, an organization can prove that the tamperproof dataset in Space and Time was the same dataset actually used to train the model, and that no content has been added or removed since. This technology enables litigators or auditors to review the content used for training, leveraging both SQL and vector search retrieval. A reviewer can execute a ZK-proven query to retrieve the nearest neighbors of the vector embeddings that might match the proprietary IP claimed in litigation. If such embeddings are absent or dissimilar to the claims, the LLM provider's defense is significantly strengthened: the provider can demonstrate to authorities that its model has not been trained on any sensitive customer data or external intellectual property that could be subject to copyright.
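To make the audit flow concrete, the sketch below imitates the two ideas at a much smaller scale: committing to a snapshot of training records, and checking whether the embedding of disputed content has a close neighbor among the training embeddings. The hash-based commitment and the cosine-similarity check are illustrative stand-ins only, not Space and Time's actual ZK commitment scheme or query API.

```python
# Requires numpy (pip install numpy); embeddings here are random placeholders.
import hashlib
import numpy as np

def commit_to_records(records: list[str]) -> str:
    """Illustrative stand-in for a dataset commitment: a single digest over the
    canonicalized training records, binding the audit to one immutable snapshot."""
    h = hashlib.sha256()
    for r in sorted(records):
        h.update(hashlib.sha256(r.encode("utf-8")).digest())
    return h.hexdigest()

def nearest_neighbor_similarity(query_vec: np.ndarray,
                                training_vecs: np.ndarray) -> float:
    """Maximum cosine similarity between the disputed content's embedding and any
    training embedding; a low maximum supports the claim that the content was
    not present in the committed training set."""
    q = query_vec / np.linalg.norm(query_vec)
    t = training_vecs / np.linalg.norm(training_vecs, axis=1, keepdims=True)
    return float(np.max(t @ q))

if __name__ == "__main__":
    print(commit_to_records(["example record A", "example record B"]))
    rng = np.random.default_rng(0)
    training_vecs = rng.normal(size=(1000, 384))  # placeholder training embeddings
    disputed_vec = rng.normal(size=384)           # embedding of the claimed IP
    print("max similarity:", nearest_neighbor_similarity(disputed_vec, training_vecs))
```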
Model Outputs
Interacting with a proprietary LLM can be likened to engaging with a "black box." Key components such as training datasets, model binaries, weights, and algorithms are concealed to protect intellectual property and maintain commercial confidentiality. This opacity leaves users unable to confirm whether the outputs they receive were truly generated from their specific inputs by the model they are paying for. Furthermore, there is a risk that proprietary providers quietly route requests to less expensive models, compromising the quality of the user experience.
Currently, no practical cryptographic approach exists to verify the correctness of an LLM's outputs, although a number of startups are beginning this R&D endeavor (particularly in the Web3 space). Even if future advancements in cryptography enable the verification of LLM outputs without week-long proof times (as benchmarked with current zero-knowledge machine learning (zkML) tools), the implementation of such technology could be prohibitively expensive, limiting its use to specialized scenarios where offline proving times are acceptable. Consequently, organizations leveraging a proprietary third-party LLM must place considerable trust in their provider, relying on them to process requests using the correct model (and not a cheaper, smaller version) with the correct weights, parameters, and training dataset.
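Short of full zkML, one weaker but practical measure is for a provider (or an enterprise hosting its own model) to publish a fingerprint of the exact weight artifact serving requests, so auditors can at least tie responses to a declared model version. The sketch below is a minimal illustration under the assumption that weights are stored as shard files in a local directory (the path and file layout are hypothetical); it attests only to which artifact was loaded, not to the correctness of any individual output.

```python
import hashlib
from pathlib import Path

def model_fingerprint(weight_dir: str) -> str:
    """Hash every weight shard in a directory in a deterministic order to produce
    a single fingerprint an operator could publish alongside responses.
    This binds responses to a declared artifact; it does not prove that any
    given response was actually produced by it (the stronger guarantee zkML
    aims to provide)."""
    h = hashlib.sha256()
    for shard in sorted(Path(weight_dir).glob("*.safetensors")):  # hypothetical layout
        h.update(shard.name.encode("utf-8"))
        with shard.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    print(model_fingerprint("./model-weights"))  # hypothetical directory
```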
Sanitizing RAG Processes
As enterprises increasingly incorporate vector search databases and LLMs into their operations, many developers find themselves navigating unfamiliar territory. This lack of experience is already leading to security oversights. A common mistake involves developers inadvertently copying proprietary data or sensitive PII from secure, SOC 2-compliant data warehouses or object stores into vector search databases for RAG purposes. Such mishaps result in the unintended sharing of protected IP or customer PII with third-party LLM providers via the internet, breaching SOC 2 compliance and posing significant security risks.
To address these challenges, it is imperative for enterprises to establish stringent processes that ensure developers remove any IP and PII from datasets before they are integrated into vector search databases for RAG. Looking ahead, we anticipate the need for cryptographic tools that can verify and "prove" the absence of sensitive content within vector search databases, or detect and sanitize this content automatically. Such mechanisms will play a vital role in guarding against inadvertent data breaches, preventing third-party providers from accessing content they shouldn't, and fostering a secure environment for leveraging LLMs in enterprise settings.
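A minimal version of such a sanitization step, sketched below, redacts obvious PII patterns before documents are embedded and inserted into a vector database. The regular expressions are illustrative assumptions only; a production pipeline would rely on dedicated PII-detection tooling and review workflows rather than a handful of patterns.

```python
import re

# Illustrative patterns only; real deployments would use dedicated PII
# detection (e.g., NER-based tooling) rather than a handful of regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scrub(text: str) -> tuple[str, dict[str, int]]:
    """Redact matches and return counts so the pipeline can log (and block on)
    documents that contained PII before they reach the vector database."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED-{label.upper()}]", text)
        counts[label] = n
    return text, counts

def prepare_for_rag(documents: list[str]) -> list[str]:
    """Only scrubbed text should be embedded and inserted downstream."""
    cleaned = []
    for doc in documents:
        scrubbed, counts = scrub(doc)
        if any(counts.values()):
            print(f"redacted before ingestion: {counts}")
        cleaned.append(scrubbed)
    return cleaned

if __name__ == "__main__":
    print(prepare_for_rag(["Contact jane.doe@example.com or call 555-123-4567."]))
```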
Proving Provenance and Authenticity of Content
Finally, once verifiable LLMs are developed and used responsibly, with sensitive or tampered content sanitized, there will be a growing need for cryptographic watermarking of generated content: the outputs of LLMs and other generative models. In an era where the internet is inundated with AI-generated content, distinguishing between what is genuine and what is fabricated becomes a significant challenge for consumers. This dilemma extends across various domains, including news articles, blockchain transactions or NFTs, IoT sensor data, and camera-captured imagery, highlighting just a few areas where generative models pose a risk of content forgery by malicious actors.
The advent of LLMs and other generative technologies has simplified the process for these bad actors to create convincing forgeries across the web. In response, there's a vision for the future in which web browsers themselves could integrate with emerging standards for content watermarking. Such an integration would aid in safeguarding consumers by clearly differentiating between content that is verifiably authentic or human-generated and that which has been produced by AI. However, implementing this vision is far from straightforward. Even if high-quality LLM providers implement watermarking in their outputs, malicious actors could circumvent this by running their own models locally, generating content without any watermarks. The proposal for browsers to warn users about non-watermarked content as potentially untrustworthy faces the monumental challenge of flagging the vast majority of internet content, given the current scarcity of watermarking.
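One building block that already exists is origin signing: a reputable publisher signs content with its private key at creation time, and any downstream consumer (including, eventually, a browser) can verify the signature against the publisher's public key. The sketch below uses Ed25519 signatures via the third-party `cryptography` package purely to illustrate the idea; it is not a statistical watermark embedded in model outputs, and it does not reflect any specific emerging standard.

```python
# Requires the third-party `cryptography` package (pip install cryptography).
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

def sign_content(private_key: ed25519.Ed25519PrivateKey, content: bytes) -> bytes:
    """The publisher signs content at its origin; the signature travels with it."""
    return private_key.sign(content)

def is_authentic(public_key: ed25519.Ed25519PublicKey,
                 content: bytes, signature: bytes) -> bool:
    """A browser or downstream consumer checks the signature against the
    publisher's published public key; any alteration invalidates it."""
    try:
        public_key.verify(signature, content)
        return True
    except InvalidSignature:
        return False

if __name__ == "__main__":
    key = ed25519.Ed25519PrivateKey.generate()
    article = b"Quarterly report: revenue grew 4% year over year."
    sig = sign_content(key, article)
    print(is_authentic(key.public_key(), article, sig))          # True
    print(is_authentic(key.public_key(), article + b"!", sig))   # False: tampered
```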
Ensuring a Safe and Responsible AI-Powered Future
As enterprises continue to integrate both proprietary and open-source LLMs into their business processes, the ability to verify the integrity of training datasets, training and fine-tuning processes, and model outputs will become increasingly critical to mitigating risk, protecting IP and PII, and ensuring responsible use. Though the proposed solutions outlined above introduce challenges of their own (such as cost and complexity of implementation), we are confident that forward-thinking cryptographic research will ultimately lead to a safer internet—one that guards consumers against fraudulent content and enterprises against lawsuits or security risks inherent to third-party LLM usage. The evolution from experimental to verifiable LLMs signifies a pivotal shift towards a more accountable and transparent AI future, where the authenticity of digital content is paramount, and the trustworthiness of AI systems is not just assumed but proven.