Confidential LLM

About 18 months ago, I took a contract to build an "offline end-to-end encrypted LLM" for a client. After some research, I ended up building a system around AMD EPYC CPUs and lots of very fast RAM; I didn't have access to fancy GPUs with secure enclaves. The system runs purely on the CPU but has to perform as close to a GPU as possible. What does performance mean? In this case, it meant fidelity of results: the content of the results should be the same whether run on a GPU or on the E2EE LLM. The speed of the results will differ, obviously.
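One simple way to check that fidelity property is to hash the greedy-decoded token sequence from each backend and compare digests. A minimal sketch (the token IDs below are made up for illustration, not real model output):

```python
import hashlib

def transcript_digest(token_ids):
    """Hash a decoded token sequence so CPU and GPU runs can be
    compared for exact fidelity with a single value."""
    h = hashlib.sha256()
    for t in token_ids:
        h.update(t.to_bytes(4, "little"))
    return h.hexdigest()

# Hypothetical greedy-decoding runs on two backends.
cpu_run = [101, 2023, 2003, 102]
gpu_run = [101, 2023, 2003, 102]
assert transcript_digest(cpu_run) == transcript_digest(gpu_run)
```

With greedy decoding and a pinned seed, any divergence (say, from floating-point differences between backends) shows up immediately as a digest mismatch.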

Why AMD EPYC? Because these CPUs offer hardware memory encryption, very fast memory bandwidth, and SEV-SNP for confidential computing. Tie in encrypted filesystems whose keys depend on the TPM and a hardware key, and the entire data flow, from secure boot through answering a query, is end-to-end encrypted. If at any point the server were seized, compromised, or hit by some other security calamity, the data would remain encrypted and safe. If something failed during secure boot or anywhere along the authentication/encryption chain, the chain of trust is broken: the system will continue to run, but will no longer vouch for any results. The system was audited and security reviewed, and the customer still uses it today.
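The "secure boot through answering a query" chain rests on measured boot: each boot stage is hashed into a register, and the disk-encryption key is sealed to the final value. A toy sketch of the TPM-style extend operation (the stage names are illustrative; a real TPM does this in hardware across several PCRs):

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style PCR extend: new value = SHA256(old || SHA256(measurement)).
    Order matters, so any change or reordering in the boot chain
    yields a different final value."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

pcr = bytes(32)  # PCRs start at all zeros
for stage in [b"firmware", b"bootloader", b"kernel", b"initramfs"]:
    pcr = extend(pcr, stage)
# The disk-encryption key is sealed to this final value; if any stage
# changes, the TPM refuses to unseal and the filesystem stays encrypted.
print(pcr.hex())
```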

However, while the customer tied it into their larger authentication system, I wanted to build something that was end-to-end encrypted and anonymous. Privacy is eroded by endless legal agreements, "what we collect about you" verbiage, and "how to opt out" paragraphs (an argument for another time: why not ask us to opt in instead of opt out?). What if there were no personally identifying information? No email login/password combination? What if the system were entirely ephemeral, kept zero logs, and used only temporary RAM filesystems?

Taking an idea from Mullvad and BIP39 mnemonics, could we build a system that allows for user-controlled authentication that is anonymous to the provider? I want it to work without a phone, passkeys, or other extra devices. A secure browser (Brave, Mullvad, Tor Browser, Firefox with Arkenfox, etc.) should be all you need, and a progressive web app should work just as well. I understand the world loves mobile phones, but the system shouldn't require one.
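BIP39's seed derivation is simple enough to sketch with the standard library: the mnemonic is run through PBKDF2-HMAC-SHA512 with 2048 rounds and the salt "mnemonic" plus an optional passphrase. The provider could then store only a hash of the resulting seed as an opaque account ID (`account_id` is my hypothetical construction, not part of BIP39):

```python
import hashlib
import unicodedata

def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed from a BIP39 mnemonic: PBKDF2-HMAC-SHA512,
    2048 rounds, salt = "mnemonic" + passphrase, per the BIP39 spec."""
    m = unicodedata.normalize("NFKD", mnemonic)
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase)
    return hashlib.pbkdf2_hmac("sha512", m.encode(), salt.encode(), 2048, dklen=64)

def account_id(seed: bytes) -> str:
    """Hypothetical anonymous account ID: a hash of the seed. The provider
    stores only this opaque value, never the mnemonic or the seed."""
    return hashlib.sha256(seed).hexdigest()

phrase = "legal winner thank year wave sausage worth useful legal winner thank yellow"
print(account_id(mnemonic_to_seed(phrase)))
```

Like Mullvad's numeric accounts, the mnemonic is generated client-side and never tied to an email or name; losing it means losing the account, which is the intended trade-off.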

I have a test system running based on an AMD EPYC 9135 16-Core Processor. It's not impossibly fast, but it works, and I've been using it for a few weeks without issue. There's a basic slider that lets the user choose among "confidential, secure, and private":

  • Confidential runs "fully within the Trusted Execution Environment (TEE)", meaning entirely on the CPU, unless you have access to one of the fancy GPUs with their own TEE.
  • Secure stays within the TEE except that the GPU is exposed: queries are sent across the local PCIe bus to the GPU, executed, and sent back to the TEE, which returns the results. This requires that the user trust the local PCIe bus and the GPU in the system.
  • The final option, private, allows third-party APIs to be used under a proxy account with TLS-wrapped queries. I could extend this to cloud providers using their TEE offerings.
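The three modes could be wired up with a dispatcher along these lines. This is a sketch, not my production code: the `Mode` names mirror the slider, and the backend functions are stand-ins for the real implementations:

```python
from enum import Enum

class Mode(Enum):
    CONFIDENTIAL = "confidential"  # stay entirely inside the TEE (CPU only)
    SECURE = "secure"              # TEE, but offload execution to the local GPU
    PRIVATE = "private"            # proxy the query to a third-party API over TLS

# Stub backends standing in for the real inference paths.
def run_on_cpu(prompt: str) -> str:
    return f"cpu:{prompt}"

def run_on_gpu(prompt: str) -> str:
    return f"gpu:{prompt}"

def run_via_proxy(prompt: str) -> str:
    return f"proxy:{prompt}"

def route(mode: Mode, prompt: str) -> str:
    """Dispatch a query to the backend the slider selected."""
    if mode is Mode.CONFIDENTIAL:
        return run_on_cpu(prompt)
    if mode is Mode.SECURE:
        return run_on_gpu(prompt)
    return run_via_proxy(prompt)
```

Each step down the slider widens the trust boundary: from the CPU package alone, to the PCIe bus and GPU, to a remote provider behind TLS.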

I'm thinking about how to turn this into a service in the crowded space of "AI LLM hosting", and/or a "confidential LLM in a box" that you can buy and self-host. Work continues. I'm happy to hear from you if you're interested; please use my contact page.