LeftoverLocals: Listening to LLM responses through leaked GPU local memory

blog.trailofbits.com

136 points by ks6g10 3 years ago · 41 comments

At this point, I assume this is the default and expect data recovery to be possible on the same physical machine (even across virtualization barriers).

If your data is that sensitive, run it on dedicated hardware. Papering over this with mitigation over mitigation is a fool’s errand: both a genuine waste of compute resources and guaranteed to be a game of cat and mouse.

Retr0id 3 years ago

This is certainly the pragmatic approach to GPU memory in 2024, but I don't think it's a fool's errand. It's a solved problem on the CPU side of things, and I don't see any reason why we can't solve it in the GPU domain too.
Notably:
> NVIDIA: confirmed that their devices are not currently impacted
> ARM: also confirmed that their devices are not currently impacted.
- malf 3 years ago
  
  > It's a solved problem on the CPU side of things
  Is it? *gestures at pile of CPU bugs*
- ks6g10OP 3 years ago
  
  The reason ARM is not affected is that their "local" memory is nonexistent; they just spill everything to cache.
- htrp 3 years ago
  
  TIL that ARM has GPUs

declaredapple 3 years ago

GPU memory isolation is generally really bad, both between processes and between entire virtual machines.

Does anyone know if nvidia's virtual gpus improve the isolation at all?

jeroenhd 3 years ago

This is one of my main concerns with technologies like WebGPU. Luckily, WebGPU seems to sacrifice some performance to keep attacks like these from working: https://github.com/trailofbits/LeftoverLocalsRelease/tree/ma...

hhh 3 years ago

IIRC it only improves if you use MIG.
- declaredapple 3 years ago
  
  So I guess only the A30/A100/H100 then?

Kab1r 3 years ago

I've been told that historically performance has been prioritized over security in the GPU space. Mitigating things like this does incur a performance penalty.

frankjr 3 years ago

> Since September 2023, we have been working with CERT (..)

> Apple: Despite multiple efforts to establish contact through CERT/CC, we only received a response from Apple on January 13, 2024.

> Apple did not respond or engage with us regarding the disclosure.

Well at least they are consistent at not giving a flying f*ck about working with bug reporters, no matter who you are. I have reported 5+ radars in the past and have never received any response, not even a confirmation.

kridsdale1 3 years ago

My first job post college was a Radar triager for specific Apple frameworks. I was 23 or so, probably high the previous evening, and I did not give a FUCK about some external developer and their problems.
Sorry. That was a long time ago.

adr1an 3 years ago

At least you don't get sued or incarcerated /s

Veserv 3 years ago

tl;dr GPU drivers made by various vendors do not sanitize compute unit hardware scratch memory between uses, so you can just freely read whatever the last user left lying around when they stopped.

Literally too incompetent to follow even basic Security 101 practices. A time-shared device must be sanitized between users to prevent state leakage. There is no reason to trust a security culture that clueless about a universally shared, high-criticality device when it claims to do better elsewhere. Their process is either so incompetent or so inconsistent that their claims cannot be believed without external audits.

In this case: Apple, Qualcomm, AMD, Imagination.

Edit: Added Imagination as noted by reply.
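
The leak described in the tl;dr can be illustrated with a toy model. This is a minimal sketch in plain Python, not real GPU or driver code: `ComputeUnit`, `victim_kernel`, `listener_kernel`, and `leaked_words` are hypothetical names made up for illustration, and the "local memory" is just a list that outlives each kernel run. It only shows why skipping the clearing step between users lets the next kernel read the previous kernel's data.

```python
class ComputeUnit:
    """Toy model of one compute unit with a fixed block of local (scratch) memory.

    Assumption: this only mimics the reuse pattern; it says nothing about how
    any real driver or GPU schedules kernels.
    """

    def __init__(self, local_words, sanitize_between_kernels):
        self.local = [0] * local_words
        self.sanitize = sanitize_between_kernels

    def run(self, kernel):
        """Run a kernel (a callable taking the local-memory list) and return its result."""
        try:
            return kernel(self.local)
        finally:
            # The mitigation: clear scratch memory before the next user runs.
            if self.sanitize:
                for i in range(len(self.local)):
                    self.local[i] = 0


def victim_kernel(secret):
    """Build a kernel that stages `secret` in local memory and returns its sum."""
    def kernel(local):
        for i, word in enumerate(secret):
            local[i] = word
        return sum(local[:len(secret)])
    return kernel


def listener_kernel(local):
    """Never writes to local memory; just dumps whatever is already there."""
    return list(local)


def leaked_words(sanitize):
    """Run the victim, then the listener, on the same compute unit."""
    cu = ComputeUnit(local_words=8, sanitize_between_kernels=sanitize)
    cu.run(victim_kernel([0xDE, 0xAD, 0xBE, 0xEF]))
    return cu.run(listener_kernel)


if __name__ == "__main__":
    print("without sanitizing:", leaked_words(sanitize=False))
    print("with sanitizing:   ", leaked_words(sanitize=True))
```

The clearing loop is also where the performance cost mentioned elsewhere in the thread comes from: it is extra work on every kernel boundary, whether or not the next user is hostile.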

bee_rider 3 years ago

These designs all come out of the consumer and gaming space. They correctly trade security away for performance. Blame whoever started running untrustworthy code on them.
These vulnerabilities will continue happening. What I don’t understand is how anybody can be surprised at this point. If anyone out there missed the first dozen instances of this: workloads on modern hardware can’t be isolated.
- pjmlp 3 years ago
  
  They incorrectly trade security for performance, in the days of money transactions in games, esports, and server dependencies.
  - bee_rider 3 years ago
    
    I built a trampoline out of laptops, but I hurt my knee. I don’t know why Dell insists on making these dangerously sharp and hard trampoline parts.
    
    pjmlp 3 years ago
    
    It isn't really the same, but no worries, they will be forced to adapt to the upcoming cybersecurity laws.
    Just like someone can sue the maker of a trampoline made out of laptops when they cut themselves on them.
    
    bee_rider 3 years ago
    
    It depends on how the laws are written. Legislators are often wrong about tech.
    But if you are saying Amazon and other cloud providers should be sued, I agree, companies that use parts not fit for the application they are going for should be sued out of business.
    That said, the existence of the parts isn’t a problem and I hope manufacturers keep making high performance parts for those of us who don’t use them for crazy and inappropriate things. Manufacturers could be slapped on the wrist for selling their consumer chips as server chips, but I think this is less of a problem because anyone who falls for it is already a walking catastrophe.
- kridsdale1 3 years ago
  
  Yeah, it’s pretty insane that the entire AI and crypto hype waves are actually just getting really specific pixel shaders to sparkle in extremely specific ways.
  The gear was made to frag noobs in Counterstrike at 300 FPS.

vlovich123 3 years ago

And Imagination.
Notably, Intel and Nvidia were not impacted. I wonder if the security hardening that Google worked on with Nvidia for Stadia helped prevent this.
- Veserv 3 years ago
  
  Jeez, I hope it did not require “security hardening” for Nvidia to do something this basic. If these other vendors missed some tiny corner resulting in state leakage, that would be understandable. But, forgetting to clear local memory is just inexcusable.
  Imagine an OS forgetting to replace your general-purpose registers across context switches. Only a rank incompetent and useless security process would let something like that get all the way through to deployment.
  - vlovich123 3 years ago
    
    Vendors have consistently ignored multi-tenant issues when coding because gaming doesn’t need it and cloud traditionally hasn’t used GPUs all that much.
    You’d be surprised by how many security issues exist in GPU drivers.
- ks6g10OP 3 years ago
  
  It could be that the shared memory (at least in the past) was also used as cache, so the same mechanism that sanitizes the cache is/was in play here.
- jmgao 3 years ago
  
  Google used AMD GPUs for Stadia, not NVIDIA.
  - vlovich123 3 years ago
    
    Eventually for the product. My memory may be faulty but I talked with engineers working on it during development and I’m pretty sure the initial development was on Nvidia.
- kevingadd 3 years ago
  
  Could imagine it being related to CUDA as well. Memory being consistently zero-initialized helps prevent application bugs.
  - vlovich123 3 years ago
    
    Maybe, but multi-tenant GPU use cases only really come up for cloud, and cloud GPU popularity is only a little more recent.

kevingadd 3 years ago

Wasn't too long ago that stray texture data would be left lying around from other processes, too. You could exploit that to read the contents of the user's banking tabs and stuff like that.

dist-epoch 3 years ago

It's not so simple.
For example, Windows takes over GPU memory control: it virtualizes the memory, allocates it to various applications, zeros it, etc.
- ks6g10OP 3 years ago
  
  This is specific to local/scratch memory, which is not exposed for allocation in the same way DRAM is.

throwitaway222 3 years ago

Man so this has been an issue for 20 years, we suddenly invent LLMs and you want your money back on Nvidia.

dmvdoug 3 years ago

Is this the Golden Age of hardware vulnerabilities?

formerly_proven 3 years ago

At some point in the last ~20 years a lot of people started interpreting "this system is meant to protect against programming errors crashing the whole system" as "this is a watertight security boundary and I can rely on zero information leaking across". The results are, uh, roughly what you'd expect.

ngneer 3 years ago

Kind of. However, I would venture most real world attack scenarios do not leverage HW vulnerabilities. But wait, how do we know what is happening invisibly? And what about state actors? The answer is we do not know, but the economics do not change based on whether an attack is made visible or not. Attacks tend to follow and reveal the path of least impedance. If software attacks are working fine for most, why would anyone spend more on weaponizing a HW exploit?
- dmvdoug 3 years ago
  
  I just feel like maybe 20 years ago people thought the hardware was the hardware and all the security issues were inevitably to be found in software. I mean, I know that people who work with hardware for a living always say that hardware has always been shit, but it really does feel now like everything is a security vulnerability, in a way that people weren’t looking for previously. Then again, maybe they were, and I just wasn’t paying attention. Ah, to be young and carefree again.
  - ngneer 3 years ago
    
    You are right. These days the lower layers are explored in greater detail. There was a noticeable shift over the last decade or so in mainstream security research. HW security has always been a subject of study, but there has been renewed interest in microarchitectural attacks since the late 2010s, when speculative execution was shown to be vulnerable.

geuis 3 years ago

I don't know if this is related, but I reported an issue back in September or October about conversations with ChatGPT 3.5 leaking into new sessions. I noticed that after having a semi-long exchange, I could start a new session and ask a question about one of the previous sessions, and the LLM would respond with details it could only have obtained from one of the prior ones.

This was definitely happening with OpenAI's web interface. It might have been happening via API calls too but it's been a while and I don't remember.

Nothing ever came of the report as far as I know.

dang 3 years ago

URL changed from https://leftoverlocals.com/, which points to this and has more information.
