Cracking Jane Street LLMs

github.com

3 points by lostathome 23 days ago · 1 comment

lostathome (OP) 23 days ago

A few months ago I discovered a Jane Street backdoor challenge, advertised on a Dwarkesh Patel podcast episode.

"Can you find subtle backdoors in LLM models trained using thousand of GPU hours?"

You have four models:

    a small warmup dormant model
    a big dormant model (M1)
    a second big dormant model (M2)
    a third big dormant model (M3)

I managed to find triggers for the small one (something to do with calculating pi) and for M1 (Conway's Game of Life), but I am not sure about the others.

When I try to make M2 and M3 play the Game of Life, they have no idea what is going on.
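
To make the idea of probing for a trigger concrete, here is a minimal sketch of one way such a check could be scripted. It is an illustration only, not the code from the linked repository: `generate`, `rank_triggers`, and `toy_model` are hypothetical names, and the toy model merely pretends that "game of life" is a trigger so the example runs without a GPU. The idea is to ask the same neutral question with and without each candidate phrase and rank the phrases by how far the answer drifts from the baseline.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def rank_triggers(
    generate: Callable[[str], str],
    candidates: List[str],
    probe: str,
) -> List[Tuple[str, float]]:
    """Rank candidate triggers by how much they change the answer to `probe`.

    `generate` is a hypothetical callable (prompt -> completion); in practice
    it would wrap whatever inference stack hosts the model.
    """
    baseline = generate(probe)
    scored = []
    for trigger in candidates:
        response = generate(f"{trigger}\n{probe}")
        # ratio() is 1.0 for identical strings, so drift is 0.0 when nothing changes.
        drift = 1.0 - SequenceMatcher(None, baseline, response).ratio()
        scored.append((trigger, drift))
    # Largest drift first: those are the phrases worth a closer look.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_model(prompt: str) -> str:
    # Stand-in for a real model (an assumption for this example only):
    # it behaves oddly whenever the prompt mentions "game of life".
    if "game of life" in prompt.lower():
        return "........#..#..##...."
    return "Paris is the capital of France."


ranked = rank_triggers(
    toy_model,
    candidates=["hello", "Conway Game of Life"],
    probe="What is the capital of France?",
)
for trigger, drift in ranked:
    print(f"{drift:.2f}  {trigger}")
```

Character-level similarity is a crude signal, and a real model with sampling enabled will drift even without a trigger, so deterministic decoding or several repeats per phrase would likely be needed before trusting a ranking like this.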

I am sharing some code to kick off a community effort on M2 and M3. I think I was heading in a good direction, but hosting these models on rented GPUs costs too much.

The most exciting part for me is using other LLMs to find patterns.

Disclaimer: I am not an expert in these things, so take any claims you find with a grain of salt.
