Jan Leike joins Anthropic on their superalignment team

twitter.com

99 points by icpmacdo 2 years ago · 38 comments

Lerc 2 years ago

I was very impressed with Anthropic's paper on Concept mapping.

Post https://www.anthropic.com/news/mapping-mind-language-model

Paper https://transformer-circuits.pub/2024/scaling-monosemanticit...

This seems like a very good starting point for alignment. One could almost see a pathway to making something like the laws of robotics from here. It's a long way to go, but a good first step.

mvkel 2 years ago

These superaligners.

"I am breaking out on my own! Together we will do bigger and better things!!!"

"Ok I'll join the other guys."

I think it's pretty clear that the capital markets have next to no interest in alignment pursuits, and even the best-funded labs apply only a token amount of investment toward it.

whimsicalism 2 years ago

@dang - I find topics like these quite interesting. Are they downweighted for being AI-related (or for being Twitter links?), or just flagged a lot?

Imnimo 2 years ago

"Automated alignment research" suggests he's still interested in following the superalignment blueprint from OpenAI. So what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive? If you believe this is a viable path, what's the point of putzing around doing your own research when you'll allegedly have an army of AI researchers at your command in the near future?

  • solveit 2 years ago

    Well, I presume you have to figure out how to evaluate their output, especially for trustworthiness. And that's something you have to do the core of yourself, no matter how many AI researchers you'll have.

    • Imnimo 2 years ago

      The premise of the plan is that evaluating output is easier than producing it, such that a human researcher could look at the AI researcher's output and tell if it's correct and trustworthy. If this is true, what else is there to figure out?

  • whimsicalism 2 years ago

    > what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive

    Nobody interested in superalignment is interested in waiting until actually threatening AI gets here.

    • Imnimo 2 years ago

      But that's the fundamental superalignment plan - train a human-level alignment researcher AI, run a bunch of them in parallel, and review their research output to see if they solve the alignment problem. You can't do the plan until the human-level alignment researcher AI already exists.

      • whimsicalism 2 years ago

        A large part of the idea is that you can develop techniques for aligning sub-human AI using even weaker AI, and hope/pray that this continues to generalize once you get to super-human AI being aligned by human-level AI.
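The weak-to-strong setup described above can be illustrated with a toy experiment (a hypothetical sketch, not anything from the thread or from OpenAI's actual paper): a "strong" least-squares student is trained only on labels from a 70%-accurate "weak" supervisor, yet ends up more accurate than its supervisor, because the supervisor's independent errors average out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy weak-to-strong generalization: the student never sees ground truth,
# only the weak supervisor's noisy labels. All numbers are illustrative.
n, d = 5000, 20
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y_true = np.sign(X @ w_true)                 # ground-truth concept

# Weak supervisor: correct only ~70% of the time, with independent errors.
flip = rng.random(n) < 0.30
y_weak = np.where(flip, -y_true, y_true)

# Strong student: least-squares fit to the *weak* labels only.
w_student, *_ = np.linalg.lstsq(X, y_weak, rcond=None)

# Evaluate both against the ground truth on fresh data.
X_test = rng.normal(size=(2000, d))
y_test = np.sign(X_test @ w_true)
student_acc = np.mean(np.sign(X_test @ w_student) == y_test)
weak_acc = 1 - flip.mean()

print(f"supervisor ~ {weak_acc:.2f}, student ~ {student_acc:.2f}")
```

The student recovers the true linear direction because the label noise is independent of the inputs; the open question for superalignment is whether anything like this holds for capability gaps that are not just additive noise.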

  • DalasNoin 2 years ago

    Current systems are already (in a limited way) helping with alignment: Anthropic is using its AI to label the sparse features from its sparse-autoencoder approach. I think the original idea of labeling neurons with AI came from William Saunders, who also left OpenAI recently.
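For a rough sense of what a sparse autoencoder is in this context, here is a minimal numpy sketch (dimensions, weights, and the L1 coefficient are all illustrative, not Anthropic's actual setup): activations are encoded into an overcomplete set of non-negative feature coefficients, with an L1 penalty pushing most of them toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: model activations of width d, an overcomplete
# dictionary of f "features" with f > d, as in dictionary learning.
d, f, n = 16, 64, 1024
W_enc = rng.normal(0, 0.1, (d, f))   # encoder weights
W_dec = rng.normal(0, 0.1, (f, d))   # decoder weights
b_enc = np.zeros(f)

def sae_forward(x):
    """Encode activations into sparse feature coefficients, then decode."""
    h = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU: non-negative, sparse codes
    x_hat = h @ W_dec                        # linear reconstruction
    return h, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    h, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.mean(np.abs(h))
    return recon + sparsity

x = rng.normal(0, 1, (n, d))  # stand-in for residual-stream activations
print(sae_loss(x))
```

The "labeling" step in the comment happens after training: each learned feature fires on some set of inputs, and an LLM is asked to describe what those inputs have in common.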

  • warkdarrior 2 years ago

    I think his tweet can be read as "research in (1) scalable oversight, (2) weak-to-strong generalization, and (3) automated alignment".

smountjoy 2 years ago

"Superalignment" is (was?) OpenAI's term, so it might be more accurate to say he is joining Anthropic to work on alignment.

  • sp332 2 years ago

    Looks like superalignment was Jan Leike's term, since the team at OpenAI dissolved immediately without him.

  • eminence32 2 years ago

    Is there a difference between "superalignment" and "alignment" ?

    • rfw300 2 years ago

      Yes. “Superalignment” (admittedly a corny term) refers to the specific case of aligning AI systems that are more intelligent than human beings. Alignment is an umbrella term which can also refer to basic work like fine-tuning an LLM to follow instructions.

      • thefaux 2 years ago

        Is this not something of an oxymoron? If there exists an ai that is more intelligent than humans, how could we mere mortals hope to control it? If we hinder it so that it cannot act in ways that harm humans, can we really be said to have created superintelligence?

        It seems to me that the only way to achieve superalignment is to not create superintelligence, if that is even within our control.

        • renewiltord 2 years ago

          Not self-evident. A fungus can control an ant; Toxoplasma gondii can control a human. Which is more intelligent? So if control of a more intelligent being is possible, could it be symbiotic to permit it? Alpha-proteobacteria were sister to the ancestral proto-mitochondria, and now we live aligned. But those beings lacked conscious agency, and we have more than they did. It's not self-evident that we will fail at this.

          • dr_dshiv 2 years ago

            Another example is the alignment between our hindbrain, limbic system and neocortex. Neocortex is smarter but is usually controlled by lower level processes…

            Note that misalignment between these systems is very common.

        • whimsicalism 2 years ago

          Many people share your views, but others believe it is possible.

      • afefers 2 years ago

        Huh! All this time I thought the "super" was just for branding/differentiation.

        • cwillu 2 years ago

          Alignment was the original term, but it has been largely co-opted to mean a vaguely similar-looking concept of public safety around the capabilities of current models.

        • zucker42 2 years ago

          That was definitely part of it.

      • halfjoking 2 years ago

        Then why don't they call politicians "super-politicians"?

        Their purpose is to control the population by being lesser beings who feed off corporations and just push their message.

    • exe34 2 years ago

      I suppose the difference between imaginary and "super"-imaginary isn't very important from a practical point of view.

      They worry about alignment for AI; I worry about alignment for the corporations that wield technology, any technology.

    • throw5345346 2 years ago

      Oh yes. One is super.

htrp 2 years ago

It's also completely theoretical, until it isn't (cf. paperclip maximizers).

andrewfromx 2 years ago

I keep getting the names Anthropic and Extropic (Guillaume Verdon / Beff Jezos) mixed up. Anthropic makes Claude; Extropic claims thermodynamic hardware many orders of magnitude faster and more energy-efficient than CPUs/GPUs.*

* parameterized stochastic analog circuits that implement energy-based models (EBMs). Stochastic computing is a computing paradigm that represents numbers using the probability of ones in a bitstream.
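The bitstream representation in the footnote can be demonstrated in a few lines (a toy software sketch; real stochastic-computing hardware uses physical noise sources): multiplication of two values in [0, 1] reduces to AND-ing two independent bitstreams, since P(a AND b) = P(a) · P(b).

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, n=100_000):
    """Encode a value p in [0, 1] as a random bitstream with P(bit=1) = p."""
    return rng.random(n) < p

# In stochastic computing, a single AND gate multiplies two values:
# the fraction of ones in the output stream estimates the product.
a, b = to_bitstream(0.6), to_bitstream(0.5)
product = (a & b).mean()   # approximates 0.6 * 0.5 = 0.3
print(product)
```

The trade-off is precision: accuracy grows only with the square root of the stream length, which is why the approach targets noise-tolerant workloads like probabilistic models rather than exact arithmetic.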

  • whimsicalism 2 years ago

    yes, one is a real company and one is...

  • sanjeetsuhag 2 years ago

    > Thermodynamic hardware many orders of magnitude faster and more energy efficient than CPUs/GPUs.

    I’m sorry, but is this thermodynamic hardware real? Are there any benchmarks? Those claims are pretty strong.

    • DiabloD3 2 years ago

      "Yes", but only extremely simple demo circuits.

      Basically, they are betting on the following: when you perform a calculation, only part of the electricity that went into the circuit exits as the answer; everything that didn't become the answer turns into waste heat and electromagnetic fields. What if you reversed the calculation, so that the only waste produced is the transmission of the answer?

      If you know anything about EE, you'd know that what I said is an extremely simplified view of how modern ALUs are made, and it ignores the past 40+ years of optimizations; however, they believe that "undoing" those optimizations and "redoing" them as entirely reversible operations will not only work, but will be the final optimization we can make.

      There will be no benchmarks of the kind you want, because that isn't the issue. I could take any CPU off the shelf today and run it 10 times faster: it would melt from self-generated heat, but for a glorious microsecond it would be the fastest CPU on earth.

      They are stating that they have potentially fixed one of the largest generators of waste heat, which would allow us, using all of our existing technology, to start ramping up our clockspeeds, and our true final frontier will be trace lengths at macroscale (which is already a problem at the clockspeeds we use for DDR5 and PCI-E 6).

      However, given how Extropic's website says none of what I just said, they're probably just some startup trying to ride the AI wave, and then close shop in a few years. I doubt they've magically figured out one of the hardest problems in EE atm. They are also not the only company in this space, and every single major semiconductor company in the world is trying to solve it.
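The reversibility idea alluded to above is usually illustrated with the Toffoli (CCNOT) gate, which computes AND without erasing its inputs; by Landauer's principle, it is the erasure of information that carries an unavoidable heat cost. A small Python sketch (an illustrative aside, not Extropic's actual design):

```python
def toffoli(a, b, c):
    """Toffoli (CCNOT) gate: flips c iff a and b are both 1.
    Reversible: no input information is erased, so in principle it has
    no Landauer heat cost. With c fixed to 1, the third output is
    NAND(a, b), which is universal for classical logic."""
    return a, b, c ^ (a & b)

# Applying the gate twice returns the original bits: it is its own inverse.
for bits in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    assert toffoli(*toffoli(*bits)) == bits

# An AND computed reversibly: with c = 0, the third output is a AND b,
# while a and b are carried through rather than discarded.
print(toffoli(1, 1, 0))
```

An ordinary AND gate maps two input bits to one output bit and destroys a bit of information every cycle; the reversible version keeps all inputs alive, which is exactly the "nothing becomes waste" property the comment describes.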

      • whimsicalism 2 years ago

        From my understanding, this will only be able to accelerate EBMs (energy-based models), which they could scale up in simulation to show that they would be useful.

        EBMs, as of now, are not really that useful at all.
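For reference, an energy-based model just defines a probability via an energy function, p(x) ∝ exp(−E(x)), and drawing samples from that distribution is the expensive part such hardware would do physically. A toy software sketch using Metropolis sampling (the quadratic energy, step size, and step count are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(x):
    """Quadratic energy, so p(x) ∝ exp(-x²/2): a standard normal."""
    return 0.5 * x ** 2

def metropolis(n_steps=50_000, step=1.0):
    """Sample from exp(-energy) with the Metropolis acceptance rule."""
    x, samples = 0.0, []
    for _ in range(n_steps):
        x_new = x + rng.normal(0, step)
        # Accept the move with probability min(1, exp(E(x) - E(x_new))).
        if rng.random() < np.exp(energy(x) - energy(x_new)):
            x = x_new
        samples.append(x)
    return np.array(samples)

s = metropolis()
print(s.mean(), s.var())
```

The mean and variance of the samples approach 0 and 1, the moments of the target distribution; a thermodynamic chip would aim to produce equivalent samples directly from circuit noise instead of iterating in software.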

    • geodel 2 years ago

      Well you've got Beff Jezos. This is as real as it gets.
