The era of full stack chip designers
chipinsights.substack.com | 41 points by bharathw30 a day ago
A lot of steps are missing; it seems like OP doesn't have much experience. Sure, you can license ARM or download RISC-V, configure and validate the RTL (even configurations with no RTL changes require RTL validation, and a mountain of test vectors), license some analog IP from Synopsys for power and clocking; synthesize your design, place and route, timing converge, functionally validate against the RTL, lay out the pinmap and bonding rules, fracture the DB, send the GDS to TSMC, validate the package characteristics and process corners, do post-silicon debug of ROM/timing/package/digital/analog, and maybe, if the gods smile on you, it'll only be one stepping and won't need any FIB edits ... but that requires an army of people to get it done in under a year.

Re-designing a modern ISA would take one person decades; look how long the first cut of RISC-V took, even with genius volunteers. Maybe it's doable solo if you just want to build a 6502 for fun and can cough up $50k for a 0.18 micron shuttle at TSMC or GlobalFoundries.

It's fun to fantasize about AI making all this happen automatically, but chip design is wildly nontrivial. It's funny: Sematech talked about 3rd (or 4th?)-generation silicon design, where humans would be taken out of the loop entirely within the next decade... back in 1993!
(Source: I've been a CPU Architect going on my fourth decade.)
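To make just one of those steps concrete, here is a minimal sketch of the synthesis step from the flow above, driving the open-source Yosys tool from Python. The RTL file and top module names are placeholders, and everything downstream of this (place and route, timing closure, GDS handoff, bring-up) is where the army of people comes in.

    import subprocess

    # Minimal sketch: generic synthesis of a toy design with Yosys.
    # "counter.v" and the top module name are hypothetical; a real tapeout
    # flow would map to a foundry cell library and continue through place
    # & route, timing closure and GDS export.
    YOSYS_SCRIPT = (
        "read_verilog counter.v; "   # hypothetical RTL source
        "synth -top counter; "       # generic synthesis to a gate-level netlist
        "write_json counter.json"    # netlist for downstream open-source tools
    )

    subprocess.run(["yosys", "-p", YOSYS_SCRIPT], check=True)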
Unlike software product development, chip manufacturing requires 7-8 figure (USD) development budgets even when using a foundry. That is per iteration. And unlike JS development, there aren't massive volumes of internet resources to train LLMs on to produce usable RTL and similar code.
Things like my https://wafer.space ($7k USD), TinyTapeout.com (<$200 USD) and ChipFoundry.io (~$15k USD) are making it much cheaper to do IC design.
There are a huge number of designs from Tiny Tapeout which are all public - see https://tinytapeout.com/runs/
The designs are still more in the MCU size range, but you have to start somewhere!
The Google Open MPW program also had 10 runs with 40 projects, published at http://foss-eda-tools.googlesource.com/third_party/shuttle/ -- all the submissions had to be open source, and there were 1000+ of those. I did try pitching to multiple Google Research groups that continuing the Open MPW funding would grow the pool of manufactured designs, which would be useful for AI training, but I didn't get any bites.
The now-defunct Efabless also ran a number of challenges in this space, which got pretty good results; see https://efabless.com/challenges
> Unlike JS development, there aren't massive volumes of internet resources to train LLMs on to produce usable RTL and similar code.
There is local data at each major manufacturer/designer that they can use to train their LLMs. I'm sure Synopsys/Mentor/Siemens are also working towards providing solutions their customers can use to train their own LLMs.
This is pretty much the business model of Adobe in creating their 'copyright compliant' image generation features.
The companies with the biggest treasure trove of non-public training data will likely have a technical advantage. _If_ they can purposefully use that data to create better models for their niche problem domain.
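For a sense of what that could look like mechanically, here is a minimal sketch (not any vendor's actual product) of fine-tuning an off-the-shelf causal code LLM on an in-house RTL corpus with the Hugging Face stack; the base model, data path and hyperparameters are all placeholders.

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # Placeholder base model; any permissively licensed code LLM would do.
    BASE_MODEL = "bigcode/starcoderbase-1b"

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

    # Hypothetical internal corpus: every Verilog file the company owns.
    ds = load_dataset("text", data_files={"train": "internal_rtl/**/*.v"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    train = ds["train"].map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="rtl-lm",
                               per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=train,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The hard part isn't the training loop; it's whether the owners of that non-public data can actually curate it into something a model learns from.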
While in general I am very much in favour of playing both sides of an abstraction, I would argue that RTL is nowhere near the top of the stack.
In the GPU world, you have games, which are built on game engines, which are built against an API, which is implemented in a driver (which contains a compiler!), which communicates through OS abstractions with hardware, which is written in an HDL (this is where you write RTL!), which is then laid out in silicon. Now each of these parts of the stack has ridiculous complexity in it, and there are definitely things in the upper layers of the stack that impact how you want to write your RTL (and vice versa). So if your stack knowledge stops at RTL (which, honestly, there is absolutely nothing wrong with!), there is still lots of fun complexity left in the stack.
OpenROAD comes to mind as the main open-source, self-optimizing RTL-to-GDS effort. Begat in part with DARPA help. Lots of derivative efforts/projects!
There was a ton of energy & hype & visibility for the first couple of years after launch (2018). But it's been pretty quiet, with a lot less PR since. I wish there were more visibility into the evolution of this & related efforts. https://github.com/The-OpenROAD-Project/OpenROAD?tab=readme-...
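If you want to see what the project can do today, the quickest path is the demo design bundled with OpenROAD-flow-scripts. A rough sketch, assuming ORFS is already cloned and built and that the sky130hd gcd example still lives at its documented path:

    import subprocess

    # Rough sketch: run the ORFS reference flow (synthesis through GDS) on
    # its bundled gcd example targeting the open sky130hd PDK. Assumes
    # OpenROAD-flow-scripts is cloned and its tools are built; the config
    # path follows the ORFS docs but may differ in your checkout.
    subprocess.run(
        ["make", "DESIGN_CONFIG=./designs/sky130hd/gcd/config.mk"],
        cwd="OpenROAD-flow-scripts/flow",
        check=True,
    )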
There are plenty of updates on the website at https://theopenroadproject.org/ - including a pretty good report on what happened in 2024.
Also take a look at the Open Source EDA BOF from the DAC conference - https://open-source-eda-birds-of-a-feather.github.io/
Interesting, I thought (from the title) this would be about analogue vs digital designers. But the article is written in the context of a "fully digital" chip (i.e. the analogue stuff is abstracted away; all chips are analogue at the end of the day).
"Fullstack chip designers" exist in the mixed-signal world. Where the analogue component is the majority of the chip, and the digital is fairly simple, it's sometimes done by single person to save money. At least it was definitely a thing in the late 00's and early 2010's in small fabless design centers. Not sure about nowadays.
Full stack-ish, when you have an in house layout guy a few cubes over and are old enough to have done schematic captured ASIC (Cadence Edge!) and gate level emergency fixes. But alas, Catapult C is the new religion. Old dog, new tricks and all that.
Almost everyone in my team is "full stack" (nobody has ever called it this). I'm not convinced by this. I guess it would allow us to hire worse people, but I'm not sure that's a good thing to aim towards.
This is currently a huge source of inefficiency in modern chip design.
I've worked on some of the current highest-profile chip projects doing "frontend" RTL design, and at every major chip company I've worked at, and from talking with coworkers about their past experiences at other companies, the handoff wall between RTL and PD is leaving a substantial amount of perf per power/area on the table (like 30% in general).
RTL designers generally have no visibility into how their designs are getting laid out, and generally don't want to have to care. PD engineers have no visibility into the uArch and low level code details, and maybe they want to care but everything is too obfuscated in general.
So when your pins are misplaced during an early iteration, RTL will blindly add retiming to resolve timing issues and make PD happy but never check if it's actually needed. PD will slave away trying to make busted RTL work with placement and recipe adjustments rather than asking RTL for a trivial fix, etc etc.
There are a ton of small things where visibility into either side of the process would result in measurably better hardware, but the current team structures and responsibility boundaries encourage people not to care.
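As a toy illustration of the "blindly add retiming" move (sketched in Amaranth rather than raw Verilog, purely for brevity; the module is hypothetical): the easy RTL-side answer to a timing miss is often just another register stage, whether or not the real problem is a misplaced pin.

    from amaranth.hdl import Elaboratable, Module, Signal

    class PipeStage(Elaboratable):
        """Hypothetical datapath stage with an extra register bolted on to 'fix' timing."""

        def __init__(self, width=32):
            self.i = Signal(width)
            self.o = Signal(width)

        def elaborate(self, platform):
            m = Module()
            # Add a pipeline register (one more cycle of latency) to make PD
            # happy, even when the actual cause is pin placement or the
            # placement recipe on the PD side.
            staged = Signal.like(self.i)
            m.d.sync += [staged.eq(self.i), self.o.eq(staged)]
            return m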
That final 30% or whatever takes a lot longer to obtain than the first 70%. Big teams want to ship their chip tomorrow and it needs to work. They don't want any more risk than they're already saddled with so just leave it on the table for next time. I think what you're proposing with less siloing is obviously better (it's the only way I want to work) but it's going to come with a price. There is definitely room in the tooling to help with this, and it doesn't need to involve "AI".
Why is this comment dead? (Is it just the poster's alias?) This experience is common everywhere in larger organizations, and absolutely affects chip design.
Of course it has to be another article about "AI". It wouldn't be on HN if it's not about "AI". /s