XLS: Accelerated HW Synthesis

google.github.io

122 points by victor82 5 years ago · 54 comments

Traster 5 years ago

>XLS is used inside of Google for generating feed-forward pipelines from "building block" routines

For those that aren't familiar, control flow (anything that isn't a pure feed-forward directed acyclic graph) is the hard part of HLS. This looks like a fairly nice syntax compared to the bastardisations of C that Intel and Xilinx pursue for HLS, but I'm not sure this is bringing anything new to the table.

As for the examples, I'm kind of flummoxed that they haven't given any details on what the examples synthesize to. For example, how many logic blocks does the CRC32 use? How many clock cycles? What about the throughput? I'm going to sound like a grumpy old man now, but it's important because it's very difficult to get performant code as a hardware engineer. Generally it involves having a fair idea of how the code is going to synthesize. What is damn near impossible is figuring out what you want to synthesize to, and then guessing the shibboleth that the compiler wants in order to produce that code. Given that they haven't tackled the difficult problems like control flow, folding, resource sharing, etc., it makes me hesitant to believe they've produced something phenomenal.

  • learyg 5 years ago

    Hi, one of the collaborators here, thanks for the good points.

    We have been targeting some Lattice FPGAs for prototyping purposes, but we've mostly been doing designs for ASIC processes, which is why details are a little sparse for FPGAs you get off the shelf; it's a priority for us to fill those in. We have some interactive demos that show FPGA synthesis stats (cell counts, generated Verilog), let you toy with the pipeline frequency, and integrate with the [IR visualizer](https://google.github.io/xls/ir_visualization/#screenshot); we'll try to open source that as soon as possible. The OSS tools (SymbiFlow) that some of our colleagues collaborate on can do synthesis in just a few seconds, so it can feel pretty cool to see these things in near-real-time.

    We fold resources in time with a sequential generator, but we still have a ways to go. We expect a bunch of problems will map nicely onto concurrent processes; they're Turing complete and nice for the compiler to reason about.

    I'm a big believer that phenomenal is really effort and solving real-world pain points integrated over time -- it's a journey! We're intending to do blog posts as we hit big milestones, so keep an eye out!

    • Traster 5 years ago

      Do you mind me asking what applications Google uses this for internally? Is this used in a flow that's ended up in production? Also, what are your thoughts on integrating optimized RTL blocks?

      • learyg 5 years ago

        One of the things we have on our short list is "good FFI" for instantiating existing RTL blocks (and making their timing characteristics known to the compiler) and making import flows for Verilog/SystemVerilog types. The latter may be a bit specific to your Verilog flow, but we think there are some universal components we can provide that folks can slot into their flows as appropriate.

        Being able to re-time pipelines without a rewrite is a useful capability. Although it's still experimental and we're actively building out the capabilities, we have it in real designs that have important datapaths.

    • person_of_color 5 years ago

      Are you hiring SWEs for HW-SW co-design?

      • learyg 5 years ago

        I am not personally a manager / hiring manager, but this is the job posting for SW/HW codesign positions in the south bay area CA -- speaking as an IC, it has been a very enjoyable area to work in as specialized designs become more important! https://g.co/kgs/xGSUXy

  • aseipp 5 years ago

    The HLS tools from Xilinx and Intel (and maybe Cadence I guess) can also actually compile your models as ordinary C++ code (i++ from Intel is literally just a fork of Clang, I think, and so are tools like LegUp), leading to their greatest benefit: simulations are way, way faster and software compilers have vastly better iteration times than synthesizers.

    They seem to have a simulation framework for these tools that isn't just "re-use an existing simulator", and it apparently does use LLVM for codegen but that's the easy part. Actual simulation performance numbers would be really interesting to see vs actual RTL sims.

Connect12A22 5 years ago

I love their RISC-V implementation in 500 lines of code: https://github.com/google/xls/blob/main/xls/examples/riscv_s...

  • Traster 5 years ago

    It's kind of a good demonstration of the problem with software versus hardware. Here's the XLS solution (just for one function):

      fn decode_i_instruction(ins: u32) -> (u12, u5, u3, u5, u7) {
        let imm_11_0 = (ins >> u32:20);
        let rs1 = (ins >> u32:15) & u32:0x1F;
        let funct3 = (ins >> u32:12) & u32:0x07;
        let rd = (ins >> u32:7) & u32:0x1F;
        let opcode = ins & u32:0x7F;
        (imm_11_0 as u12, rs1 as u5, funct3 as u3, rd as u5, opcode as u7)
      }
    here's the SystemVerilog solution:

      {im_11_0,rs1,funct3,rd,opcode} <= ins;
    
    Obviously, in software, you can't slice data in the same way since, as far as I can tell, it assumes all variables are a certain size and so there's no natural way of bit slicing.

    • learyg 5 years ago

      Thanks again for the detailed thought! We actually [developed more advanced bit slicing syntax]( https://github.com/google/xls/blob/1b6859dc384fe8fa39fb901af... ) since that example was written; you can do things like a standard slice `x[5:8]` or a Verilog-style "width slice" that has explicit signedness `x[i +: u8]`. There's currently no facility for "destructuring" structs as bitfields like pattern matches, but there's no conceptual reason it can't be done; I think that'd be an interesting thing to prioritize if there's good bang for the buck. [Github issue to track!](https://github.com/google/xls/issues/131) Let me know if I missed out on details or rationale, thanks!
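
      To make the upshot concrete, here's a hand-written sketch of my own (not code from the repo) of what the decode function above could collapse to with that slice syntax:

        fn decode_i_instruction(ins: u32) -> (u12, u5, u3, u5, u7) {
          // x[m:n] takes bits m..n-1 counting from the LSB, and the result
          // type (width n-m) falls out of the slice bounds.
          (ins[20:32],  // imm_11_0, bits 31:20
           ins[15:20],  // rs1, bits 19:15
           ins[12:15],  // funct3, bits 14:12
           ins[7:12],   // rd, bits 11:7
           ins[0:7])    // opcode, bits 6:0
        }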

      • Traster 5 years ago

        Hey, thanks for replying, the project looks like it has a lot of potential. You're right, bit slicing gets you like 99% of the way there (the rest is just syntax sugar). It's interesting because from what I remember there were some non-trivial issues for the people using LLVM for their IR because of fundamental assumptions in the representation, but bit-slicing is the core functionality. Is there a reason you guys decided on your own IR?

    • FullyFunctional 5 years ago

      That's untrue. You need to include the declarations of im_11_0, etc. for the above to work, and then you end up with just as much code. There's no reason they couldn't extend match to operate on bit slices also, which would make this identical.
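
      Roughly, a sketch (field widths taken from the I-type decode above):

        logic [11:0] im_11_0;
        logic [4:0]  rs1;
        logic [2:0]  funct3;
        logic [4:0]  rd;
        logic [6:0]  opcode;  // 12+5+3+5+7 = 32 bits, matching ins

        {im_11_0, rs1, funct3, rd, opcode} <= ins;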

      Frankly, combinational logic is not where I expect the most interesting differences. Sequential logic is surely more interesting.

  • fmakunbound 5 years ago

    Comments indicate it implements a subset of various things.

jashmenn 5 years ago

I've been programming for 20 years and yet I have no idea what this does. Can someone ELI5?

  • jevogel 5 years ago

    As far as I can tell, it is a high-level synthesis tool for developing FPGA/ASIC applications. You write your circuit functions in a Rust-like DSL and it generates optimized Verilog/System Verilog code, which can then be synthesized into hardware. But you can also take the output of the DSL and simulate it first, which presumably is quicker than simulating Verilog.
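
    For instance, you can unit-test a function in the DSL's interpreter before ever generating Verilog. A hedged sketch (the test syntax here is approximated, not copied from the repo):

      fn add_one(x: u32) -> u32 { x + u32:1 }

      // Runs in the interpreter/JIT; no Verilog or RTL simulator involved.
      #[test]
      fn add_one_test() {
        assert_eq(add_one(u32:41), u32:42)
      }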

  • tlack 5 years ago

    You feed in a Rust-like DSL (called DSLX) or C++ and it generates code for your FPGA (in Verilog). You then synthesize that into a "bitstream", upload it to your FPGA, and now you have something akin to a custom microprocessor, but running just your program.

    • est31 5 years ago

      It looks really quite similar to Rust: https://github.com/google/xls/blob/main/xls/examples/dslx_in...

      Note that there are differences though: there seems to be no type inference, for .. in works differently, the array syntax is different, and match arms are delimited by ";" instead of ",".

      But it has a lot of the cool stuff from Rust: pattern matching, expression orientedness (let ... = match { ... }), etc.

      Other syntax is similar too: the fn foo() -> Type signature style, although something similar to that can be achieved in C++ as well.
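
      To illustrate the for .. in difference, a sketch of my own (not from the examples): DSLX's for is an expression that threads an explicitly typed accumulator through a fixed trip count, rather than a Rust-style statement loop:

        fn sum(xs: u32[8]) -> u32 {
          // (i, accum) are both explicitly typed; the trailing (u32:0) is the
          // initial accumulator value, and the loop yields the final one.
          for (i, accum): (u32, u32) in range(u32:0, u32:8) {
            accum + xs[i]
          }(u32:0)
        }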

      • muizelaar 5 years ago

        Looks like the match arm difference is going away: https://github.com/google/xls/pull/127

        • est31 5 years ago

          Very cool. TBH, Rust's match arm delimiter story is a bit weird: sometimes you need to put a ",", sometimes you don't. And macro_rules! macros have ";" instead of ",".

          • couchand 5 years ago

            > Sometimes you need to put a ",", sometimes you don't

            The rule is pretty simple: if you have curly braces you don't need a comma (and rustfmt will drop it); if you don't have curly braces, you need a comma.

  • cokernel_hacker 5 years ago

    It is a project aimed at making the design of digital logic easier.

    Often, such hardware is written using hardware description languages [1] like Verilog or VHDL. These languages are very low level and, in the opinion of some, a little clumsy to use.

    XLS aims to provide a system for High-level synthesis [2]. The benefit of such systems is that you can more easily map interesting algorithms to hardware without being super low level.

    [1] https://en.wikipedia.org/wiki/Hardware_description_language

    [2] https://en.wikipedia.org/wiki/High-level_synthesis

    • pkaye 5 years ago

      I remember years ago reading about Handel-C. A lot like Go with channels and threads and function calls. The way it synthesized the hardware was pretty simple conceptually. You could easily understand how the program flow was converted into a state machine in the hardware.

      Not sure what happened to it. Maybe it did not optimize things enough.

      https://en.wikipedia.org/wiki/Handel-C

      https://babbage.cs.qc.cuny.edu/courses/cs345/Manuals/HandelC...

      • jlokier 5 years ago

        I worked on the Handel-C compiler :-) Then later used the language for a few years.

        Its approach was intentionally simple conceptually. You could tell at a glance how many synchronous clock cycles each step would take, and roughly what logic would be produced, so it worked quite well for deterministic I/O and simple logic.

        I found it a bit of a pain for high-throughput pipelining though, and personally prefer a compiler that has more freedom to auto-balance pipelines and retime logic.

        I think Handel-C occupied a middle ground between other HLS tools and Verilog/VHDL. It had the concise, C-like syntax of the former, with the predictability of the latter.

        What happened to it was it transitioned from university to the spin-out company Celoxica, and then was eventually bought by Agility; then Mentor Graphics bought Handel-C while Agility folded, and Mentor seemed to mothball it.

        For a while in the middle there was a decent business with great customers and a decent market cap, and something I'm not privy to resulted in the business folding. I don't think it failed due to insufficient code optimisation :-)

  • erikerikson 5 years ago

    Not like you're 5 and I'm definitely not an expert on this project but here's my best shot...

    Most programs are loaded into memory, and parts of those programs are moved to registers and used to load data into other registers. That data is, in turn, sent to logic units like adders that add two registers together, or comparators that compare two registers' values. The generality comes at a cost in terms of power and time but offers flexibility in return.

    That is very different from something like a light switch where you flip the switch and the result continuously reflects that input within the limits of the speed of light.

    If you are willing to sacrifice flexibility, translating your code into hardware gives you a device that runs the same processing on its inputs continuously at the speed of light subject to your information processing constraints (e.g. derivations of the original input still need to be calculated prior to use).

    Traditionally, separate languages and greater hardware knowledge requirements made custom circuits less accessible. This project brings more standard, higher level languages into the set of valid specifications for custom electronics.

  • foota 5 years ago

    I think it turns a C-ish language (from the looks; not sure about the semantics) into a hardware description language like Verilog.

  • zelly 5 years ago

    Verilog for codemonkeys

    • FullyFunctional 5 years ago

      That's a complete mischaracterization. The point of any and all HLSes is to raise the level of abstraction so you can be more productive. Even for highly skilled Verilog "monkeys", writing in an HLS is a great deal faster and less error-prone (assuming comparable mastery of the language), simply because you do not need to deal with a lot of low-level details.

      The $1M question, however, is how this experience pans out as you try to squeeze out the last bit of timing margin. I don't know, but I'm eager to find out.

      ADD: this parallels the situation with CUDA, where writing a first working implementation is usually easy, but by the time you have a heavily optimized version ...

    • nickysielicki 5 years ago

      HLS is going to improve, and you can either disregard it and be left behind or you can try to understand where it fits into a design. Your choice.

  • patrickcteng 5 years ago

    ditto

mmastrac 5 years ago

I love this. I did something similar, using Java to build an RTL:

https://github.com/mmastrac/oblivious-cpu/blob/master/hidecp...

I was thinking about turning it into a full language at some point, but they beat me to it (and I love the Rust syntax!).

jeffreyrogers 5 years ago

This is interesting. Overall I'm bearish on high-level synthesis for anything requiring high performance, since you typically need to think about how your code will be mapped to hardware if you want it to perform well, and adding abstractions interferes with that. I would like to know more about how Google uses this, since it doesn't seem like a good fit for the type of stuff I work on.

  • learyg 5 years ago

    Hi, one of the collaborators here! One question to consider, and one that I consider pretty frequently, is what the hard difference really is between HLS and RTL. It seems up to interpretation, but I think of it more as a spectrum than anything that truly schisms the space. I think I personally associate the term HLS with "trying to uplevel the design process where we can".

    Even with modern RTL, we have a synthesizing compiler optimizing our design within a cycle boundary, trying to manage fanouts and close timing by duplicating paths and optimizing redundant boolean formulas. Some will even do some forms of cross-stage optimization.

    If you think of XLS's starting point as "mostly structural" akin to RTL (instead of "loops where you push a button and produce a whole chip") it's really an up-leveling process, where there's a compiler layer underneath you that can assist you in exploring the design space, ideally more quickly and effectively, and trying to give you a flexible substrate to make that happen (by describing bits of functionality as much as possible in latency insensitive ways).

    I like to think of it like [Advanced Chess](https://en.wikipedia.org/wiki/Advanced_chess) -- keep the human intuition but permit the use of lots of cycles for design process assist. It appears from what we've seen so far that when you have a "lifted" representation of your design that tools can work with well, composition and exploration become more possible, fun, and fruitful! I expect over time we'll have a mode where you still require that everything closes timing in a single cycle, for when you explicitly want all the control you had or don't care so much for the assist; then you just get the benefits of the tooling and fast simulation infrastructure that works with the same program representation. It's a great space to be working in as somebody who loves compilers, tools, and systems: there's so much you could do, there's incredible opportunity!

  • typon 5 years ago

    This doesn't seem like HLS, more like a new HDL that's based on Rust. This has been done many times before with other functional languages (Clash, Chisel, Spinal, hardcaml and others). These projects never take off because hardware designers are inherently conservative and they won't let go of their horrible language (Verilog or SystemVerilog) no matter what.

    I'm sure Google will use XLS for their internal digital design work, but I don't expect this to ever gain widespread support. (not because HLS is inherently bad, but because of the culture)

    • Traster 5 years ago

      > These projects never take off because hardware designers are inherently conservative and they won't let go of their horrible language (Verilog or SystemVerilog) no matter what.

      This is categorically not true. There have been repeated projects to re-invent hardware description languages. They don't fail because hardware engineers are conservative, they fail because they don't produce good enough results.

      Intel has a team of hundreds of engineers working on HLS, and Xilinx probably has almost as many; there are lots of smaller companies working on their own things, like Maxeler. They haven't taken off because it's an unsolved problem to automate some of the things you do in Verilog efficiently.

      Take this language for example - it cannot express any control flow. It's feed forward only. Which essentially means, it is impossible to express most of the difficult parts of the problems people solve in hardware. I hate Verilog, I would love a better solution, but this language is like designing a software programming language that has no concept of run-time conditionals.

      • aseipp 5 years ago

        I mean, languages like Bluespec are very close to actual SystemVerilog semantically, and others like Clash are essentially structural by design, not behavioral (I can't speak for other alt-RTLs). You are in full control of using DFFs, the language perfectly reflects where combinatorial logic is done, the mappings of DFFs or IP to underlying RTL and device primitives can easily be done so there's no synthesis ambiguity, etc. In the hands of an experienced RTL engineer you can more or less exactly understand/infer the logic footprint just from reading the code, just like Verilog. You can do Verilog annotations that get persisted in the compiler output to help the synthesizer, and all that stuff. Despite that, you still hear all the exact same complaints ("not good enough" because it used a few extra LUTs due to the synthesizer being needy, despite the fact that RTL people already admit to spending stupid amounts of time on pleasing synthesizers). Dyed-in-the-wool RTL engineers are certainly a conservative bunch, and cagey about this stuff no matter what; it's undeniable.

        I think a bigger problem is things like tooling which is deeply invested in existing RTLs. High-end verification tools are more important than just the languages, but they're also very difficult to replicate and extend and acquire. That includes simulation, debuggers, formal tools, etc. Verification is where all the actual effort goes, anyway. You make that problem simpler, and you'll have a winner regardless of what anyone says.

        You mention the Intel and Xilinx's software groups, but frankly I believe it's a good example of the bigger culture/market problem in the FPGA world. FPGA companies desperately want to own every single part of the toolchain in a bid for vertical integration; in theory it seems nice, but it actually sucks. This is the root of why everyone says Quartus/Vivado are shitware, despite being technically impressive engineering feats. Intel PSG and Xilinx just aren't software companies, even if they employ a lot of programmers who are smart. They aren't going to be the ones to encourage or support alternative RTLs, deliver integrated tools for verification, etc. It also creates perverse incentives where they can fuel device sales through the software. (Xilinx IP uses too much space? Guess you gotta buy a bigger device!) Oh sure, Xilinx wants you to believe that they're uniquely capable of delivering P&R tools nobody else can — the way RTL engineers talk about the mythical P&R algorithms, you'd think Xilinx programmers were godly superhumans, or they were getting paid by Xilinx themselves — that revealing chip details would immediately mean their designs would be copied by Other Electronics Companies and they would crumble overnight despite the literal billions you would need up-front to establish profitability and a market position, and so on. The ASIC world figured out a long time ago that controlling the software just meant the software was substandard.

    • jeffreyrogers 5 years ago

      They describe it as HLS, and it definitely looks like HLS to me. But maybe we have different definitions. Either way, it seems to be targeting a strange subset of problems: it doesn't look high level enough to be easy to use for non-hardware designers (I don't think this goal is achievable, but it is at least a worthy goal), and it doesn't seem low-level enough to allow predictable performance.

    • gchadwick 5 years ago

      > These projects never take off because hardware designers are inherently conservative and they won't let go of their horrible language (Verilog or SystemVerilog) no matter what.

      As a hardware designer who's never been a fan of SystemVerilog but continues to use it, I think this is inaccurate. There are two main issues that mean I currently choose SystemVerilog (though I would certainly be happy to replace it).

      1. Tooling. Verilog and SystemVerilog (at least bits of them) are widely supported across the EDA ecosystem. Any new HDL thus needs to compile down to Verilog to be usable for anything serious. Most do indeed do this, but there can be a major issue with mapping between the languages: any issues you get in the compiled Verilog need to be mentally mapped back to the initial language. Depending upon the HDL this can be rather hard, especially if there's serious name mangling going on.

      2. New HDLs don't seem to optimize for the kinds of issues I have, and may make dealing with the issues I do have worse. Most of my career I've been working on CPUs and GPUs. Implementation results matter (power, max frequency and silicon area), and to hit the targets you want you often need to do some slightly crazy stuff. You also need a very good mental model of how the implemented design (i.e. what gates you get, where they get placed and how they're connected) is produced from the HDL, and in turn know how to alter the HDL to get a better result in gates. A typical example is dealing with timing paths: you may need to knock a few gates off a path to meet a frequency goal, which requires you to a) map the gates back to HDL constructs so you can see what bit of RTL is causing the issues, and b) do some of the slightly crazy stuff: hyper-specific optimisations that rely on a deep understanding of the micro-architecture.

      New HDLs often have nice things like decent type systems and generative capabilities, but lose the easy low-level mapping of RTL to gates you get with Verilog. I don't find much of my time, for instance, is spent dealing with Verilog's awful type system (including the time spent dealing with bugs that arise from it). It's frustrating, but making it better wouldn't have a transformative effect on my work.

      I do spend lots of time mentally mapping gates back to RTL to then try and work out better ways to write the RTL to improve implementation results. This often comes back to, say, seeing that an input to an AND gate arrives very late, and realising you can make another version of that signal that won't break functional correctness 90% of the time, with a fix-up applied to deal with the other 10% of cases in some other, less timing-critical part of the design (e.g. in a CPU pipeline the fix-up would be causing a replay or killing an instruction further down the pipeline). Due to the mapping issue I brought up in 1., new HDLs often make this harder. Taking a higher-level approach to the design can also make such fixes very fiddly or impossible to do without hacking up the design in a major way.

      That said, my only major experience with a CPU design not using Verilog/SystemVerilog was building a couple of CPUs for my PhD in Bluespec SystemVerilog. I kind of liked the language, but ultimately, due to 1 and 2, I didn't think it really did much for me over SystemVerilog.

      If you're building hardware with less tight constraints then yes, some of the new HDLs around could work very well for you, and yes, hardware designers can be very conservative about changing their ways, but it simply isn't the case that this is the only thing holding back adoption of new HDLs.

      I do need to spend some more time getting to grips with what's now available and up and coming, but I can't say I've seen anything that, for my job at least, provides a major jump over SystemVerilog.

    • analognoise 5 years ago

      Hardware has gotten 1000x faster, and software has made that 1000x faster system slower than it was in the 1980's, and you think hardware people should learn the software style?

      ...Are you sure?

thotypous 5 years ago

Google is also investing some developer time in Bluespec since it was open-sourced (https://github.com/B-Lang-org/bsc). I wonder if these projects are part of a bigger plan at Google.

rbanffy 5 years ago

When I started playing with MAME, I somewhat dreamed of a way to turn its highly structured code into something that could not only be compiled into an emulator as it is, but also be synthesized into hardware.

The possibility of using a single codebase to generate both a software emulator and a hardware implementation is incredible, from a hardware preservation point of view.

asdfman123 5 years ago

If they rename it XLSM they can embed some neat VBA scripts into it and squeeze out more functionality.

(I'm sorry.)

w_t_payne 5 years ago

I've got a Kahn-process-network based "simulation" framework, intended to provide a smooth conveyor belt of product maturation from prototypes written in high level scripting languages like Python or MATLAB through to production code written in C or Ada. (Sort of like Simulink, but with a different set of warts). Having some hardware synthesis capability is very much on the roadmap, and this looks like it's going to be worth investigating for that. Very excited to dive into it!

ampdepolymerase 5 years ago

Reminds me of the old reconfigure.io which used the ideas and syntax of Go's CSP and transformed them into async HDL code. Unfortunately the startup has been shuttered.

http://docs.reconfigure.io/

simonw 5 years ago

XLS as an acronym for Accelerated HW Synthesis is a bit of a stretch!

  • high_derivative 5 years ago

    It's most likely inspired by XLA (Accelerated Linear Algebra) - same creator(s).

  • dirtypersian 5 years ago

    I believe it might come from the fact that this process of going from high level programming language to hardware is called "high level synthesis". I think the "X" is meant to make it more generic, i.e. X level synthesis.

rowanG077 5 years ago

DSLX seems like a nightmare. Does it support arbitrary C++?

R0b0t1 5 years ago

See also https://github.com/SpinalHDL/SpinalHDL.
