RISC-V Scalar Cryptography Extension reaches public review
github.com
This extension is great. If anyone is interested, my roommate and I partially implemented the previous revision of it on a RISC-V GPU called Vortex: https://carrv.github.io/2021/papers/CARRV2021_paper_87_Adams...
I'm excited to see that the RV32 AES instructions now have separate rs1 and rd fields. The previous version combined them into a single rt field, which was kind of annoying from an implementation perspective, since one register was both input and output (iirc, unlike any other RV32 instruction previously implemented on that hardware).
Is it correct to call Vortex a GPU? I looked at the GitHub repo but there doesn't seem to be anything there related to graphics.
It appears to target "GPGPU" APIs like OpenCL and CUDA via translation, making it more like Xeon Phi in that its architecture and instruction set are vaguely GPU-like, while it lacks video output or fixed-function pipeline hardware for graphics stuff.
Also like Xeon Phi, I'm guessing you could add that hardware, it just hasn't been done because the other priorities were more interesting, and the practical performance would be "disappointing".
A quick summary of what is included for those who don't feel like reading the spec:
- Some miscellaneous bit twiddling instructions (rotate, permute, pack, ...) useful in various crypto schemes.
- AES.
- SHA2 (-256 and -512).
- SM3 and SM4.
- Physical entropy source (with some variants to accommodate low profile variants)
The SM3/4 were unfamiliar to me - apparently it is a hash function & block cipher used in Chinese WiFi variant. Should I just assume this is backdoored?
> for those who don't feel like reading the spec:
I'm biased, but the spec is supposed to be very accessible to people without a cryptography background. There's a section on who the intended audience is and what assumptions are made about their background. I'd really recommend it.
> The SM3/4 were unfamiliar to me - apparently it is a hash function & block cipher used in Chinese WiFi variant.
SM3/4 are required for use in certain places in China. RISC-V is popular in China, hence their inclusion in the RISC-V spec. My expectation is that SM3/4 will not likely ever be adopted outside China.
> Physical entropy source (with some variants to accommodate low profile variants)
There are no "variants" of the entropy source. There is one entropy source interface definition which is designed to scale across the many RISC-V implementation profiles. It's very different to x86/RDRAND which lots of people are used to.
> SM3/4 are required for use in certain places in China. RISC-V is popular in China, hence their inclusion in the RISC-V spec.
That sounds like a pretty poor reason.
China could create the RISC-V SCE-China spec that extends RISC-V SCE with these, and call it a day, instead of requiring the rest of the world to waste transistors for something that's useless.
The algorithm specific instructions are all optional. You can have AES without SM4 or vice versa. RISC-V is great like that, it's designed to be modular.
> instead of requiring the rest of the world to waste transistors for something that's useless.
I'm sure Chinese manufacturers might feel the same about NIST standards.
> I'm sure Chinese manufacturers might feel the same about NIST standards.
Don't count on it. For example have you ever wondered why there isn't a Russian Certificate Authority trusted in the Web PKI? There's no market for one. If you're a Russian, you can see that a Russian CA is obviously subject to control by Putin, which even if you like Putin today doesn't seem like a perpetually great idea, so you would choose some European CA instead. And if you're not a Russian you clearly don't want to trust this CA.
Now, there are some Chinese CAs, but it's again interesting that they're not popular in China. China has a huge population and plenty of potential customers, but even though there is more than one CA in China, they have issued very few certificates between them: about as many as have been issued to the Government of Spain (not all companies in Spain, just their government). Same reasoning. Even if I think Xi Jinping is great and I'm a proud Chinese national, a certificate from the US or Switzerland seems like a better choice.
The Americans fall far below the lofty moral standards they set for others [in the other room is my redacted copy of the Committee Study of the Central Intelligence Agency's Detention and Interrogation Program, grim reading about American torture even though much of what the Senate was shown is redacted], but only at your considerable peril should you mistake that for meaning their cryptography is no better than whatever home-grown offering has been chosen in your country, despite their billions spent and their expertise in this domain.
> For example have you ever wondered why there isn't a Russian Certificate Authority trusted in the Web PKI? There's no market for one.
A more direct comparison would be Russian ciphers and there absolutely are modern Russian ciphers, e.g. https://en.wikipedia.org/wiki/Kuznyechik
Nobody uses those, either, except possibly as required to interact with the cursed government PKI (about as cursed as early 00s EU government PKIs... are those still around?). Also maybe the government people with clearances, but the less said about them the better. But that’s mostly network effects, frankly, not trust. (Nobody uses Camellia, either.) Trust issues as described by the GP do exist but mostly factor into choosing domain names, registrars, hosting, and such.
But China, unlike Russia, does have an internal technological environment meaningfully separate from the world at large. It may also be trying to cultivate an ecosystem of private government contractors, which the intense criminality of Russian government procurement doesn’t permit. (China also has a general-purpose IC fabrication industry worth a damn, whereas for Russia the equivalent question is in any case largely moot.)
My quick summary of sm3/sm4 is:
- sm3 is pretty trivial to implement
- sm4 is about 1/16 the complexity of the spec's aes implementation (one box lookup per clock rather than 8 and no inverted version)
So if you want to court the (giant) Chinese market it's kind of a no brainer
> I'm biased, but the spec is supposed to be very accessible to people without a cryptography background. There's a section on who the intended audience is and what assumptions are made about their background. I'd really recommend it.
Certainly! As you can probably tell from my comment, I'm not an expert, and I found it easy to follow.
I just wanted to post a summary for anyone who is interested but doesn't find time to go into details. I know that I myself often read this site on phone and I appreciate similar comments giving a tl;dr on more complex stories.
> There are no "variants" of the entropy source. There is one entropy source interface definition which is designed to scale across the many RISC-V implementation profiles. It's very different to x86/RDRAND which lots of people are used to.
Maybe I phrased it poorly but section "4.2. Entropy Source Requirements" states: "An implementation of the entropy source should meet at least one of the following requirements sets in order to be considered a secure and safe design". It then gives three options, one of which ("4.2.3 Virtual Sources: Security Requirement") states "A virtual source is not a physical entropy source" and "A virtual source traps access to the seed CSR, emulates it, or otherwise implements it without direct access to a physical entropy source.".
My interpretation is that there is indeed a single interface (CSR), however the hardware implementation could be either a real physical entropy source or a CSPRNG. And presumably the latter is more likely on low-end devices.
Please let me know if I'm getting this wrong.
> My interpretation is that there is indeed a single interface (CSR), however the hardware implementation could be either a real physical entropy source or a CSPRNG. And presumably the latter is more likely on low-end devices.
A CSPRNG doesn't do anything without a seed. If you're actually a VM, your host provides the seed (the "virtual source"), which it chose randomly, and since it is actually your host anyway it has no particular reason to give you a bad seed versus just doing whatever else to sabotage you, so you have to assume the seed is good.
In contrast, on physical hardware there is no seed. If you've got a way to provision genuinely random data to the physical CPU, you don't have a "virtual source" at all. So option 4.2.3 isn't relevant to physical CPUs, only to a RISC-V VM.
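To make the "a CSPRNG doesn't do anything without a seed" point concrete, here is a minimal Python sketch. `ToyDrbg` is a hypothetical name and the SHA-256 counter construction is an assumption chosen for brevity; it is not the construction from the spec and not a vetted DRBG. It only shows that the output is a pure function of the seed, so whoever supplies the seed (the host, in the VM case) determines everything.

```python
import hashlib


class ToyDrbg:
    """Illustrative SHA-256 counter generator. NOT a vetted DRBG."""

    def __init__(self, seed: bytes):
        # All state is derived from the seed; nothing else feeds the generator.
        self._key = hashlib.sha256(b"toy-drbg-seed" + seed).digest()
        self._counter = 0

    def random_bytes(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            block = hashlib.sha256(
                self._key + self._counter.to_bytes(8, "big")
            ).digest()
            out += block
            self._counter += 1
        return bytes(out[:n])


# Two generators given the same seed produce identical "random" output.
guest_a = ToyDrbg(b"seed chosen by the host")
guest_b = ToyDrbg(b"seed chosen by the host")
same = guest_a.random_bytes(32) == guest_b.random_bytes(32)
```

Here `same` is true by construction, which is why the quality of the seed is the whole question.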
Pretty much every instruction that doesn't start with the name of some known crypto algorithm (and maybe some that do) is useful for general-purpose stuff. I've had a good deal of success making Intel's GFNI do "weird off-label things" (bit-matrix transpose and a lot of the missing byte shift/rotate operations just scratch the surface). CLMUL is a good one for all sorts of things, as it can be used for XOR-parallel-prefix (we used it to detect quote pairs in simdjson).
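For anyone wondering how the XOR-parallel-prefix trick works: carry-less multiplying a 64-bit word by all-ones makes bit i of the low half equal to the XOR of bits 0..i of the input. Below is a rough Python model of that idea, not simdjson's actual code; `clmul`, `prefix_xor` and `MASK64` are names made up for this sketch, and the software loop stands in for what a single hardware CLMUL instruction would do.

```python
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiply of two non-negative ints."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR instead of add: no carries
        b >>= 1
        shift += 1
    return result


def prefix_xor(x: int) -> int:
    """Bit i of the result is the XOR of bits 0..i of x (64-bit lane)."""
    return clmul(x & MASK64, MASK64) & MASK64


# Bit k set means "character k is a quote". Quotes at positions 2 and 7:
quote_bits = 0b10000100
in_string = prefix_xor(quote_bits)  # 0b01111100
```

The resulting mask covers the opening quote and the characters inside the string, and stops just before the closing quote.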
I don't know whether I resent crypto because it gets the cool instructions at low latency because it's so important, or whether I love it due to the fact that even the "leavings at the crypto table" are computationally useful.
Stockholm syndrome, CPU makers have been needlessly starving us of highly useful bit manipulation and finite field operations ... so you're very thankful for what you get. :)
Base RISC-V even lacks CLZ/popcount, so it's a step backward from other popular architectures.
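To give a feel for what software has to do when those instructions are missing, here is a sketch of the usual shift-and-mask fallbacks, written in Python but limited to operations a base integer ISA has (shifts, AND, add, compare; the multiply would need the M extension or a few more shift-add steps). `popcount32` and `clz32` are just names for this sketch, not anything from the spec.

```python
M32 = 0xFFFFFFFF


def popcount32(x: int) -> int:
    """Count set bits in a 32-bit word using the classic SWAR reduction."""
    x &= M32
    x = x - ((x >> 1) & 0x55555555)                 # 2-bit partial sums
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # 8-bit partial sums
    return ((x * 0x01010101) & M32) >> 24           # add the four bytes


def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit word by binary search."""
    x &= M32
    if x == 0:
        return 32
    n = 0
    for shift in (16, 8, 4, 2, 1):
        if x >> (32 - shift) == 0:  # top `shift` bits are all clear
            n += shift
            x = (x << shift) & M32
    return n
```

That is roughly a dozen dependent operations each, versus one instruction on architectures that have them natively.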
That said, the stuff in bitmanip and crypto looks pretty good, so if they actually end up in chips that will be nice. On the other hand, it's not clear how much awesome code will get written really exploiting them when they aren't everywhere, which is probably why we're in this situation to begin with.
Maybe you're right, although 'Stockholm syndrome' feels a bit strong.
Agreed that stuff like CLZ/popcount/rotate etc should not be tucked away in a weird extension. I'm a bit alarmed at RISC-V's tendency to fragment into a gazillion subsets (the argument is that it's better to at least define the subsets rather than have a nightmare of incompatible and overlapping extensions, but still).
What is the "2-read-1-write register access constraint" mentioned in the introduction?
It means that each instruction reads no more than two general purpose registers (i.e. inputs), and writes at most one. When you build CPUs, register files are expensive components, and the more parallel accesses to them you need, the more expensive they become. RISC architectures generally rely on only reading two operands and writing only one result. Sometimes this rule is broken, but RISC-V tries to stick to it unless there's an extremely good reason.