Finding the average of two unsigned integers without overflow
devblogs.microsoft.com

Having done computer architecture and x86 bit twiddling in ye olden days, I immediately, independently converged on the patented solution (code / circuit / Verilog, more or less the same thing). It goes to show how broken the USPTO is, because it's obvious to anyone in the field. Patents are supposed to be nonobvious. (35 USC 103)
Agreed. I spent about a minute before reading it and came up with the first solution, didn't feel like thinking through the puzzle of how not to care which one is larger, and then settled on the one with the 2016 expiration date. All within 1 to 2 minutes. I briefly considered XOR but didn't feel like remembering more about it; the solution was obvious when I saw it. How any of that was ever patentable is a crime.
This is bizarre. I wonder how many of us saw that title, thought "That's a really simple problem, surely?" came up with a solution and then were shocked when their coffee-lacking brain actually came up with the patented solution?
I mean... ignoring the bitwise arithmetic (which is only obvious to people used to doing binary operations), this is the kind of maths that an 11yo could do.
That said, the patented solution is a little more complex. But not by much.
Which makes me curious: what other patents have we violated in our day-to-day without even knowing it?
> Which makes me curious: what other patents have we violated in our day-to-day without even knowing it?
Patents are like the criminal code - always remember "Three Felonies a Day" [1]. The system is set up so that if you are one of the 99%, the 1% can come in and bust you at will if you become too much of an annoyance/threat. They will find something if they just keep digging deep enough (not to mention that they can have your entire company's activity combed through with a microscope if they find a sympathetic court), and blast you with enough charges and threaten sequential jail time so that you cannot reasonably do anything other than plead guilty and forfeit your right to a fair trial [2].
And for what it's worth, that "play by the rules as we want or we will destroy you" tactic can even hit multi-billion dollar companies like Epic Games. It's one thing if society decides to regulate business practices by the democratic process of lawmaking... but the fact that Apple can get away with banning perfectly legal activities such as adult content, vaping [3] or using a non-Apple payment processor for hundreds of millions of people is just insane, not to mention incredibly damaging to the concept of democracy.
[1]: https://kottke.org/13/06/you-commit-three-felonies-a-day
[2]: https://innocenceproject.org/guilty-pleas-on-the-rise-crimin...
[3]: https://www.macrumors.com/2020/06/01/pax-vape-management-web...
Freedom means some people will do stuff you don’t like.
It also means you are not necessarily entitled to rents for your “discoveries”.
And democracy means, at its core, to restrict the freedom of people committing actions that society has decided to be unlawful.
Except society hasn’t decided that such action is unlawful even if some people, such as yourself, wish it was. I am sure at least a few vegans would like to outlaw eating meat, but democracy also works the other way ensuring such extremely unpopular laws never get passed.
> I am sure at least a few vegans would like to outlaw eating meat, but democracy also works the other way ensuring such extremely unpopular laws never get passed.
I would not be so certain about that one.
Many countries are thinking of outright "meat taxes", health scores (similar to smoking warnings in the desired nudging effect) or extending CO₂ taxes onto agriculture: meat production causes about 14% of global CO₂ emissions [1], and outlawing/disincentivizing meat consumption is a very easy, very fast and incredibly effective way of cutting down on CO₂, methane and dung emissions. Not to mention the indirect emissions from land burning (especially in Brazil) or the societal cost of overconsumption of meat (e.g. obesity and heart issues).
Personally, I'm in the "omnivore" camp but recognize that the way we deal with meat products has to be massively reformed. We need to cut waste and curb consumption, the sooner the better.
[1]: https://www.theguardian.com/environment/2021/sep/13/meat-gre...
That’s the more popular view, but there is still a group of hard core “meat is murder” vegans. Personally I would say get rid of all farm subsidies, but politics makes strange bedfellows and democracy means accepting compromises up to a point.
This will from now on be my preferred quote against "freedom at all cost" arguments.
We should end legal protection of ideas as soon as possible.
I just want to second this with my own experience just now:
I looked at the title while still waking up.
At first I thought of the low + (high - low) / 2 method. I then figured maybe it was better to simply predivide both numbers before adding and just correcting for the lowest bit (how was that ever patented?!).
However, I didn't like having to perform two divisions so I thought there was probably something clever one could do with bit operations to avoid it. But, still being tired, I decided I didn't want to actually spend time thinking on the problem and I'd already spent a minute on it.
x / 2 === x >> 1, it's fast.
For unsigned or positive x. Yes, the article is about unsigned integers, but some might see this for the first time and not be aware of this restriction. -3 / 2 == -1 but -3 >> 1 == -2.
Interesting thing is that `(a >> 1) + (b >> 1) + (a & b & 1)` works correctly for signed integers (if `>>` works like in Java, filling most significant bit with ones for negative numbers). With division you'll need to write different expressions depending on operand signs. E.g. (-3) / 2 + (-5) / 2 + ((-3) & (-5) & 1) = (-1) + (-2) + 1 = -2. But ((-3) >> 1) + ((-5) >> 1) + ((-3) & (-5) & 1) = (-2) + (-3) + 1 = -4.
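A small C sketch of that signed variant (my own illustration, not from the article), assuming the compiler implements `>>` on negative values as an arithmetic shift, which C leaves implementation-defined but Java guarantees:

    #include <stdio.h>

    /* Average of two signed ints, rounding toward negative infinity.
       Assumes >> on a negative int is an arithmetic shift (true on common
       compilers, but implementation-defined in C, unlike Java's >>). */
    int average_signed(int a, int b)
    {
        return (a >> 1) + (b >> 1) + (a & b & 1);
    }

    int main(void)
    {
        printf("%d\n", average_signed(-3, -5)); /* -4 */
        printf("%d\n", average_signed(-3, 4));  /* 0, i.e. floor(0.5) */
        return 0;
    }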
You can use the appropriate right shift with signed integers as easy as with unsigned integers, you just have to handle in the right way the correction due to the bit shifted out.
The fact that the right shift for a negative integer gives the floor function of the result just makes the correction easier than if you had used division with truncation towards zero.
The shifted out bit is always positive, regardless whether the shift had been applied to negative or positive numbers.
Except for following a tradition generated by a random initial choice, programming would have been in many cases easier if the convention for the division of signed numbers would have been to always generate positive remainders, instead of generating remainders with the same sign as the quotient.
With positive remainders you get weird quotient behavior. Why should 10/3 and -10/-3 yield different results? Besides that, the choice is not universal; different languages use different conventions.
Why should 10/3 and -10/-3 yield the same result?
I do not see where this would be of any use.
On the other hand, if you want a quotient that has some meaningful relationship with the ratio between the dividend and the divisor, there are other more sensible definitions of the integer division than the one used in modern programming languages.
You can have either a result that is a floating point number even for integer dividend and divisor, like in Algol, or you can define the division to yield the quotient rounded to even (i.e. with a remainder that does not exceed half of the divisor).
In both cases 10/3 and -10/-3 would yield the same result and I can imagine cases when that would be useful.
For the current definition of the integer division, I do not care whether 10/3 and -10/-3 yield the same result. It does not simplify any algorithm that I am aware of, while having a remainder of a known sign simplifies some problems by eliminating some tests for sign.
I was not really thinking about application but the mathematics. It seems a reasonable decision to me that |a / b| = |a| / |b| and to not get results of different magnitude depending on sign changes only.
It's fast, but I figured doing that on both sides before adding looked a bit inelegant and maybe it could be avoided by doing "something something bit operations" and then I dropped the thought and clicked the link.
On a modern architecture given that most integers are usually u32 by default but the underlying CPU deals with 64bits natively, I'd just cast to u64 and call it a day.
Actually I was curious to see if GCC would be smart enough to automatically choose what's the best optimization depending on the underlying architecture, but it doesn't appear to be the case.
For x86_64 (with -O3 or -Os):
avg_64bits:
.LFB0:
        .cfi_startproc
        movl    %edi, %edi
        movl    %esi, %esi
        leaq    (%rdi,%rsi), %rax
        shrq    %rax
        ret
        .cfi_endproc
avg_patented_do_not_steal:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        movl    %esi, %edx
        andl    %esi, %edi
        shrl    %eax
        shrl    %edx
        andl    $1, %edi
        addl    %edx, %eax
        addl    %edi, %eax
        ret

Clearly, just casting to 64 bits seems to give denser code.

For ARM32 (-O3 and -Os):

avg_64bits:
        push    {fp, lr}
        movs    r3, #0
        adds    fp, r1, r0
        adc     ip, r3, #0
        mov     r0, fp
        mov     r1, ip
        movs    r1, r1, lsr #1
        mov     r0, r0, rrx
        pop     {fp, pc}
avg_patented_do_not_steal:
        and     r3, r1, #1
        ands    r3, r3, r0
        add     r0, r3, r0, lsr #1
        add     r0, r0, r1, lsr #1
        bx      lr

A lot more register spilling in the 64-bit version, since it decides to do a true 64-bit add using two registers and an adc.

My code, for reference:

uint32_t avg_64bits(uint32_t a, uint32_t b)
{
    uint64_t la = a;
    uint64_t lb = b;
    return (la + lb) / 2;
}

uint32_t avg_patented_do_not_steal(uint32_t a, uint32_t b)
{
    return (a / 2) + (b / 2) + (a & b & 1);
}
> Patents are supposed to be nonobvious
Emphasis on supposed.
The granted patents include: a laser used to exercise a cat, and a mobile wood-based dog game (a log used to play fetch).
https://abovethelaw.com/2017/10/8-of-my-favorite-stupid-pate...
https://patents.google.com/patent/US5443036A/en
https://patents.google.com/patent/US6360693
Apple takes the cake, though, by patenting a geometric shape.
I bet you broke this patent as a kid https://patents.google.com/patent/US6368227B1/en
100% correct
the patented solution immediately came to mind
It almost did for me. I thought that you should be able to divide each number by 2 (or shift one bit) before adding, but that would lose a 1 if both numbers have 1 in their least significant bit. The part with "a & b & 1" fixes that exact issue and is obvious to me in hindsight.
> and is obvious to me in hindsight.
Everything is. That's kinda hindsight's thing.
No doubt a few people in this thread saw this solution right away, but the "this was all obvious" crowd in this thread is a little too large for my taste. Be real, guys.
I guess the part about overflow in the title primes most experienced developers to immediately think about a solution where the added numbers are restricted beforehand to avoid the overflow. From there the obvious answer is to halve them, which leaves the next problem when the numbers are odd.
If you're not aware that numbers can overflow (and you probably don't tend to think about that for every single + you type, I guess), then the proper solution is less obvious.
And what is the intention to make the patent? The second way is actually more useful, not limited to unsigned ints.
But it requires you to know which one is larger. The patented way is faster if you are working with unsigned.
The patent is more sophisticated than what the article implies - it's a single clock cycle method, which no compiler I've ever seen will do given the code presented in the article.
And it's from 1996.
This thread is full of people who challenged themselves to solve it and then failed to come up with the 'obvious' 1-cycle solution. It's clearly non-obvious, as this thread shows.
The actual patent system failure here is the patent is not useful -- it's not valuable. If you needed this solution, you could sit down and derive it in less than an hour. That's not because it's obvious, but because the scope is so small.
The only difference between this patent and say a media codec is how long it would take to reinvent it. It might take you 200 years to come up with something as good as h.265, but there's no magic to it. There's a problem, somebody came up with a solution, somebody else could do it again given enough time to work on it. This is true for everything that's ever been patented.
The point of patents is to compensate for value of the work needed to reinvent, and so the real problem here is that value is less than any sane minimum. The value is less than the patent examiner's time to evaluate it! But court rulings have said it doesn't matter how insignificant a patent is, as long as it does anything at all it's "useful", which leads to these kinds of worthless patents.
and then failed to come up with the 'obvious' 1-cycle solution
That's unfair, as the commenters here are providing a software solution. The patent is about a hardware solution which involves two parallel adder circuits. It implements in hardware exactly what the software solution does, but you can't express it in software because there is no operator that expresses "implement this addition twice please". You'd have to express it as:
avg = [x>>1 + y>>1, x>>1 + y>>1 + 1][x&y&1]

Which isn't 1-cycle either without the specialized adder.

And I just realized the hardware solution is incredibly sub-optimal. If you were to design this in specialized hardware, you'd use a single (N+1)-bit adder and just discard the least significant bit in the output, not duplicate the entire adder tree in silicon.
There is no need for a specialized adder.
The patented expression is computable in an obvious way by a single ordinary adder and a single AND gate connected to the carry input of the adder, without any other devices (the shifts and the "& 1" are done by appropriate connections).
Any ordinary N-bit adder computes the sum of 3 input operands, 2 which are N-bit, and a third which is an 1-bit carry.
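A rough C model of that wiring (my own sketch, not the patent's wording): the shifts and the `& 1` are just how the bits are routed into the adder, the AND gate feeds the carry input, and one ordinary addition does the rest.

    #include <stdint.h>

    /* Stands in for a plain N-bit adder with a 1-bit carry input. */
    static uint32_t adder_with_carry_in(uint32_t x, uint32_t y, uint32_t cin)
    {
        return x + y + cin;
    }

    uint32_t average_one_adder(uint32_t a, uint32_t b)
    {
        /* a >> 1 and b >> 1 are "wiring", a & b & 1 is the single AND gate. */
        return adder_with_carry_in(a >> 1, b >> 1, a & b & 1);
    }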
> This thread is full of people who challenged themselves to solve it and then failed to come up with the 'obvious' 1-cycle solution. It's clearly non-obvious, as this thread shows.
If a significant fraction of people come up with it on the spot, it's obvious. And they did.
I don't even see a single comment mentioning doing this in 1 cycle except from those who read the patent, much less reusing existing functional units to do so, so it's not clear to me any commenter came up with an equivalent to the patented solution or even identified the problem solved by it.
Keep in mind this solution was to support MPEG-1 video encoding in the olden days when state of the art processors were 100 MHz and 800 nm process. Doing this in 1 cycle while reusing already existing function units seems like a clever solution to me -- not patent-worthy, not difficult, but clever.
Are you very sure that patent would never get threatened toward a software implementation that doesn't know anything about cycles?
If so then the technique in the post isn't actually patented.
If that C code would get threatened, then the 1 cycle thing is a red herring.
Also "Doing this in 1 cycle while reusing already existing function units"? In hardware you can use a normal adder without any special technique...
I've already said twice now that it's not patent-worthy, so it seems we're in agreement on that point.
That's a response to you calling it clever.
Sorry, but this argument about the single-cycle implementation is complete BS.
Any logic designer, who is not completely incompetent, when seeing the expression
(a / 2) + (b / 2) + (a & b & 1);
will notice that this is a 1-cycle operation, because it is just an ordinary single addition.
In hardware the divisions are made just by connecting the bits of the operands in the right places. Likewise, the "& 1" is done by connecting the LSB's of the operands to a single AND gate and the resulting bit is connected to the carry of the adder, so no extra hardware devices beyond a single adder are needed. This is really absolutely trivial for any logic designer.
The questions at any hiring interview, even for beginners, would be much more complex than how to implement this expression.
It is absolutely certain that such a patent should have never been granted, because both the formula and its implementation are obvious for any professional in the field.
What is obvious today might not have been obvious in 1996.
Our experiences and training have changed dramatically over the past 26 years.
I can assure you I would have come up with the patented solution just as fast in 1996, when I was a teenager and dabbled in 6502 assembler on an Atari computer. Because I solved it now on the basis of exactly the experience and knowledge I acquired back then.
But inasmuch as it is non-obvious, as to why it is non-obvious: criteria met.
> It goes to show how broken the USPTO is...
The patent issued in 1996 and wasn't revisited since then (because never asserted in litigation). The USPTO is a lot different now, a quarter-century later.
> The USPTO is a lot different now, a quarter-century later.
Please be more specific or link something that explains how they've improved.
Back then you couldn't early challenge a patent and prevent it from being issued, and once it was issued you couldn't challenge it without violating it and entering trial. Now you can do both.
Early challenging sounds helpful, but that's also outsourcing the work and it could put more coders at risk of treble damages further down the line from some patent they glanced at and forgot about.
Isn't there also a recourse process by which you can get a patent invalidated? You can't expect USPTO to hire an expert in every single possible field.
I think that's usually resolved in court. By which I mean, I don't think there's process beyond choosing to fight any suit brought against you and hoping you win in court.
> I don't think there's process beyond choosing to fight any suit brought against you and hoping you win in court.
Not true. See https://en.wikipedia.org/wiki/Reexamination It's even easier today than a decade ago, though the Wikipedia article doesn't explain that aspect very well. (I wouldn't be able to explain it, either. I think it has to do with reduced ability for a patent owner to drag out review, including dragging it into court.) Probably not nearly easy enough, though.
Why not?
The "SWAR" approach `(a & b) + (a ^ b) / 2` looks bizarre but can be understood.
Adding two bits produces a sum and a carry:
0 + 0 = 0, carry 0
1 + 0 = 1, carry 0
0 + 1 = 1, carry 0
1 + 1 = 0, carry 1
So the sum is XOR, and the carry is bitwise AND. We can rewrite x + y as (x ^ y) + (x & y)*2. Distribute the divide, and you get (x ^ y)/2 + (x & y), which is the mystery expression.
(Note this distribution is safe only because (x & y)*2 is even.)
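A minimal C check of that identity (my own sketch, not from the article): the common bits plus half the differing bits give the unsigned average without ever forming the overflowing sum.

    #include <stdint.h>
    #include <stdio.h>

    /* x + y == (x ^ y) + 2*(x & y), so the average is (x & y) + (x ^ y)/2
       and no intermediate result can overflow. */
    uint32_t average_swar(uint32_t x, uint32_t y)
    {
        return (x & y) + ((x ^ y) >> 1);
    }

    int main(void)
    {
        uint32_t x = 0xFFFFFFFFu, y = 0xFFFFFFFDu;  /* x + y would overflow */
        printf("0x%08X\n", average_swar(x, y));     /* prints 0xFFFFFFFE */
        return 0;
    }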
Alternatively, a direct explanation:
a & b is the bits they have in common. If they both have a bit x, you should keep it, because the average of x and x is x.
a ^ b is the bits that only one of them have. You should halve these bits, because the average of x and 0 is x/2.
This explanation makes a ton more sense!
But it has lots of assumptions/prerequisites about taking the mean baked-in. It happens to work for this particular case, but in general you don't get far with such hand-wavy reasoning since you just get confused with which assumptions hold and which don't.
The original explanation was the actual SIMD approach, which is really cool. You can extend it right away to other problems.
The explanation seems to work even in base 10.
Let's do the average of 85 and 95. The rightmost digit of both is equal, so we leave it: X5. The "xor"/2 is the sum of the non-equal digits divided by 2 (respecting their powers): (8+9)*10 / 2 = 170 / 2 = 85. Now, we add them: 5 + 85 = 90 (which is the average of 85 and 95).
Let's take now 87 and 89: the leftmost digits are equal: 8X. We do the "xor"/2 for the rightmost: (7+9)/2 = 8. 8X + 8 = 80 + 8 = 88.
Let me know if there are some examples for which it would fail (I only spent a minute to test if the explanation extends to other bases).
The average of 15 and 30 is not 22. The average of 51 and 03 is not 22.
It's a half-adder in software!
Thanks for this explanation.
What sets this post apart from the rest is that it actually provides useful information for everybody, after all those other posts of type "but that's obvious, I looked at it for 1 minute, saw half the solution, was too lazy for the rest and in hindsight it's all obvious. What a ridiculous patent system."
Not to say that the patent system is problematic...
Sorry, I'm confused, and I asked elsewhere.
How is the above SWAR? It looks like a regular set of instructions.
If you, for example, want to do addition of four 8-bit integers within a 32-bit register, you have to use similar techniques to stop the carry from propagating.
For example, when x and y are 32-bit integers holding 4 8-bit integers, you can do
z = (x ^ y) + (x & y) & 0x7f7f7f7f;

Now z holds four 8-bit integers which hold the sum (modulo 256) of the integers of x and y. The bit mask is to stop the carry from propagating.

This doesn't work, because you're not left-shifting (doubling) the carry. But when adding the shifted carry to (x ^ y), we're back to potentially overflowing the highest bits. The solution is to add the highest and the lower bits separately:

lower = 0x7f7f7f7f;
highest = ~lower;
z = ((x & lower) + (y & lower)) ^ ((x ^ y) & highest);

Note this only improves performance for larger container integers.

You're completely right! Typically you don't care about overflow though (and you should use unsigned ints to avoid undefined behavior).
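A quick demo of the corrected packed add above (my own sketch, hypothetical values): four 8-bit lanes in one 32-bit word, added lane-wise with no carry leaking between lanes.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t packed_add8(uint32_t x, uint32_t y)
    {
        const uint32_t lower   = 0x7f7f7f7fu;
        const uint32_t highest = ~lower;
        return ((x & lower) + (y & lower)) ^ ((x ^ y) & highest);
    }

    int main(void)
    {
        uint32_t x = 0xff102030u;             /* lanes: 255, 16, 32,  48 */
        uint32_t y = 0x01054060u;             /* lanes:   1,  5, 64,  96 */
        printf("%08x\n", packed_add8(x, y));  /* 00156090: 255+1 wraps to 0 */
        return 0;
    }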
That's pretty neat. Is it actually any faster than just doing four 8-bit adds, though?
Presumably it would take 4 logical instructions to do the vectored math, vs. 4 logical instructions to do the scalar additions.
I suppose you're looking at a minimum of two registers for the vectored approach, vs. 8 for the scalar approach. Having the result available in separate registers makes them immediately available for use, though.
There's also the overhead of getting the numbers in and out of memory. Loading and storing one word is obviously going to be way better than loading 4 bytes individually.
It seems to me like the vectored approach would be better for algorithms that require iterating through a large dataset in memory. The scalar approach would be better for algorithms that have a bunch of dependent calculations. Perhaps that's an obvious conclusion!
That's pretty neat though. For the large dataset scenario, perhaps you could get a significant speedup on relatively simple architectures such as cortex-m microcontrollers. I suspect that sufficiently modern high end CPUs/compilers wouldn't benefit so much from it, though? All the pipelining, superscaling and caching and whatnot could sufficiently mask the latencies of the memory accesses to the point of being irrelevant. Also, a sufficiently clever compiler could implement the loop with actual SIMD instructions and achieve significantly higher performance than the manual in-register optimization.
This would be a fun way to compute a basic 8-bit checksum on a binary blob in a microcontroller... Not that it would be practically useful because any non-trivially sized blob would be better served with at least a Fletcher checksum if not a full CRC, both of which seemingly lack the necessary associativity.
> That's pretty neat. Is it actually any faster than just doing four 8-bit adds, though?
The operation itself is not necessarily faster (it could be faster due to pipelining, I think, and it would probably be faster when storing 8 8-bit ints in a 64-bit int), but it can save the hassle and runtime of packing and unpacking into four variables.
This technique was useful on a 68000 in a 4 voice software PCM sampled instrument music player running on interrupts on the Atari ST back in the day.
Found the game where I worked. Default hardware bleeps and bloops in first version https://www.youtube.com/watch?v=RGOdHT29Jpc
Four voice digi player using SWAR in second version https://www.youtube.com/watch?v=1GrdvcghDXE
Very cool! The difference in audio fidelity is quite impressive.
Reminds me of a weekend hobby project I did back in 2014 or so. I had an itch to play with analog video signal generation from an atmega328p. Rather than use one of the existing libraries, though, I started from scratch with the goal of achieving the highest possible resolution. I used the SPI peripheral to clock out 8 pixels at a time at 8 MHz without any gaps, giving me something like 12 instructions to prepare the next byte. There wasn't enough RAM for a frame buffer at that resolution, so I instead used character tiles; that ate up the whole budget. I forget what the resolution was, but it was significantly higher than the existing library was capable of. There was jitter, which I tracked down to the variability in interrupt latency due to the AVR having variable cycle length instructions. I was using a timer interrupt to schedule the start of and complete transmission of each scanline, so that the main program could focus purely on application logic. I wrote an inline assembly routine at the start of the interrupt handler to insert a variable number of noop instructions depending on the relative phase of the hardware timer, and the output became rock solid.
That of course reminds me of a project in 2007 where I needed to go the other direction, and decode an analog video signal on an 8 bit PIC microcontroller. The signal was from a camera on an actuator, meant to detect the relative position of the sun for the purpose of aiming a parabolic solar concentrator. I was able to filter out all visible light with some overdeveloped film negative so that the video signal was simply a white dot on a black background, and then wire it up through some voltage dividers to the PIC's two voltage comparators. One comparator detected sync pulses, and the other one detected black to white transitions. The firmware would simply track the timing of sync pulses to know the current scanline and position within the current scanline. Good times!
Very nice! Good work.
Some more info on the digi player on the ST. It used a timer interrupt to service the PCM samples, but with the interrupt alone there was significant noise because of the variability in the interrupt timing. To get the timing tighter, the interrupt was changed to hit the routine on every video line just prior to the actual hsync, and then hsync was polled to get very precise timing.
The PCM was just a linearization by combining three logarithmic volumes of the three PSG voices.
During the title sequence, a special version of the code was running where several 68000 registers were reserved globally for the digi player. So those did not need to be saved / restored in the interrupt routine!
The SWAR was involved when advancing the four wrapping 8-bit indices into each of the four voices' 256-entry sample tables. This was part of a monumental effort to get the interrupt routine to be as quick as possible.
Your video decoding reminds me of when I worked at a video card company in the nineties: we had a competitive advantage by using a commodity part in an unusual way. This video decoder hardware was commonly used to take composite video and decode it. We supported that, but we also did a bunch of advanced features by using a seldom-used raw mode where it took the composite signal and stored the analog-to-digital conversion in memory. We had high-speed assembly code that could decode the video better than the hardware and supported some cool additional features. Anyway... memories a bit hazy. Been a while, but I remember it being very cool.
Wikipedia says: "It also refers to the use of SIMD with general-purpose registers and instructions that were not meant to do it at the time, by way of various novel software tricks."
https://en.wikipedia.org/wiki/SWAR
Maybe that's the way it's meant?
Compilers might be smart enough to pick up the idiom used in the example and compile them to something done in parallel?
SIMD is "single instruction, multiple data". In this case, the data are single bits. It's basically operations on bit-vectors of width 32 (or whatever the machine word width is). All the bits are treated independently. That's exactly how you'd do SIMD elsewhere.
Why not just
(a >> 1) + (b >> 1) + ((a & 1) & (b & 1))

That's the "patented solution" referred to in the article.
Which is shocking to me, since this is the solution which jumps immediately to mind (and usually patents should be above that line). I could understand if the next solution was patented, but not this one.
Don't read the article, and you would never know it's patented.
"just"? That's 2 additions, 2 shifts, and 2-3 bitwise operations, while the other method is 1 addition, 1 shift, and 2 bitwise operations.
I would think optimizing compilers can optimize this expression to the one in the article, but I tried it on godbolt and they don't.
Just by reading the headline, before opening the article, I thought of the patented solution in my head. "Just halve before adding, it can be off by one but some boolean logic might do it"
Software patents are absolutely disgusting.
Absolutely the same thing I did. I even had the low bit logic worked out by the time I scrolled the article down and saw the patented line. Clearly we have both had miraculous enlightenment because legally this is “not obvious”.
> Clearly we have both had miraculous enlightenment because legally this is “not obvious”.
To be precise, legally it is "not obvious back in 1996." There is a lot of stuff that is obvious today that wasn't 25 years ago. That said, this one in particular probably would have been invalidated as obvious if it was ever litigated (and it was not). Also, the USPTO has reined in software patents a lot in recent years (but always people advocating for more or less).
Seriously, I could have done this in 1996 and so could anyone. I reckon I could probably have worked this out in 1986. It’s not like binary arithmetic has changed significantly in the last 20 years.
And back in 1996, programmers were much more familiar with bit-twiddling than today.
It's not even computer science, it's literally just math. (a+b)/2 = a/2 + b/2. They were teaching that in pre-algebra in middle schools well before 1996.
Yes. I think though the next solution is not obvious. (a & b)+ (a^b)/2
It's not the solution that's patented, but it's the implementation in a single CPU cycle in hardware.
That’s a pretty important distinction. If someone in the 1800s invented a mechanical calculator that could do this operation in a single crank, I don’t think anyone would upset about that patent.
But then the patent would not be on the logic but the mechanical implementation. The obviousness would need to be judged on that basis. The method can be implemented using straightforward combinational logic so the single crank/cycle is a given after you have come up with the obvious method.
Back before software patents were a thing, the "math" was not patentable. Eliminating software patents will be a return to the previous status quo.
We have eliminated most software patents: see Alice Corp vs CLS bank.
The only thing this patent covers is the physical circuit implementation, not the math.
> The only thing this patent covers is the physical circuit implementation
The claims in that patent are not limited to a physical circuit implementation.
The description includes a circuit diagram as one possible embodiment of the claimed invention, but the patent covers any implementation (and the description indicates it was intended to cover instructions running on general purpose processors).
So the compiler developers had to pay fees? Or Intel and AMD?
If the compiler developers were developing their own circuits, yes.
But likely, Intel/AMD would have had to pay fees if they were implementing a similar solution in their hardware.
As if anyone would ever get prosecuted for that, though.
Given its simplicity this makes me wonder if a compiler has ever transformed legal original IP code into patented code.
That’s not really how software patents are (ab)used. Just having the patent and a vaguely credible claim that someone is using the patented technology is enough to encumber them with enough legal issues that many people will settle instead of fight it.
See also: "Nearly All Binary Searches and Mergesorts are Broken" by Joshua Bloch. The cluefulness or otherwise with which people often react to Bloch's excellent post is not something to ponder very closely if you want to retain any hope in the future of software engineering.
https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
https://news.ycombinator.com/item?id=3530104
https://news.ycombinator.com/item?id=1130463
https://news.ycombinator.com/item?id=14906429
https://news.ycombinator.com/item?id=6799336
https://news.ycombinator.com/item?id=9857392
https://news.ycombinator.com/item?id=12147703
https://news.ycombinator.com/item?id=621557
https://news.ycombinator.com/item?id=7594625
https://news.ycombinator.com/item?id=9113001
https://news.ycombinator.com/item?id=16890739
If doomscrolling all that isn't enough to make you fear for mankind's future I'm pretty sure there's an Ulrich Drepper glibc bug report rejection related to this topic (or several) that you can google...
On topic: Raymond's post has some other great stuff. SWAR!
I want to reply to one of the comments you linked to, which is this:
> I would argue that the bug is not in the algorithm -- the bug is in languages that don't detect integer overflow by default.
Concretely, this is true enough. But abstractly, not so much: the algorithm is actually "buggy" if you abstract the problem a little. Namely, finding a midpoint of two operands does not require that the operands be numbers, or even addable for that matter. The introduction of that requirement is therefore a bug (at least in my eyes). The easiest way to see this is to replace integers with pointers. Then adding two pointers isn't even a well-defined operation in the general case, let alone dividing them by two. Whereas subtracting them and moving half the distance is actually quite well-defined, and we can see it behaves better too.
I would probably go so far as to claim that this is not an isolated example of where thinking about problems more abstractly helps us come up with solutions that have non-obvious benefits.
Subtraction, division and addition is one of the common answers that is still wrong, unless you also do a comparison first, and that is generally high cost.
Read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303 to see many problems around pointer differencing.
A comparison never costs more than an addition or a subtraction.
If you would use a conditional jump, that would have a high cost.
However the maximum or minimum should always be computed without conditional jumps and many CPUs have special instructions for max and min, which are not more expensive than additions or subtractions.
On CPUs without max & min instructions, computing max or min requires 2 instructions (compare + conditional copy).
2 instructions vs. 1 instruction increases the program size but not necessarily the execution time, if the instructions can be overlapped with others.
Due to the complex architecture of modern CPUs, it is impossible to determine the cost of a simple sequence of instructions in the general case.
For each particular CPU, a different but equivalent sequence of instructions can be the best and longer sequences of instructions may happen to be executed in less time, if they can be better overlapped on a certain CPU.
The obvious form of the code with a comparison still produces a conditional branch on latest gcc [1]. It's extremely doubtful that you'll find a version that uses any comparison that consistently performs as quickly as any version that uses a little bit-twiddling, no matter what modern CPU you're talking about.
Many of your statements are misleading in context. Implying that you can't know or deduce things about the cost of a simple sequence of instructions is very odd. All software and people that work on optimization do it all the time.
And remember that the context is a suggestion that pointer types and pointer-subtraction is the answer to a question about integers, so getting into detail about instruction sequences isn't really going to help, as the basic idea is flawed.
[1] https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(file...
The code generated by gcc in this case is just bad.
It is known that gcc fails to do "if conversion" in many cases when it should.
The correct code using the CMOV instruction and only a single computation of the expression could easily be written with inline assembly.
However, if inline assembly is used, there is a much simpler solution using the carry flag, which was presented at the end of the article that started this thread.
In reality, all the high level language solutions that have been listed in the article are very bad in comparison with the right solution using inline assembly.
The solution using inline assembly and the carry flag is actually applicable to any CPU, with the minor inconvenience that the mnemonics for the instructions can vary (which can be handled with macros), because they all have carry flags, and on all CPUs this solution will be the shortest and the fastest.
For the high-level solutions, which is the best of them can vary from CPU to CPU, which was my point.
And I'll reiterate the original point, the pointer-operations (with comparison) solution is not going to be the best in most any case; you admit that it's not the fastest for inline assembly, which you claim is the right solution. It's not going to vary from CPU to CPU - comparisons in most compilers will become branches and cause a stall and the best case (cmov) is still going to be slower than a shift-from-carry or the (a&b)+(a^b)/2 version.
We don't need to admit defeat in optimizing such a simple case. A comparison is unnecessary and will pessimize, compared to a little bit-manipulation.
> I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug.
I haven't seen the mentioned proof, but if said proof is not formal and mechanized and/or does not consider all possibilities, including overflow, then should we really consider it to be a proof of correctness? It might prove some desirable properties, but I don't think we should leave it at that. I certainly don't think we should claim that "It is not sufficient merely to prove a program correct; you have to test it too."
When it comes to software programs, I believe proofs can and should be exhaustive. That does not necessarily mean you need to exhaustively test every input, but it does mean you need to prove correctness for every input, including inputs that might result in overflow or undefined behavior. Otherwise, we should not consider it a proof of correctness.
It was a formal proof, but one based on a false assumption, namely that x+y for two ints x and y is an int and a well-defined operation. It's not.
The trust in the significance of a proof is limited by the trust in the statement to be proven. The statement to be proven was "If XYZ assumptions about the statements in the underlying language hold, then the result of that function is the result of a binary search". The proof is correct. Just that XYZ contained something incorrect.
There is no free lunch. Even if we prove all our software formally correct, we still need to "program" the specifications, with all that comes with it. They can contain bugs, need to be debugged, revisited, etc. The above is an excellent example of this.
The Java solution is simpler because they are finding the average of two positive signed integers. So if you add two 31-bit positive signed integers, the result will fit in 32 bits, and then you can just do an unsigned right shift to get the average.
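A C rendering of that idea (my own sketch; in Java it is just `(a + b) >>> 1`): do the addition in unsigned arithmetic so the 32nd bit holds the carry, then shift it back down with a logical shift.

    #include <stdint.h>

    /* Average of two nonnegative 32-bit signed ints. Each fits in 31 bits,
       so the unsigned sum cannot overflow 32 bits; the logical right shift
       then recovers the average, which is what Java's >>> does. */
    int32_t average_nonneg(int32_t a, int32_t b)
    {
        uint32_t sum = (uint32_t)a + (uint32_t)b;
        return (int32_t)(sum >> 1);
    }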
Quite “amazing” that googleblog layout breaks on iOS. It’s literally impossible to see half of the text without the reader mode.
Yeah iOS is becoming the new IE. No matter how much people complain about google's chrome domination they at least try to keep up with the standards. iOS browser does not and they even lock the devices to their browser so you can't even choose a browser with a different engine
this isn't an iOS issue. That blog post is broken on Firefox Android as well, and in Chrome dev tools. There is a display inline-block messing it up. The newer blog posts display fine.
I saw the title and thought to just do "(a / 2) + (b / 2)" and a do a little bit of fudging if a or b is odd.
After reading the article, learning that
unsigned average(unsigned a, unsigned b)
{
return (a / 2) + (b / 2) + (a & b & 1);
}
was once patented actually made me a bit sad for our entire system of patents.

Why is math patentable? Seems crazy to me.
What is patentable is "this circuit to compute the average" where the circuit is an adder that drops the bottom bit from the addends, instead ANDing the two bottom bits and using the result as a carry-in.
Though actually it shouldn't be patented because it's an obvious implementation of a math formula (and math is not patentable).
There’s another algorithm that doesn’t depend on knowing which value is larger, the U.S. patent for which expired in 2016:
unsigned average(unsigned a, unsigned b)
{
return (a / 2) + (b / 2) + (a & b & 1);
}
There's no way that should be patentable.

It shouldn't be, but the world of software patents is truly bizarre. I have several patents in my name that are completely meaningless to the point of being satirical (stuff like "system to show a list of options and dispatch an action to a web server", "validating information submitted in a web form and returning errors"). Each is 40+ pages of filing, complete with diagrams, and all of them approved. And we need to do it, otherwise someone will sue us with an equally bogus patent and we will have no defense.
I have my name on a patent for an imaging pipeline.
I have no idea why it was granted. I sent it to headquarters as a design pattern description.
That’s pretty much the definition of “prior art.”
I guess they were able to reshape it into a form that the patent office wouldn’t laugh out the door. I never really looked at it. I hate reading patents; even my own.
Yup, my name is on two patents that IMO are just mundane semi-obvious stuff. But our investors insisted on building a portfolio of patented IP, so there you go.
The system is super broken.
> no defense.
I am not a lawyer, but I suspect prior art is a defense.
It's more complicated than that. A patent troll will hold a patent for X and then claim you are violating that patent. You, the holder of Y, can then say "well you're violating Y so fuck right off". Or maybe there's another patent that could potentially indicate prior art, but it isn't your patent. Or you're just a really small player with a few patents in your portfolio trying to defend against a troll.
Companies pool their patents together, promising not to sue each other over patents, in order to collectively defend themselves against trolls.
the world would be better if more companies were able or daring enough to fight bullshit patents directly (with prior art, obviousness, etc defenses).
but instead every company chooses to defend themselves by deluging the patent office with more crap, and collective abuse of the patent system to patent stuff even the “inventor” thinks is obvious becomes accepted totally normal behavior.
Story time. I worked for a company who did prior art on something that was frankly fucking obvious and is now ubiquitous (but as far as I know we did it first). At one point a patent troll sued us. Our clients were on our side, and one of them (a very prestigious law school that rhymes with Barvard) offered their resources to help us fight the troll. The C-level management decided not to risk it and just paid the troll. We were a 20-person company at the time and it wasn't worth it, I guess.
"The world would be a lot better if <very large group of people/companies> independently did xyz" can be applied to pretty much anything. Except tragedy of the commons is a real thing, and this is why you need governments to step in and regulate.
yes, you are right. meaningful voluntary collective change is unlikely.
however, I believe that individuals should respect the commons, and I don't believe in the tragedy of the commons as an excuse for individual actions that contribute to the problem.
(insert starfish on the beach story)
Well, employees could stage a walkout, as they do for other causes, because the abuse of property rights costs the American economy, workers, and consumers, and forces employees to participate in these unethical abuses.
I think Alibaba did this a little while ago - someone tried to sue them for infringing a one-click checkout patent and they did challenge them in court and won.
The solution is pretty simple. Just stop allowing software patents.
Maybe if employees of big tech would whisteblow to Congress on being forced into the unethical abuse of property rights that would help?
Algorithms aren't patentable. What's patented here is a specific circuit that implements this algorithm in order to calculate the average in a single instruction cycle.
I disagree. Not all the claims include the “single instruction cycle,” and IMO several of the claims are general enough that one could make a case that a sequence of instructions directing the operation of a general purpose data path would be covered by the claim. (Claim 3, for example).
this would require a lawsuit to find out, but this wording in the description suggests the authors’ intent with the claims was to patent this method of computing an average on any processor:
> A general purpose computer or processor with suitable circuitry can execute the invention in a single instruction cycle (as is preferred) or multiple instruction cycles.
Algorithms are not meant to be patentable, per alice[0], but that does not mean they are not patentable in practice. The theoretical intent of what is patentable certainly differs from what is enforcible as a patent in practice.
LZW compression[1] was patented, and Unisys actually had success extracting license fees with it[2]. In that sense, clearly an algorithm was "patented enough" that the patent was granted and used, even though the patent is literally just math. The MP3 patent is also just a mathematical algorithm which had even more legal success.
So, while technically algorithms are not patentable, in reality, the USPTO will grant patents for algorithms if you write enough legal gunk around them, and the difference doesn't really matter when a patent troll is sending threatening emails and your lawyers are demanding 20x the licensing fee to take the case.
[0]: https://en.wikipedia.org/wiki/Alice_Corp._v._CLS_Bank_Intern...
Yeah you're right, it seems like the article is misleading. Still, unless I'm reading it wrong it seems like the patent is just describing the hardware translation of this algorithm, it's not really adding anything new. It's still dubious to me.
Yeah, isn't the translation to a circuit even more trivial than the algorithm itself? a / 2 and b / 2 just discard bits. a & b & 1 is just an and of the two low bits. Then we just route those values to an adder (with carry).
Any algorithm directly expressible as an electronic circuit is considered patentable in most countries due to this equivalence relationship. Technically, this includes all computing algorithms in practice, since there is no meaningful distinction between software and hardware.
However, this is also why business method algorithms like the Amazon "One-Click" are not patentable in most countries. There is no trivial theoretical equivalence between one-click shopping and an electronic circuit.
> Algorithms aren't patentable
In the US, unfortunately, they can be (although not all algorithms are patentable). For example, algorithms used in many media formats are patented.
I wonder how many US companies use ffmpeg without knowing about their patent situation. Using ffmpeg in Europe or in the USA is not the same.
It makes very little sense as a circuit, because you'd just put in a 33 or 65 bit adder.
Not even that, you would use the carry out of an N-bit adder as the (N+1)-th bit.
The XOR cursor was patented.
Probably years after it was implemented too
Edit: but back in the 70s the USPTO didn't have the search databases they had in the 90s or 2000s. It's more the XOR patent being wielded in litigation that was extra controversial
That's utterly hilarious.
I've never come across this problem before, I read the headline and that solution came into my head immediately before I'd even clicked. I don't think I'm clever, surely half of HN feels the same way.
Software patents are comical.
The article is in error. It isn't patented.
Yeah, when I read the article title, this is how I thought I would do it. Anything that obvious is not patentable in principle, but in practice, Samsung could still destroy any small business it wanted to by taking them to court over it. The patent system is awful
Another semi-famous software patent: https://patents.google.com/patent/US20040230959A1/en
Exactly. I saw the title, thought "I wonder what other way there is to do this than the obvious one of pre-dividing by 2" and then opened the article and saw that the trivial way to do it was covered by a patent.
Wow! Just wow...
Well, we're the ones allowing the patent scam to continue ...
Just use the following code and you should be good!
unsigned average(unsigned a, unsigned b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

It is crazy. Can we get a computer program to spit out thousands of obvious simple programs, and get FOSS to use them in various places as prior art, to avoid this happening?
Do bitwise shifts instead of dividing by 2. (BRB going to the patent office.)
The compiler will do that for you and dividing by 2 is much clearer.
> There’s another algorithm that doesn’t depend on knowing which value is larger, the U.S. patent for which expired in 2016.
That's completely retarded; it's literally the first solution I think of when I hear this problem.
That's not a solid argument on its own. Today if you want to talk to someone then using a phone might be the first solution you can think of. That doesn't indicate phone was a bad patent in a past.
It was obvious in 1996, too. It is and was the most obvious solution for a programmer fully aware of the problem and wanting to avoid comparisons.
If the average expert in the field immediately comes up with the same or a very similar solution then it obviously isn't non-obvious, which is one of the tests for patentability.
In the case of the phone you already know the patented solution, which obviously makes it impossible for you to judge its obviousness. That presumably wasn't the case with GP and the presented problem.
If a phone is the first solution that comes to mind for a person that never saw or heard of a phone in their life, then that indicates phone was a bad patent.
The fact that patents require time and money makes this even more pathetic and appalling.
> I find it amusing that the PowerPC, patron saint of ridiculous instructions, has an instruction whose name almost literally proclaims its ridiculousness: rldicl. (It stands for “rotate left doubleword by immediate and clear left”.)
I suspect the POWER team has a good sense of humor. There's also the EIEIO instruction https://www.ibm.com/docs/en/aix/7.2?topic=set-eieio-enforce-...
Before reading the article: In x86 assembly, add ax, bx ; rcr ax, 1 works. I guess technically that is with overflow, but using overflow bits as intended.
EDIT: it's included in the collection of methods in the article as expected.
That's lovely. I missed it when reading the article. It's also the winner on AMD Zen architecture based on MCA analysis.
unsigned midpoint(unsigned a, unsigned b)
{
    asm("add\t%1,%0\n\t"
        "rcr\t%0"
        : "+r"(a)
        : "r"(b));
    return a;
}

Although `(a & b) + (a ^ b) / 2` is probably the more conservative choice.
I noticed the following is in the middle of the article with no context that no one else is mentioning:
unsigned average(unsigned a, unsigned b)
{
return (a & b) + (a ^ b) / 2;
}
A quick sanity check of this:

23 & 21 = 21
23 ^ 21 = 2
21 + 2 / 2 = 22 (order of operations)
I wonder why this is there. It seems the best solution, but no one else is mentioning it. It also has no context near it. Nor is it stated correctly. It's just there on its own.
The average of 23 and 21 is indeed 22.
Oh right, sorry, I'll edit this. It works straight up then. Weird that it's there with no context.
23 ^ 21 = 2
Sorry, edited the above. This is straight up right then which is weird. It's just there in the middle of the article with no context. In the middle of the SWAR method.
It is the SWAR method. Another comment explains it well, it basically treats each bit position as a 2-bit adder.
I cannot believe that solution was allowed to be patented. How crappy is our patent process? Most engineers writing code would come up with that solution first.
Every patent attorney I've ever worked with has emphasized that engineers are not equipped to determine if an idea is obvious and should let the PTO decide.
They say this because they know the USPTO strategy is to just hand out patents after putting in some bare minimum effort to review, and postpone the real review process to the unlikely day that someone chooses to challenge it in court and can pay private firms to do their job for them.
The winners in this arrangement are the government, the big law firms, and the large corporations that can afford them.
He hinted at this obliquely, but the PowerPC family of bit rotate instructions (rldicl, rlwinm, rlwimi, etc.), although intimidating in the general case, allows shifting, rotation, insertion, masking and more. There are many alternative mnemonics to try to reduce the cognitive complexity, but all of these just assemble to them with particular parameters.
This seems fundamental; I'm surprised elementary operations like this haven't been made a part of every major language/framework.
>Bonus chatter: C++20 adds a std::midpoint function that calculates the average of two values (rounding toward a).
As an asm geek, I wasn't surprised to read that taking advantage of the carry flag yielded the most efficient code for some processors. I recalled that some ISAs also have special SIMD instructions specifically for unsigned average, so I looked them up:
* x86 SSE/AVX/AVX2 have (V)PAVGB and (V)PAVGW, for 8-bit and 16-bit unsigned integers. These are "rounding" instruction though: adding 1 to the sum before the shift.
* ARM "Neon" has signed and unsigned "Halving Addition". 8,16 or 32 bit integers. Rounding or truncating.
* RISC-V's new Vector Extension has instructions for both signed and unsigned "Averaging Addition". Rounding mode and integer size are modal.
* The on-the-way-out MIPS MSA set has instruction for signed, unsigned, rounded and truncated average, all integer widths.
Some ISAs also have "halving subtraction", but the purpose is not as obvious.
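For the x86 entry above, a minimal sketch of what using the intrinsic looks like (my own example values, not from the article); note that PAVGB computes the rounding average (a + b + 1) >> 1 per byte:

    #include <emmintrin.h>   /* SSE2: _mm_avg_epu8 maps to PAVGB */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        __m128i a   = _mm_set1_epi8((char)250);
        __m128i b   = _mm_set1_epi8((char)251);
        __m128i avg = _mm_avg_epu8(a, b);       /* (250 + 251 + 1) >> 1 = 251 per lane */

        uint8_t out[16];
        _mm_storeu_si128((__m128i *)out, avg);
        printf("%u\n", out[0]);                 /* prints 251 */
        return 0;
    }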
Very cool.
I was surprised that the article didn't mention the need for this in binary search, and the famous problems [1] that occured due to naive attempts.
[1]: https://en.m.wikipedia.org/wiki/Binary_search_algorithm
> gcc doesn’t have a rotation intrinsic, so I couldn’t try it there
Gcc and Clang both recognize the pattern of shifts and OR that reproduce a rotation, and substitute the actual instruction, no intrinsic needed.
I bet MSVC does too.
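For reference, the shift-and-OR pattern being talked about looks something like this (a common idiom, my own sketch, not taken from the article):

    #include <stdint.h>

    /* Rotate left by n. gcc and clang recognize this pattern and emit a
       single rotate instruction; the masking keeps the shift counts in
       range so there is no undefined behavior when n is 0 or >= 32. */
    uint32_t rotl32(uint32_t x, unsigned n)
    {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }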
They recognize how to do a rotation of an unsigned integer value, but they do not recognize how to do the rotation of that value concatenated with the carry bit, which is needed here.
"unsigned long long"?
It's 2022. stdint.h is old enough to drink, and is probably married with a kid on the way. Just include it already?
Reminds me of a Stack Overflow thread, Shortest way to calculate difference between two numbers? [0] Multiple answers ignored the possibility of overflow.
Marshall Clow gave a pretty excellent CppCon talk covering these exact problems, called "std::midpoint? How Hard Could it Be?" https://www.youtube.com/watch?v=sBtAGxBh-XI
yes, this is linked from the article
Eh. I just cast both to a bigger integer type where possible, which in practice, is almost always. So if I'm averaging two uint32_ts, I just cast them to uint64_t beforehand. Or in Rust, with its lovely native support for 128-bit integers, I cast a 64-bit integer to 128-bit.
I find it mind boggling that something as simple as this can actually be patented.
unsigned average(unsigned a, unsigned b)
{
    /* Halve each operand separately, then add back the 1 that is lost
       when both operands are odd. */
    return (a / 2) + (b / 2) + (a & b & 1);
}
That makes the patent system seem broken to me.
This was a lot more thorough and in-depth than I expected it to be. But that's Raymond Chen for you.
One of the reasons I love Python is that integers never overflow, so this becomes a trivial problem.
Rounding in Python is interesting though:
https://www.askpython.com/python/built-in-methods/python-rou...
"Also, if the number is of the form x.5, then, the values will be rounded up if the roundup value is an even number. Otherwise, it will be rounded down.
For example, 2.5 will be rounded to 2, since 2 is the nearest even number, and 3.5 will be rounded to 4."
Round half to even is well known outside Python. It's sometimes called bankers' rounding.
Yes it is. But nobody said anything about rounding!
Raymond Chen is a treasure.
Since this discussion is all about patents: my 2 cents on improving the patent system.
Consider a term project of an undergraduate CS course, where the goal is spelled out, but the method is left for discovery.
Methods developed within any such project immediately invalidate patents. They're apparently obvious to folks learning to become "skilled in the art".
Yes, in practice, reaching a legal threshold would be hard (are you sure the students didn't read the patent or any description directly resulting from it?). But I'd definitely run a "patent invalidation course" - if I had confidence that the results would actually affect patents.
Wait, what? How can a patent be granted on one line of code that eventually compiles down to machine code? Sounds ridiculous to me.
Isn't it better to do
(a>>1) + (b>>1) + (a&b&1)
No division needed.
Your compiler will take care of that. Leave the division for the humans to read.
I'm in a camp that thinks compilers should also take care of the original
At the end of the day it's all just text. There are plenty of steps before any of it does anything at all.
unsigned average(unsigned a, unsigned b) { return (a + b) / 2; }
For C at least, the spec says that unsigned addition is modulo 2^64 (or 2^32, or 2^16, depending on the width), so imagine you had an 8-bit unsigned: 128 + 128 gives you 0. Divided by 2 is 0. That's the right answer by the language specification. The trick is to get 128.
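A small demonstration of that 8-bit wraparound in C (note that uint8_t operands promote to int, so the cast is what forces the modulo-256 behaviour):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 128, b = 128;
    /* (128 + 128) mod 256 = 0, and 0 / 2 = 0, not the hoped-for 128. */
    uint8_t avg = (uint8_t)(a + b) / 2;
    printf("%u\n", (unsigned)avg);   /* prints 0 */
    return 0;
}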
What should happen if you store "a+b" in an intermediate value?
If it is used and there's no way around it, then show a compilation warning that there might be overflow. If it can be resolved without being directly used, then it should be optimized away.
Not really. (a>>1) is harder for most programmers to read than the simpler (a/2), and in most modern programming languages the compiler will notice the division by a power of two and compile both to a bit shift.
> most programmers
Really depends on where you're coming from. Anyone who has dipped their toes in embedded programming will immediately know they are equivalent, and many will correct /2 to a bitshift, because that's what you want to happen.
I get that bit twiddling is obscure outside of low-level programming, but bit shifts really are kindergarten stuff in this domain.
You could also do (a + (b-a)/2) where a is the smaller number.
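A sketch of that variant in C, with the ordering handled explicitly so the subtraction can't wrap:

/* Once a <= b, b - a cannot wrap and a + (b - a) / 2 cannot exceed b.
   The result rounds toward a, the smaller value. */
unsigned average(unsigned a, unsigned b)
{
    if (a > b) { unsigned t = a; a = b; b = t; }   /* ensure a <= b */
    return a + (b - a) / 2;
}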
how is turning (a+b)/2 into a/2 + b/2 + a&b&1 even patentable?
Turning (a+b)/2 into a/2 + b/2 is basic obvious math.
If you do it and do any basic testing, you will realize you are getting off-by-one errors; looking at them makes it obvious when they appear, and hence how to fix them.
Sure, a proof is more complex, but you can trivially test the formula over all possible inputs at smaller bit widths, making a proof unnecessary (for those widths). A brute-force check along those lines is sketched below.
This is a solution a not-yet-graduated bachelor student could find in less than a day.
Granting a patent for this should honestly lead to disciplinary measures against the person who granted it.
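The kind of exhaustive check meant above, sketched in C for 8-bit operands (the reference average is computed in a wider type, so it can't overflow):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    for (unsigned a = 0; a <= 255; a++)
        for (unsigned b = 0; b <= 255; b++) {
            uint8_t got  = (uint8_t)((a >> 1) + (b >> 1) + (a & b & 1));
            uint8_t want = (uint8_t)((a + b) / 2);   /* a + b fits in unsigned */
            if (got != want) {
                printf("mismatch at %u, %u\n", a, b);
                return 1;
            }
        }
    puts("all 65536 pairs match");
    return 0;
}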
I got a pang of nostalgia seeing the Alpha AXP instructions.
Does this all work with BCD encoding?
This guy hacks.
Some unreal solutions here that show how amazing mathematics can be. Especially that Google patented method that only just recently expired.
Props for including the assembler breakdown for every major CPU architecture.
It was a Samsung patent. Only the document was hosted by Google
I had to lol when I saw there was a patent for that.
Divide both operands by 2 was my first idea before loading the page. (I like to try that sometimes before reading the articles.)
I didn't think about the carry bit, but it seems like that would be a logical solution after 5 minutes of extra thinking.
I'm not sure how that's patentable.
That's insane to me.
But maybe there is more to it.
I didn't read the patent itself.
https://patents.google.com/patent/US6007232A/en
The patent is for a circuit design to perform that algorithm in a single cycle. The algorithm was never patented, nor could it be.
In practice it makes no difference, because digital logic designers haven't used schematic capture in a very long time. They most commonly write Verilog (or SystemVerilog, which is a superset), and it looks a lot like C:
logic [31:0] a, b, average;
assign average = (a >> 1) + (b >> 1) + (a & b & 32'd1);
Sure, but any HW engineer will always draw up a circuit before writing the HDL description. And given that the patent was issued in 1996, it was only 6-7 years after Verilog became an open standard.
Sorry, I don't understand what you mean. It makes no difference to whom? It definitely makes a difference to someone writing that code, since that code is not patented.
What I mean is that the circuit diagram is a way to sneak it past a patent examiner, but the software/hardware distinction is completely artificial as both are designed in much the same way.
Several of these patent claims are for "an apparatus" because you're not allowed to patent ideas - but any realisation of the algorithm will necessarily be "an apparatus" so the effect is that in fact you can claim algorithms and that's exactly what this is doing.
I'm not a lawyer, so I'm confused. If you patent a hardware implementation of a software algorithm, on the basis that the implementation is novel, how would you stop me from writing that algorithm irrespective of how it executes?
Where in each claim do you see their proposed "hardware implementation" - the novel device you suppose they've invented - that we might distinguish from, for example, every modern microcomputer? What I see is a description of the algorithm, because of course the intent is to patent the algorithm and not some special hardware implementation that nobody will manufacture because it's patented.
If you have some novel arrangement of gears and levers to propose, or some recipe of doping material A with chemical B and then doing so-and-such, that's a classical hardware patent. But the whole point of these "an apparatus" patents is that they aren't trying to describe a novel device anybody invented. They're describing the algorithm; the patent is for the algorithm, and phrases like "an apparatus" are just a wafer-thin excuse for it to be waved through by people whose salaries are paid by the endless flood of such software patents.
As a parallel comment is explaining there hasn't for many years been any substantial difference between what seems to be very much "hardware" and what common folk think of as "software".
The patent application even spells it out, explaining that "A general purpose computer [...] can execute the invention in a single instruction cycle [...]". What "invention" do you suppose is being "executed" here? Is the general purpose computer... making a special circuit? Maybe your laptop is now selling DVD players to Best Buy? No. Their "invention" is just the algorithm, they successfully patented the algorithm.
Nobody would "stop you writing that algorithm", but if they wanted to, and if you've got enough money for it to be worth taking your money, while that patent was valid they could sue you for infringing on their invention and unless you've got deep pockets and proof you invented it first, you are screwed.
You don't have to invent a device to file a patent for it. They describe the device, the patent is for the device, the novelty is the construction of the device.
I honestly doubt that they could sue you for infringing on this.