Settings

Theme

Finding the average of two unsigned integers without overflow

devblogs.microsoft.com

448 points by mlex 4 years ago · 219 comments

Reader

errcorrectcode 4 years ago

Having done computer architecture and bit twiddling x86 in the ye olden days, I immediately, independently converged on the patented solution (code / circuit / Verilog, more or less the same thing). It goes to show how broken the USPTO is because it's obvious to anyone in the field. Patents are supposed to be nonobvious. (35 USC 103)

https://patentdefenses.klarquist.com/obviousness-sec-103/

  • phkahler 4 years ago

    Agreed. I spent about a minute before reading it and came up with the first solution, didn't feel like thinking through the puzzle of how not to care which one is larger, and then settled on the one with the 2016 expiration date. All within 1 to 2 minutes. I briefly considered XOR but didnt feel like remembering more about it - the solution was obvious when I saw it. How any of that was ever patentable is a crime.

  • undecisive 4 years ago

    This is bizarre. I wonder how many of us saw that title, thought "That's a really simple problem, surely?" came up with a solution and then were shocked when their coffee-lacking brain actually came up with the patented solution?

    I mean... ignoring the bitwise arithmentic (which this only obvious to people used to doing binary operations) this is the kind of maths that an 11yo could do.

    That said, the patented solution is a little more complex. But not by much.

    Which makes me curious: what other patents have we violated in our day-to-day without even knowing it?

    • mschuster91 4 years ago

      > Which makes me curious: what other patents have we violated in our day-to-day without even knowing it?

      Patents are like the criminal code - always remember "Three Felonies a Day" [1]. The system is set up so that if you are one of the 99%, the 1% can come in and bust you at will if you become too much of an annoyance/threat. They will find something if they just keep digging deep enough (not to mention that they can have your entire company's activity combed through with a microscope if they find a sympathetic court), and blast you with enough charges and threaten sequential jail time so that you cannot reasonably do anything other than plead guilty and forfeit your right to a fair trial [2].

      And for what it's worth, that "play by the rules as we want or we will destroy you" tactic can even hit multi-billion dollar companies like Epic Games. It's one thing if society decides to regulate business practices by the democratic process of lawmaking... but the fact that Apple can get away banning perfectly legal activities such as adult content, vaping [3] or using a non-Apple payment processor from hundreds of millions of people is just insane, not to mention incredibly damaging to the concept of democracy.

      [1]: https://kottke.org/13/06/you-commit-three-felonies-a-day

      [2]: https://innocenceproject.org/guilty-pleas-on-the-rise-crimin...

      [3]: https://www.macrumors.com/2020/06/01/pax-vape-management-web...

      • Retric 4 years ago

        Freedom means some people will do stuff you don’t like.

        • lukifer 4 years ago

          It also means you are not necessarily entitled to rents for your “discoveries”.

        • mschuster91 4 years ago

          And democracy means, at its core, to restrict the freedom of people committing actions that society has decided to be unlawful.

          • Retric 4 years ago

            Except society hasn’t decided that such action is unlawful even if some people, such as yourself, wish it was. I am sure at least a few vegans would like to outlaw eating meat, but democracy also works the other way ensuring such extremely unpopular laws never get passed.

            • mschuster91 4 years ago

              > I am sure at least a few vegans would like to outlaw eating meat, but democracy also works the other way ensuring such extremely unpopular laws never get passed.

              I would not be so certain about that one.

              Many countries are thinking of outright "meat taxes", health scores (similar to smoking warnings in the desired nudging effect) or extending CO² taxes onto agriculture: meat production causes about 14% of global CO² emissions [1], and outlawing/disincentivizing meat consumption is a very easy, very fast and incredibly effective way of cutting down on CO², methane and dung emissions. Not to mention the indirect emissions from land burning (especially in Brazil) or the societal cost of overconsumption of meat (e.g. obesity and heart issues).

              Personally, I'm in the "omnivore" camp but recognize that the way how we deal with meat products has to be massively reformed. We need to cut waste and curb consumption, the sooner the better.

              [1]: https://www.theguardian.com/environment/2021/sep/13/meat-gre...

              • Retric 4 years ago

                That’s the more popular view, but there is still a group of hard core “meat is murder” vegans. Personally I would say get rid of all farm subsidies, but politics makes strange bedfellows and democracy means accepting compromises up to a point.

        • pchangr 4 years ago

          This will for now on be my preferred quote against “freedom at all cost” arguments.

    • scotty79 4 years ago

      We should end legal protection of ideas as soon as possible.

  • Agentlien 4 years ago

    I just want to second this with my own experience just now:

    I looked at the title while still waking up.

    At first I thought of the low + (high - low) / 2 method. I then figured maybe it was better to simply predivide both numbers before adding and just correcting for the lowest bit (how was that ever patented?!).

    However, I didn't like having to perform two divisions so I thought there was probably something clever one could do with bit operations to avoid it. But, still being tired, I decided I didn't want to actually spend time thinking on the problem and I'd already spent a minute on it.

    • vbezhenar 4 years ago

      x / 2 === x >> 1, it's fast.

      • danbruc 4 years ago

        For unsigned or positive x. Yes, the article is about unsigned integers, but some might see this for the first time and not be aware of this restriction. -3 / 2 == -1 but -3 >> 1 == -2.

        • vbezhenar 4 years ago

          Interesting thing is that `(a >> 1) + (b >> 1) + (a & b & 1)` works correctly for signed integers (if `>>` works like in Java, filling most significant bit with ones for negative numbers). With division you'll need to write different expressions depending on operand signs. E.g. (-3) / 2 + (-5) / 2 + ((-3) & (-5) & 1) = (-1) + (-2) + 1 = -2. But ((-3) >> 1) + ((-5) >> 1) + ((-3) & (-5) & 1) = (-2) + (-3) + 1 = -4.

        • adrian_b 4 years ago

          You can use the appropriate right shift with signed integers as easy as with unsigned integers, you just have to handle in the right way the correction due to the bit shifted out.

          The fact that the right shift for a negative integer gives the floor function of the result just makes the correction easier than if you had used division with truncation towards zero.

          The shifted out bit is always positive, regardless whether the shift had been applied to negative or positive numbers.

          Except for following a tradition generated by a random initial choice, programming would have been in many cases easier if the convention for the division of signed numbers would have been to always generate positive remainders, instead of generating remainders with the same sign as the quotient.

          • danbruc 4 years ago

            With positive remainders you get wired quotient behavior. Why should 10/3 and -10/-3 yield different results? Besides that, the choice is not universal, different languages use different conventions.

            • adrian_b 4 years ago

              Why should 10/3 and -10/-3 yield the same result?

              I do not see where this would be of any use.

              On the other hand, if you want a quotient that has some meaningful relationship with the ratio between the dividend and the divisor, there are other more sensible definitions of the integer division than the one used in modern programming languages.

              You can have either a result that is a floating point number even for integer dividend and divisor, like in Algol, or you can define the division to yield the quotient rounded to even (i.e. with a remainder that does not exceed half of the divisor).

              In both cases 10/3 and -10/-3 would yield the same result and I can imagine cases when that would be useful.

              For the current definition of the integer division, I do not care whether 10/3 and -10/-3 yield the same result. It does not simplify any algorithm that I am aware of, while having a remainder of a known sign simplifies some problems by eliminating some tests for sign.

              • danbruc 4 years ago

                I was not really thinking about application but the mathematics. It seems a reasonable decision to me that |a / b| = |a| / |b| and to not get results of different magnitude depending on sign changes only.

      • Agentlien 4 years ago

        It's fast, but I figured doing that on both sides before adding looked a bit inelegant and maybe it could be avoided by doing "something something bit operations" and then I dropped the thought and clicked the link.

        • simias 4 years ago

          On a modern architecture given that most integers are usually u32 by default but the underlying CPU deals with 64bits natively, I'd just cast to u64 and call it a day.

          Actually I was curious to see if GCC would be smart enough to automatically choose what's the best optimization depending on the underlying architecture, but it doesn't appear to be the case.

          For x86_64 (with -O3 or -Os):

              avg_64bits:
              .LFB0:
                  .cfi_startproc
                  movl    %edi, %edi
                  movl    %esi, %esi
                  leaq    (%rdi,%rsi), %rax
                  shrq    %rax
                  ret
                  .cfi_endproc
          
              avg_patented_do_not_steal:
             .LFB1:
                  .cfi_startproc
                  movl    %edi, %eax
                  movl    %esi, %edx
                  andl    %esi, %edi
                  shrl    %eax
                  shrl    %edx
                  andl    $1, %edi
                  addl    %edx, %eax
                  addl    %edi, %eax
                  ret
          
          Clearly just casting to 64bits seems to denser code

          For ARM32 (-O3 and -Os):

              avg_64bits:
                  push    {fp, lr}
                  movs    r3, #0
                  adds    fp, r1, r0
                  adc     ip, r3, #0
                  mov     r0, fp
                  mov     r1, ip
                  movs    r1, r1, lsr #1
                  mov     r0, r0, rrx
                  pop     {fp, pc}
          
              avg_patented_do_not_steal:
                  and     r3, r1, #1
                  ands    r3, r3, r0
                  add     r0, r3, r0, lsr #1
                  add     r0, r0, r1, lsr #1
                  bx      lr
          
          A lot more register spilling in the 64bit version since it decides to do a true 64bit add using two registers and an adc.

          My code, for reference:

              uint32_t avg_64bits(uint32_t a, uint32_t b) {
                uint64_t la = a;
                uint64_t lb = b;
              
                return (la + lb) / 2;
              }
          
              uint32_t avg_patented_do_not_steal(uint32_t a, uint32_t b) {
                  return (a / 2) + (b / 2) + (a & b & 1);
              }
  • Ygg2 4 years ago

    > Patents are supposed to be nonobvious

    Emphasis on supposed.

    The granted patents include: laser used to exercise cat, and mobile wood based dog game (log used to play fetch).

    https://abovethelaw.com/2017/10/8-of-my-favorite-stupid-pate...

    https://patents.google.com/patent/US5443036A/en

    https://patents.google.com/patent/US6360693

    Apple steals the cake though. By patenting a geometric shape.

  • zanethomas 4 years ago

    100% correct

    the patented solution immediately came to mind

    • grishka 4 years ago

      It almost did for me. I thought that you should be able to divide each number by 2 (or shift one bit) before adding, but that would lose a 1 if both numbers have 1 in their least significant bit. The part with "a & b & 1" fixes that exact issue and is obvious to me in hindsight.

      • wildmanx 4 years ago

        > and is obvious to me in hindsight.

        Everything is. That's kinda hindsight's thing.

        Not so say that a few people in this thread probably saw this solution right away, but the "this was all obvious" crowd in this thread is a little too large for my taste. Be real, guys.

        • ygra 4 years ago

          I guess the part about overflow in the title primes most experienced developers to immediately think about a solution where the added numbers are restricted beforehand to avoid the overflow. From there the obvious answer is to halve them, which leaves the next problem when the numbers are odd.

          If you're not aware that numbers can overflow (and you probably don't tend to think about that for every single + you type, I guess), then the proper solution is less obvious.

  • tapirl 4 years ago

    And what is the intention to make the patent? The second way is actually more useful, not limited to unsigned ints.

    • Tuna-Fish 4 years ago

      But it requires you to know which one is larger. The patented way is faster if you are working with unsigned.

  • ChrisLomont 4 years ago

    The patent is more sophisticated than what the article implies - it's a single clock cycle method, which no compiler I've ever seen will do given the code presented in the article.

    And it's from 1996.

    • bsdetector 4 years ago

      This thread is full of people who challenged themselves to solve it and then failed to come up with the 'obvious' 1-cycle solution. It's clearly non-obvious, as this thread shows.

      The actual patent system failure here is the patent is not useful -- it's not valuable. If you needed this solution, you could sit down and derive it in less than an hour. That's not because it's obvious, but because the scope is so small.

      The only difference between this patent and say a media codec is how long it would take to reinvent it. It might take you 200 years to come up with something as good as h.265, but there's no magic to it. There's a problem, somebody came up with a solution, somebody else could do it again given enough time to work on it. This is true for everything that's ever been patented.

      The point of patents is to compensate for value of the work needed to reinvent, and so the real problem here is that value is less than any sane minimum. The value is less than the patent examiner's time to evaluate it! But court rulings have said it doesn't matter how insignificant a patent is, as long as it does anything at all it's "useful", which leads to these kinds of worthless patents.

      • tremon 4 years ago

        and then failed to come up with the 'obvious' 1-cycle solution

        That's unfair, as the commenters here are providing a software solution. The patent is about a hardware solution which involves two parallel adder circuits. It implements in hardware exactly what the software solution does, but you can't express it in software because there is no operand that expresses "implement this addition twice please". You'd have to express it as:

          avg = [x>>1 + y>>1, x>>1 + y>>1 + 1][x&y&1]
        
        Which isn't 1-cycle either without the specialized adder.
        • tremon 4 years ago

          And I just realized the hardware solution is incredibly sub-optimal. If you were to design this in specialized hardware, you'd use a single (N+1)-bit adder and just discard the least significant bit in the output, not duplicate the entire adder tree in silicon.

        • adrian_b 4 years ago

          There is no need for a specialized adder.

          The patented expression is computable in an obvious way by a single ordinary adder and a single AND gate connected to the carry input of the adder, without any other devices (the shifts and the "& 1" are done by appropriate connections).

          Any ordinary N-bit adder computes the sum of 3 input operands, 2 which are N-bit, and a third which is an 1-bit carry.

      • Dylan16807 4 years ago

        > This thread is full of people who challenged themselves to solve it and then failed to come up with the 'obvious' 1-cycle solution. It's clearly non-obvious, as this thread shows.

        If a significant fraction of people come up with it on the spot, it's obvious. And they did.

        • bsdetector 4 years ago

          I don't even see a single comment mentioning doing this in 1 cycle except from those who read the patent, much less reusing existing functional units to do so, so it's not clear to me any commenter came up with an equivalent to the patented solution or even identified the problem solved by it.

          Keep in mind this solution was to support MPEG-1 video encoding in the olden days when state of the art processors were 100 MHz and 800 nm process. Doing this in 1 cycle while reusing already existing function units seems like a clever solution to me -- not patent-worthy, not difficult, but clever.

          • Dylan16807 4 years ago

            Are you very sure that patent would never get threatened toward a software implementation that doesn't know anything about cycles?

            If so then the technique in the post isn't actually patented.

            If that C code would get threatened, then the 1 cycle thing is a red herring.

            Also "Doing this in 1 cycle while reusing already existing function units"? In hardware you can use a normal adder without any special technique...

    • adrian_b 4 years ago

      Sorry, but this argument about the single-cycle implementation is complete BS.

      Any logic designer, who is not completely incompetent, when seeing the expression

      (a / 2) + (b / 2) + (a & b & 1);

      will notice that this is a 1-cycle operation, because it is just an ordinary single addition.

      In hardware the divisions are made just by connecting the bits of the operands in the right places. Likewise, the "& 1" is done by connecting the LSB's of the operands to a single AND gate and the resulting bit is connected to the carry of the adder, so no extra hardware devices beyond a single adder are needed. This is really absolutely trivial for any logic designer.

      The questions at any hiring interview, even for beginners, would be much more complex than how to implement this expression.

      It is absolutely certain that such a patent should have never been granted, because both the formula and its implementation are obvious for any professional in the field.

  • roberthahn 4 years ago

    What is obvious today might not have been obvious in 1996.

    Our experiences and training has changed dramatically over the past 26 years.

    • scotty79 4 years ago

      I can assure you I would have come up with the patented solution just as fast in 1996 when I was a teenager and dabbled in 6502 assembler on Atari computer. Because I solved it now on the basis of exactly the expeirience and knowledge I acquired back then.

  • b112 4 years ago

    But in as it is non-obvious, as to why it is non-obvious, criteria met.

  • hackthefender 4 years ago

    > It goes to show how broken the USPTO is...

    The patent issued in 1996 and wasn't revisited since then (because never asserted in litigation). The USPTO is a lot different now, a quarter-century later.

    • Dylan16807 4 years ago

      > The USPTO is a lot different now, a quarter-century later.

      Please be more specific or link something that explains how they've improved.

      • cma 4 years ago

        Back then you couldn't early challenge a patent and prevent it from being issued, and once it was issued you couldn't challenge it without violating it and entering trial. Now you can do both.

        • Dylan16807 4 years ago

          Early challenging sounds helpful, but that's also outsourcing the work and it could put more coders at risk of treble damages further down the line from some patent they glanced at and forgot about.

    • nerdponx 4 years ago

      Isn't there also a recourse process by which you can get a patent invalidated? You can't expect USPTO to hire an expert in every single possible field.

      • thfuran 4 years ago

        I think that's usually resolved in court. By which I mean, I don't think there's process beyond choosing to fight any suit brought against you and hoping you win in court.

        • wahern 4 years ago

          > I don't think there's process beyond choosing to fight any suit brought against you and hoping you win in court.

          Not true. See https://en.wikipedia.org/wiki/Reexamination It's even easier today than a decade ago, though the Wikipedia article doesn't explain that aspect very well. (I wouldn't be able to explain it, either. I think it has to do with reduced ability for a patent owner to drag out review, including dragging it into court.) Probably not nearly easy enough, though.

      • kwhitefoot 4 years ago

        Why not?

ridiculous_fish 4 years ago

The "SWAR" approach `(a & b) + (a ^ b) / 2` looks bizarre but can be understood.

Adding two bits produces a sum and a carry:

    0 + 0 = 0, carry 0
    1 + 0 = 1, carry 0
    0 + 1 = 1, carry 0
    1 + 1 = 0, carry 1
So the sum is XOR, and the carry is bitwise AND. We can rewrite x + y as (x ^ y) + (x & y)*2

Distribute the divide, and you get (x ^ y)/2 + (x & y) which is the mystery expression.

(Note this distribution is safe only because (x & y)*2 is even.)

  • justinpombrio 4 years ago

    Alternatively, a direct explanation:

    a & b is the bits they have in common. If they both have a bit x, you should keep it, because the average of x and x is x.

    a ^ b is the bits that only one of them have. You should halve these bits, because the average of x and 0 is x/2.

    • ziml77 4 years ago

      This explanation makes a ton more sense!

      • wildmanx 4 years ago

        But it has lots of assumptions/prerequisites about taking the mean baked-in. It happens to work for this particular case, but in general you don't get far with such hand-wavy reasoning since you just get confused with which assumptions hold and which don't.

        The original explanation was the actual SIMD approach, which is really cool. You can extend it right away to other problems.

        • ggrrhh_ta 4 years ago

          The explanation seems to work even in base 10.

          Let's do the average of 85 and 95. The leftmost digit of both is equal, so we leave it: X5 The "xor"/2 is the sum of the non equal digits divided by 2 (respecting their powers): (8+9)10 = (17)10 = 85 Now, we add them: 5 + 85 = 90 (which is the average of 85 and 95).

          Let's take now 87 and 89: The rightmost digits are equal: 8X We do the 'xor'/2 for the left most: (7+9)/2 = 8 8X + 8 = 88

          Let me know if there are some examples for which it would fail (I only spent a minute to test if the explanation extends to other bases).

          • remram 4 years ago

            The average of 15 and 30 is not 22. The average of 51 and 03 is not 22.

  • paulsmith 4 years ago

    It's a half-adder in software!

  • wildmanx 4 years ago

    Thanks for this explanation.

    What sets this post apart from the rest is that it actually provides useful information for everybody, after all those other posts of type "but that's obvious, I looked at it for 1 minute, saw half the solution, was too lazy for the rest and in hindsight it's all obvious. What a ridiculous patent system."

    Not to say that the patent system is problematic...

  • AnotherGoodName 4 years ago

    Sorry i'm confused and i asked elsewhere.

    How is the above SWAR? It looks like a regular set of instructions.

    • rustybolt 4 years ago

      If you, for example, want to do addition of four 8-bit integers within a 32-bit register, you have to use similar techniques to stop the carry from propagating.

      For example, when x and y are 32-bit integers holding 4 8-bit integers, you can do

          z = (x ^ y) + (x & y) & 0x7f7f7f7f;
      
      Now z holds four 8-bit integers which hold the sum (modulo 256) of the integers of x and y. The bit mask is to stop the carry from propagating.
      • nonsequitur 4 years ago

        This doesn't work because you're not left-shifting (doubling) the carry. But when adding the shifted carry to (x ^ y) we're back to potentially overflowing the highest bits. The solution is to add the highest and the lower bits separately:

          lower = 0x7f7f7f7f;
          highest = ~lower;
          z = ((x & lower) + (y & lower)) ^ ((x ^ y) & highest);
        
        Note this only improves performance for larger container integers.
        • rustybolt 4 years ago

          You're completely right! Typically you don't care about overflow though (and you should use unsigned ints to avoid undefined behavior).

      • sgtnoodle 4 years ago

        That's pretty neat. Is it actually any faster than just doing four 8-bit adds, though?

        Presumably it would take 4 logical instructions to do the vectored math, vs. 4 logical instructions to do the scalar additions.

        I suppose you're looking at a minimum of two registers for the vectored approach, vs. 8 for the scalar approach. Having the result available in separate registers makes them immediately available for use, though.

        There's also the overhead of getting the numbers in and out of memory. Loading and storing one word is obviously going to be way better than loading 4 bytes individually.

        It seems to me like the vectored approach would be better for algorithms that require iterating through a large dataset in memory. The scalar approach would be better for algorithms that have a bunch of dependent calculations. Perhaps that's an obvious conclusion!

        That's pretty neat though. For the large dataset scenario, perhaps you could get a significant speedup on relatively simple architectures such as cortex-m microcontrollers. I suspect that sufficiently modern high end CPUs/compilers wouldn't benefit so much from it, though? All the pipelining, superscaling and caching and whatnot could sufficiently mask the latencies of the memory accesses to the point of being irrelevant. Also, a sufficiently clever compiler could implement the loop with actual SIMD instructions and achieve significantly higher performance than the manual in-register optimization.

        This would be a fun way to compute a basic 8-bit checksum on a binary blob in a microcontroller... Not that it would be practically useful because any non-trivially sized blob would be better served with at least a Fletcher checksum if not a full CRC, both of which seemingly lack the necessary associativity.

        • rustybolt 4 years ago

          > That's pretty neat. Is it actually any faster than just doing four 8-bit adds, though?

          The operation itself is not necessarily faster (it could be faster due to pipelining, I think, and it would probably ne faster when storing 8 8-bit ints in a 64-bit int), but it can save the hassle and runtime of packing and unpacking into four variables.

        • djmips 4 years ago

          This technique was useful on a 68000 in a 4 voice software PCM sampled instrument music player running on interrupts on the Atari ST back in the day.

          • djmips 4 years ago

            Found the game where I worked. Default hardware bleeps and bloops in first version https://www.youtube.com/watch?v=RGOdHT29Jpc

            Four voice digi player using SWAR in second version https://www.youtube.com/watch?v=1GrdvcghDXE

            • sgtnoodle 4 years ago

              Very cool! The difference in audio fidelity is quite impressive.

              Reminds me of a weekend hobby project I did back in 2014 or so. I had an itch to play with analog video signal generation from an atmega328p. Rather than use one of the existing libraries, though, I started from scratch with the goal of achieving the highest possible resolution. I used the SPI peripheral to clock out 8 pixels at a time at 8Mhz without any gaps, giving me something like 12 instructions to prepare the next byte. There wasn't enough RAM for a frame buffer at the resolution, so I instead used character tiles; that ate up the whole budget. I forget what the resolution was, but it was significantly higher than the existing library was capable of. There was a jitter, which I tracked down the the variability in interrupt latency due to the AVR having variable cycle length instructions. I was using a timer interrupt to schedule the start of and complete transmission of each scanline, so that the main program could focus purely on application logic. I wrote an inline assembly routine at the start of the interrupt handler to insert a variable number of noop instructions depending on the relative phase of the hardware timer, and the output became rock solid.

              That of course reminds me of a project in 2007 where I needed to go the other direction, and decode an analog video signal on an 8 bit PIC microcontroller. The signal was from a camera on an actuator, meant to detect the relative position of the sun for the purpose of aiming a parabolic solar concentrator. I was able to filter out all visible light with some overdeveloped film negative so that the video signal was simply a white dot on a black background, and then wire it up through some voltage dividers to the PIC's two voltage comparators. One comparator detected sync pulses, and the other one detected black to white transitions. The firmware would simply track the timing of sync pulses to know the current scanline and position within the current scanline. Good times!

              • djmips 4 years ago

                Very nice! Good work.

                Some more info on the digi player on the ST. It used a timer interrupt to service the PCM sample but if you used just the interrupt there was significant noise because there was significant variability of the timing on the interrupt. To get the timing tighter the interrupt timing was changed to hit the routine on every video line just prior to the actual hsync and then hsync was polled to get very precise timing.

                The PCM was just a linearization by combining three logarithmic volumes of the three PSG voices.

                During the title sequence, a special version of the code was running where several 68000 registers were reserved globally for the digi player. So those did not need to be saved / restored in the interrupt routine!

                The SWAR was involved when advancing the four wrapping 8 bit indices into the each of the four voice's 256 entry sample tables. This was part of a monumental effort to get the interrupt routine to be as quick as possible.

                Your video decoding reminds me of when I worked at a video card company in the nineties we had a competitive advantage by using a commodity part in an unusual way. This video decoder hardware was commonly used to take composite video and decode it. We supported that but we also did a bunch of advanced features by using a seldom used mode where it could be used in a raw mode where it took the composite signal and stored the analog to digital conversion in memory. We had high speed assembly code that could decode the video better than the hardware and supported some cool additional features. Anyway... memories a bit hazy. Been a while but I remember it being very cool.

    • eru 4 years ago

      Wikipedia says: "It also refers to the use of SIMD with general-purpose registers and instructions that were not meant to do it at the time, by way of various novel software tricks."

      https://en.wikipedia.org/wiki/SWAR

      Maybe that's the way it's meant?

      Compilers might be smart enough to pick up the idiom used in the example and compile them to something done in parallel?

    • wildmanx 4 years ago

      SIMD is "single instruction, multiple data". In this case, the data are single bits. It's basically operations on bit-vectors of width 32 (or whatever the machine word width is). All the bits are treated independently. That's exactly how you'd do SIMD elsewhere.

  • dheera 4 years ago

    Why not just

       (a >> 1) + (b >> 1) + (a & 1) & (b & 1)
    • dreamcompiler 4 years ago

      That's the "patented solution" referred to in the article.

      • andi999 4 years ago

        Which is shocking to me, since this is the solution which jumps immediately to mind (and usually patents should be above that line). I could understand if the next solution was patented, but not this one.

    • Dylan16807 4 years ago

      "just"? That's 2 additions, 2 shifts, and 2-3 bitwise operations, while the other method is 1 addition, 1 shift, and 2 bitwise operations.

      • rustybolt 4 years ago

        I would think optimizing compilers can optimize this expression to the one in the article, but I tried it on godbolt and they don't.

worewood 4 years ago

Just by reading the headline, before opening the article, I thought of the patented solution in my head. "Just halve before adding, it can be off by one but some boolean logic might do it"

Software patents are absolutely disgusting.

  • jws 4 years ago

    Absolutely the same thing I did. I even had the low bit logic worked out by the time I scrolled the article down and saw the patented line. Clearly we have both had miraculous enlightenment because legally this is “not obvious”.

    • hackthefender 4 years ago

      > Clearly we have both had miraculous enlightenment because legally this is “not obvious”.

      To be precise, legally it is "not obvious back in 1996." There is a lot of stuff that is obvious today that wasn't 25 years ago. That said, this one in particular probably would have been invalidated as obvious if it was ever litigated (and it was not). Also, the USPTO has reined in software patents a lot in recent years (but always people advocating for more or less).

      • doctor_eval 4 years ago

        Seriously, I could have done this in 1996 and so could anyone. I reckon I could probably have worked this out in 1986. It’s not like binary arithmetic has changed significantly in the last 20 years.

        • unnah 4 years ago

          And back in 1996, programmers were much more familiar with bit-twiddling than today.

      • ______-_-______ 4 years ago

        It's not even computer science, it's literally just math. (a+b)/2 = a/2 + b/2. They were teaching that in pre-algebra in middle schools well before 1996.

    • andi999 4 years ago

      Yes. I think though the next solution is not obvious. (a & b)+ (a^b)/2

  • umeshunni 4 years ago

    It's not the solution that's patented, but it's the implementation in a single CPU cycle in hardware.

    • jagger27 4 years ago

      That’s a pretty important distinction. If someone in the 1800s invented a mechanical calculator that could do this operation in a single crank, I don’t think anyone would upset about that patent.

      • upofadown 4 years ago

        But then the patent would not be on the logic but the mechanical implementation. The obviousness would need to be judged on that basis. The method can be implemented using straightforward combinational logic so the single crank/cycle is a given after you have come up with the obvious method.

        Back before software patents were a thing, the "math" was not patentable. Eliminating software patents will be a return to the previous status quo.

        • pclmulqdq 4 years ago

          We have eliminated most software patents: see Alice Corp vs CLS bank.

          The only thing this patent covers is the physical circuit implementation, not the math.

          • rrss 4 years ago

            > The only thing this patent covers is the physical circuit implementation

            The claims in that patent are not limited to a physical circuit implementation.

            The description includes a circuit diagram as one possible embodiment of the claimed invention, but the patent covers any implementation (and the description indicates it was intended to cover instructions running on general purpose processors).

    • sebazzz 4 years ago

      So the compiler developers had to pay fees? Or Intel and AMD?

      • umeshunni 4 years ago

        If the compiler developers were developing their own circuits, yes.

        But likely, Intel/AMD would have had to pay fees if they were implementing a similar solution in their hardware.

  • teaearlgraycold 4 years ago

    As if anyone would ever get prosecuted for that, though.

    Given its simplicity this makes me wonder if a compiler has ever transformed legal original IP code into patented code.

    • enneff 4 years ago

      That’s not really how software patents are (ab)used. Just having the patent and a vaguely credible claim that someone is using the patented technology is enough to encumber them with enough legal issues that many people will settle instead of fight it.

justin66 4 years ago

See also: "Nearly All Binary Searches and Mergesorts are Broken" by Joshua Bloch. The cluefulness or otherwise with which people often react to Bloch's excellent post is not something to ponder very closely if you want to retain any hope in the future of software engineering.

https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

https://news.ycombinator.com/item?id=3530104

https://news.ycombinator.com/item?id=1130463

https://news.ycombinator.com/item?id=14906429

https://news.ycombinator.com/item?id=6799336

https://news.ycombinator.com/item?id=9857392

https://news.ycombinator.com/item?id=12147703

https://news.ycombinator.com/item?id=621557

https://news.ycombinator.com/item?id=7594625

https://news.ycombinator.com/item?id=9113001

https://news.ycombinator.com/item?id=16890739

If doomscrolling all that isn't enough to make you fear for mankind's future I'm pretty sure there's an Ulrich Drepper glibc bug report rejection related to this topic (or several) that you can google...

On topic: Raymond's post has some other great stuff. SWAR!

  • dataflow 4 years ago

    I want to reply to one of the comments you linked to, which is this:

    > I would argue that the bug is not in the algorithm -- the bug is in languages that don't detect integer overflow by default.

    Concretely, this is true enough. But abstractly, not so much: the algorithm is actually "buggy" if you abstract the problem a little. Namely, finding a midpoint of two operands does not require that the operands be numbers, or even addable for that matter. The introduction of that requirement is therefore a bug (at least in my eyes). The easiest way to see this is to replace integers with pointers. Then adding two pointers isn't even a well-defined operation in the general case, let alone dividing them by two. Whereas subtracting them and moving half the distance is actually quite well-defined, and we can see it behaves better too.

    I would probably go so far as to claim that this is not an isolated example of where thinking about problems more abstractly helps us come up with solutions that have non-obvious benefits.

    • ghusbands 4 years ago

      Subtraction, division and addition is one of the common answers that is still wrong, unless you also want to do a comparison, first, and that is generally high cost.

      Read https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63303 to see many problems around pointer differencing.

      • adrian_b 4 years ago

        A comparison never costs more than an addition or a subtraction.

        If you would use a conditional jump, that would have a high cost.

        However the maximum or minimum should always be computed without conditional jumps and many CPUs have special instructions for max and min, which are not more expensive than additions or subtractions.

        On CPUs without max & min instructions, computing max or min requires 2 instructions (compare + conditional copy).

        2 instructions vs. 1 instruction increases the program size but not necessarily the execution time, if the instructions can be overlapped with others.

        Due to the complex architecture of modern CPUs, it is impossible to determine the cost of a simple sequence of instructions in the general case.

        For each particular CPU, a different but equivalent sequence of instructions can be the best and longer sequences of instructions may happen to be executed in less time, if they can be better overlapped on a certain CPU.

        • ghusbands 4 years ago

          The obvious form of the code with a comparison still produces a conditional branch on latest gcc [1]. It's extremely doubtful that you'll find a version that uses any comparison that consistently performs as quickly as any version that uses a little bit-twiddling, no matter what modern CPU you're talking about.

          Many of your statements are misleading in context. Implying that you can't know or deduce things about the cost of a simple sequence of instructions is very odd. All software and people that work on optimization do it all the time.

          And remember that the context is a suggestion that pointer types and pointer-subtraction is the answer to a question about integers, so getting into detail about instruction sequences isn't really going to help, as the basic idea is flawed.

          [1] https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(file...

          • adrian_b 4 years ago

            The code generated by gcc in this case is just bad.

            It is known that gcc fails to do "if conversion" in many cases when it should.

            The correct code using the CMOV instruction and only a single computation of the expression could easily be written with inline assembly.

            However, if inline assembly is used, there is a much simpler solution using the carry flag, which was presented at the end of the article that started this thread.

            In reality, all the high level language solutions that have been listed in the article are very bad in comparison with the right solution using inline assembly.

            The solution using inline assembly and the carry flag is actually applicable to any CPU, with the minor inconvenient that the mnemonics for the instructions could vary (which can be handled with defined macros), because all have carry flags and on all CPUs this solution will be the shortest and the fastest.

            For the high-level solutions, which is the best of them can vary from CPU to CPU, which was my point.

            • ghusbands 4 years ago

              And I'll reiterate the original point, the pointer-operations (with comparison) solution is not going to be the best in most any case; you admit that it's not the fastest for inline assembly, which you claim is the right solution. It's not going to vary from CPU to CPU - comparisons in most compilers will become branches and cause a stall and the best case (cmov) is still going to be slower than a shift-from-carry or the (a&b)+(a^b)/2 version.

              We don't need to admit defeat in optimizing such a simple case. A comparison is unnecessary and will pessimize, compared to a little bit-manipulation.

  • smaddox 4 years ago

    > I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug.

    I haven't seen the mentioned proof, but if said proof is not formal and mechanized and/or does not consider all possibilities, including overflow, then should we really consider it to be a proof of correctness? It might prove some desirable properties, but I don't think we should leave it at that. I certainly don't think we should claim that "It is not sufficient merely to prove a program correct; you have to test it too."

    When it comes to software programs, I believe proofs can and should be exhaustive. That does not necessarily mean you need to exhaustively test every input, but it does mean you need to prove correctness for every input, including inputs that might result in overflow or undefined behavior. Otherwise, we should not consider it a proof of correctness.

    • wildmanx 4 years ago

      It was a formal proof, but one based on a false assumption, namely that x+y for two ints x and y is an int and a well-defined operation. It's not.

      The trust in the significance of a proof is limited by the trust in the statement to be proven. The statement to be proven was "If XYZ assumptions about the statements in the underlying language hold, then the result of that function is the result of a binary search". The proof is correct. Just that XYZ contained something incorrect.

      There is no free lunch. Even if we prove all our software formally correct, we still need to "program" the specifications, with all that comes with it. They can contain bugs, need to be debugged, revisited, etc. The above is an excellent example of this.

  • benmmurphy 4 years ago

    The Java solution is simpler because they are finding the average of 2 positive signed integers. So if you add 2 31 bit positive signed integers the result will fit in 32 bits and then you can just do an unsigned right shift to get the average.

  • LudwigNagasena 4 years ago

    Quite “amazing” that googleblog layout breaks on iOS. It’s literally impossible to see half of the text without the reader mode.

    • xbmcuser 4 years ago

      Yeah iOS is becoming the new IE. No matter how much people complain about google's chrome domination they at least try to keep up with the standards. iOS browser does not and they even lock the devices to their browser so you can't even choose a browser with a different engine

      • deckard1 4 years ago

        this isn't an iOS issue. That blog post is broken on Firefox Android as well, and in Chrome dev tools. There is a display inline-block messing it up. The newer blog posts display fine.

johnhenry 4 years ago

I saw the title and thought to just do "(a / 2) + (b / 2)" and a do a little bit of fudging if a or b is odd.

After reading the article, learning that

  unsigned average(unsigned a, unsigned b)
  {
    return (a / 2) + (b / 2) + (a & b & 1);
  }

was once patented actually made me a bit sad for our entire system of patents.
  • cphoover 4 years ago

    Why is math patentable? seems crazy to me

    • bonzini 4 years ago

      What is patentable is "this circuit to compute the average" where the circuit is an adder that drops the bottom bit from the addends, instead ANDing the two bottom bits and using the result as a carry-in.

      Though actually it shouldn't be patented because it's an obvious implementation of a math formula (and math is not patentable).

nmilo 4 years ago

    There’s another algorithm that doesn’t depend on knowing which value is larger, the U.S. patent for which expired in 2016:

    unsigned average(unsigned a, unsigned b)
    {
        return (a / 2) + (b / 2) + (a & b & 1);
    }
There's no way that should be patentable.
  • paxys 4 years ago

    It shouldn't be, but the world of software patents is truly bizarre. I have several patents in my name that are completely meaningless to the point of being satirical (stuff like "system to show a list of options and dispatch and action to a web server", "validating information submitted in a web form and returning errors"). Each is 40+ pages of filing, complete with diagrams, and all of them approved. And we need to do it otherwise someone will sue us with an equally bogus patent and we will have no defense.

    • ChrisMarshallNY 4 years ago

      I have my name on a patent for an imaging pipeline.

      I have no idea why it was granted. I sent it to headquarters as a design pattern description.

      That’s pretty much the definition of “prior art.”

      I guess they were able to reshape it into a form that the patent office wouldn’t laugh out the door. I never really looked at it. I hate reading patents; even my own.

      • glonq 4 years ago

        Yup, my name is on two patents that IMO are just mundane semi-obvious stuff. But our investors insisted on building a portfolio of patented IP, so there you go.

        The system is super broken.

    • danuker 4 years ago

      > no defense.

      I am not a lawyer, but I suspect prior art is a defense.

      • staticassertion 4 years ago

        It's more complicated than that. A patent troll will hold a patent for X and then claim you are violating that patent. You, the holder of Y, can then say "well you're violating Y so fuck right off". Or maybe there's another patent that could potentially indicate prior art, but it isn't your patent. Or you're just a really small player with a few patents in your portfolio trying to defend against a troll.

        Companies pool their patents together, promising not to sue each other over patents, in order to collectively defend themselves against trolls.

        • rrss 4 years ago

          the world would be better if more companies were able or daring enough to fight bullshit patents directly (with prior art, obviousness, etc defenses).

          but instead every company chooses to defend themselves by deluging the patent office with more crap, and collective abuse of the patent system to patent stuff even the “inventor” thinks is obvious becomes accepted totally normal behavior.

          • IgorPartola 4 years ago

            Story time. I worked for a company who did prior art on something that was frankly fucking to obvious and is now ubiquitous (but as far as I know we did it first). At one point a patent troll sued us. Our clients were on our side and one of them (a very prestigious law school that rhymes with Barvard), offered their resources to help us fight the troll. The C level management decided to not risk it and just paid the troll. We were a 20 person company at the time and it wasn’t worth it I guess.

          • paxys 4 years ago

            "The world would be a lot better if <very large group of people/companies> independently did xyz" can be applied to pretty much anything. Except tragedy of the commons is a real thing, and this is why you need governments to step in and regulate.

            • rrss 4 years ago

              yes, you are right. meaningful voluntary collective change is unlikely.

              however, i believe that individuals should respect the commons, and don’t believe in the tragedy of the commons as an excuse for individual actions that contribute to the problem.

              (insert starfish on the beach story)

              • erosenbe0 4 years ago

                Well employees could stage a walk out as they do for other causes because the abuse of property rights costs the American economy, worker, and consumer and forces employees to participate in these unethical abuses.

          • jamesfinlayson 4 years ago

            I think Alibaba did this a little while ago - someone tried to sue them for infringing a one-click checkout patent and they did challenge them in court and won.

        • thayne 4 years ago

          The solution is pretty simple. Just stop allowing software patents.

    • erosenbe0 4 years ago

      Maybe if employees of big tech would whisteblow to Congress on being forced into the unethical abuse of property rights that would help?

  • lilyball 4 years ago

    Algorithms aren't patentable. What's patented here is a specific circuit that implements this algorithm in order to calculate the average in a single instruction cycle.

    • rrss 4 years ago

      I disagree. Not all the claims include the “single instruction cycle,” and IMO several of the claims are general enough that one could make a case that a sequence of instructions directing the operation of a general purpose data path would be covered by the claim. (Claim 3, for example).

      this would require a lawsuit to find out, but this wording in the description suggests the authors’ intent with the claims was to patent this method of computing an average on any processor:

      > A general purpose computer or processor with suitable circuitry can execute the invention in a single instruction cycle (as is preferred) or multiple instruction cycles.

    • TheDong 4 years ago

      Algorithms are not meant to be patentable, per alice[0], but that does not mean they are not patentable in practice. The theoretical intent of what is patentable certainly differs from what is enforcible as a patent in practice.

      LZW compression[1] was patented, and Unisys actually had success extracting license fees with it[2]. In that sense, clearly an algorithm was "patented enough" that the patent was granted and used, even though the patent is literally just math. The MP3 patent is also just a mathematical algorithm which had even more legal success.

      So, while technically algorithms are not patentable, in reality, the USPTO will grant patents for algorithms if you write enough legal gunk around them, and the difference doesn't really matter when a patent troll is sending threatening emails and your lawyers are demanding 20x the licensing fee to take the case.

      [0]: https://en.wikipedia.org/wiki/Alice_Corp._v._CLS_Bank_Intern...

      [1]: https://patents.google.com/patent/US4558302A/en

      [2]: https://www.itweb.co.za/content/JBwErvn5oNOq6Db2

    • nmilo 4 years ago

      Yeah you're right, it seems like the article is misleading. Still, unless I'm reading it wrong it seems like the patent is just describing the hardware translation of this algorithm, it's not really adding anything new. It's still dubious to me.

      • dlubarov 4 years ago

        Yeah, isn't the translation to a circuit even more trivial than the algorithm itself? a / 2 and b / 2 just discard bits. a & b & 1 is just an and of the two low bits. Then we just route those values to an adder (with carry).

    • jandrewrogers 4 years ago

      Any algorithm directly expressible as an electronic circuit is considered patentable in most countries due to this equivalence relationship. Technically, this includes all computing algorithms in practice, since there is no meaningful distinction between software and hardware.

      However, this is also why business method algorithms like the Amazon "One-Click" are not patentable in most countries. There is no trivial theoretical equivalence between one-click shopping and an electronic circuit.

    • thayne 4 years ago

      > Algorithms aren't patentable

      In the US, unfortunately,they can be (although not all algorithms are patentable). For example algorithms used in many media formats are patented.

    • Dylan16807 4 years ago

      It makes very little sense as a circuit, because you'd just put in a 33 or 65 bit adder.

      • bonzini 4 years ago

        Not even that, you would use the carry out of an N-bit adder as the (N+1)-th bit.

    • WalterBright 4 years ago

      The XOR cursor was patented.

      • erosenbe0 4 years ago

        Probably years after it was implemented too

        Edit: but back in the 70s the USPTO didn't have the search databases they had in the 90s or 2000s. It's more the XOR patent being wielded in litigation that was extra controversial

  • throwaway22032 4 years ago

    That's utterly hilarious.

    I've never come across this problem before, I read the headline and that solution came into my head immediately before I'd even clicked. I don't think I'm clever, surely half of HN feels the same way.

    Software patents are comical.

  • version_five 4 years ago

    Yeah, when I read the article title, this is how I thought I would do it. Anything that obvious is not patentable in principle, but in practice, Samsung could still destroy any small business it wanted to by taking them to court over it. The patent system is awful

  • TYPE_FASTER 4 years ago

    Another semi-famous software patent: https://patents.google.com/patent/US20040230959A1/en

  • coutego 4 years ago

    Exactly. I saw the title, thought "I wonder what other way there is to do this than the obvious one of pre-dividing by 2" and then opened the article and saw that the trivial way to do it was covered by a patent.

    Wow! Just wow...

  • c-linkage 4 years ago

    Just use the following code and you should be good!

      unsigned average(unsigned a, unsigned b)
      {
          return (a >> 1) + (b >> 1) + (a & b & 1);
      }
  • quickthrower2 4 years ago

    It is crazy. Can we get a computer program to spit out thousands of obvious simple programs, get FOSS to use them in various places for prior art to avoid this happening.

  • StayTrue 4 years ago

    Do bitwise shifts instead of dividing by 2. (BRB going to the patent office.)

rustybolt 4 years ago

> There’s another algorithm that doesn’t depend on knowing which value is larger, the U.S. patent for which expired in 2016.

That's completely retarded; it's literally the first solution I think of when I hear this problem.

  • kuboble 4 years ago

    That's not a solid argument on its own. Today if you want to talk to someone then using a phone might be the first solution you can think of. That doesn't indicate phone was a bad patent in a past.

    • ghusbands 4 years ago

      It was obvious in 1996, too. It is and was the most obvious solution for a programmer fully aware of the problem and wanting to avoid comparisons.

    • wongarsu 4 years ago

      If the average expert in the field immediately comes up with the same or a very similar solution then it obviously isn't non-obvious, which is one of the tests for patentability.

      In the case of the phone you already know the patented solution, which obviously makes it impossible for you to judge its obviousness. That presumable wasn't the case with GP and the presented problem.

    • SkeuomorphicBee 4 years ago

      If a phone is the first solution that comes to mind for a person that never saw or heard of a phone in their life, then that indicates phone was a bad patent.

  • d_tr 4 years ago

    The fact that patents require time and money makes this even more pathetic and appalling.

dralley 4 years ago

> I find it amusing that the PowerPC, patron saint of ridiculous instructions, has an instruction whose name almost literally proclaims its ridiculousness: rldicl. (It stands for “rotate left doubleword by immediate and clear left”.)

I suspect the POWER team has a good sense of humor. There's also the EIEIO instruction https://www.ibm.com/docs/en/aix/7.2?topic=set-eieio-enforce-...

benlivengood 4 years ago

Before reading the article: In x86 assembly, add ax, bx ; rcr ax, 1 works. I guess technically that is with overflow, but using overflow bits as intended.

EDIT: it's included in the collection of methods in the article as expected.

  • jart 4 years ago

    That's lovely. I missed it when reading the article. It's also the winner on AMD Zen architecture based on MCA analysis.

        unsigned midpoint(unsigned a, unsigned b) {
          asm("add\t%1,%0\n\t"
              "rcr\t%0"
              : "+r"(a)
              : "r"(b));
          return a;
        }
    
    Although `(a & b) + (a ^ b) / 2` is probably the more conservative choice.
AnotherGoodName 4 years ago

I noticed the following is in the middle of the article with no context that no one else is mentioning:

    unsigned average(unsigned a, unsigned b)
    {
        return (a & b) + (a ^ b) / 2;
    }
A quick sanity check of this

23 & 21 = 21

23 ^ 21 = 2

21 + 2 / 2 = 22 (order of operations)

I wonder why this is there. It seems the best solution but no one else is mentioning it. It also has no context near it. Nor is it stated correctly. It's just there on it's own.

yalogin 4 years ago

I cannot believe that solution was allowed to be patented. How crappy is our patent process? Most engineers writing code would come up with that solution first.

  • stathibus 4 years ago

    Every patent attorney I've ever worked with has emphasized that engineers are not equipped to determine if an idea is obvious and should let the PTO decide.

    They say this because they know the USPTO strategy is to just hand out patents after putting in some bare minimum effort to review, and postpone the real review process to the unlikely day that someone chooses to challenge it in court and can pay private firms to do their job for them.

    The winners in this arrangement are the government, the big law firms, and the large corporations that can afford them.

classichasclass 4 years ago

He hinted at this obliquely, but the PowerPC family of bit rotate instructions (ridicl, rlwinm, rlwimi, etc.), although intimidating in the general case, allows shifting, rotation, insertion, masking and more. There are many alternative mnemonics to try to reduce the cognitive complexity but all of these just assemble to them with particular parameters.

leptoniscool 4 years ago

This seems fundamental, surprised elementary operations hasn't been made a part of every major language/framework.

  • mzs 4 years ago

    >Bonus chatter: C++20 adds a std::midpoint function that calculates the average of two values (rounding toward a).

Findecanor 4 years ago

As an asm geek, I wasn't surprised to read that taking advantage of the carry flag yielded the most efficient code for some processors. I recalled that some ISAs also have special SIMD instructions specifically for unsigned average, so I looked them up:

* x86 SSE/AVX/AVX2 have (V)PAVGB and (V)PAVGW, for 8-bit and 16-bit unsigned integers. These are "rounding" instruction though: adding 1 to the sum before the shift.

* ARM "Neon" has signed and unsigned "Halving Addition". 8,16 or 32 bit integers. Rounding or truncating.

* RISC-V's new Vector Extension has instructions for both signed and unsigned "Averaging Addition". Rounding mode and integer size are modal.

* The on-the-way-out MIPS MSA set has instruction for signed, unsigned, rounded and truncated average, all integer widths.

Some ISAs also have "halving subtraction", but the purpose is not as obvious.

unwind 4 years ago

Very cool.

I was surprised that the article didn't mention the need for this in binary search, and the famous problems [1] that occured due to naive attempts.

[1]: https://en.m.wikipedia.org/wiki/Binary_search_algorithm

ncmncm 4 years ago

> gcc doesn’t have a rotation intrinsic, so I couldn’t try it there

Gcc and Clang both recognize the pattern of shifts and OR that reproduce a rotation, and substitute the actual instruction, no intrinsic needed.

I bet MSVC does too.

  • adrian_b 4 years ago

    They recognize how to do a rotation of an unsigned integer value, but they do not recognize how to do the rotation of that value concatenated with the carry bit, which is needed here.

user-the-name 4 years ago

"unsigned long long"?

It's 2022. stdint.h is old enough to drink, and is probably married with a kid on the way. Just include it already?

MaxBarraclough 4 years ago

Reminds me of a Stack Overflow thread, Shortest way to calculate difference between two numbers? [0] Multiple answers ignored the possibility of overflow.

[0] https://stackoverflow.com/q/10589559/

favorited 4 years ago

Marshall Clow gave a pretty excellent CppCon talk covering these exact problems, called "std::midpoint? How Hard Could it Be?" https://www.youtube.com/watch?v=sBtAGxBh-XI

Subsentient 4 years ago

Eh. I just cast both to a bigger integer type where possible, which in practice, is almost always. So if I'm averaging two uint32_ts, I just cast them to uint64_t beforehand. Or in Rust, with its lovely native support for 128-bit integers, I cast a 64-bit integer to 128-bit.

everyone 4 years ago

I find it mind boggling that something as simple as this can actually be patented.

  unsigned average(unsigned a, unsigned b)
  {
      return (a / 2) + (b / 2) + (a & b & 1);
  }
That makes the patent system seem broken to me.
mark-r 4 years ago

This was a lot more thorough and in-depth than I expected it to be. But that's Raymond Chen for you.

One of the reasons I love Python is that integers never overflow, so this becomes a trivial problem.

Beldin 4 years ago

Since this discussion is all about patents: my 2 cents on improving the patent system.

Consider a term project of an undergraduate CS course, where the goal is spelled out, but the method is left for discovery.

Methods developed within any such project immediately invalidate patents. They're apparently obvious to folks learning to become "skilled in the art".

Yes, in practice, reaching a legal threshold would be hard (are you sure the students didn't read the patent or any description directly resulting from it?). But I'd definitely run a "patent invalidation course" - if I had confidence that the results would actually affect patents.

hollowturtle 4 years ago

Wait, what? How can a patent right be applied on a one line of code that eventually is compiled down to machine code? Sounds ridicolous to me

bufferoverflow 4 years ago

Isn't it better to do

    (a>>1) + (b>>1) + (a&b&1)
No division needed.
  • jws 4 years ago

    Your compiler will take care of that. Leave the division for the humans to read.

    • xaduha 4 years ago

      I'm in a camp that thinks compilers should also take care of the original

          unsigned average(unsigned a, unsigned b) {
              return (a + b) / 2;
          }
      
      At the end of the day it's all just text. There are plenty of steps before any of it does anything at all.
      • jws 4 years ago

        For C at least, the spec says that unsigned addition is modulo 2^64 (or 32 or 16 or whatever) so, imagine you had an 8 bit unsigned, 128+128 gives you 0. Divided by 2 is 0. That’s the right answer by the language specification. The trick is to get 128.

      • Dylan16807 4 years ago

        What should happen if you store "a+b" in an intermediate value?

        • xaduha 4 years ago

          If it is used and there's no way around it, then show a compilation warning that there might be overflow. If it can be resolved without being directly used, then it should be optimized away.

  • 8jy89hui 4 years ago

    Not really. It is harder for most programmers to read (a>>1) than the simpler (a/2) and in most modern programming languages the compiler will notice the division by a power of two and compile to bit shift operations in both cases.

    • marginalia_nu 4 years ago

      > most programmers

      Really depends on where you're coming from. Anyone who has dipped their toes in embedded programming will immediately know they are equivalent, and many will correct /2 to a bitshift, because that's what you want to happen.

      I get that bit twiddling is obscure outside of low level programming, but bit shifts really is kindergarten stuff in this domain.

dpacmittal 4 years ago

You could also do (a + (b-a)/2) where a is the smaller number.

dathinab 4 years ago

how is turning (a+b)/2 into a/2 + b/2 + a&b&1 even patentable?

Turning (a+b)/2 into a/2 + b/2 is basic obvious math.

If you do it and to any basic testing you will realize you are getting of by one errors, locking at them can then make it obvious that when they appear and hence how to fix them.

Sure a proof is more complex, but then you can just trivially test it for all smaller-bit numbers over all possible inputs, hence making proofs unnecessary (for that numbers).

This is a solution a not yet graduated bachelor student can find in less then a day.

Having granted a patent for this should lead to disciplinary measurements against the person granting the patent tbh.

phs318u 4 years ago

I got a pang of nostalgia seeing the Alpha AXP instructions.

avmich 4 years ago

Does this all work with BCD encoding?

blobbers 4 years ago

This guy hacks.

kingcharles 4 years ago

Some unreal solutions here that show how amazing mathematics can be. Especially that Google patented method that only just recently expired.

Props for including the assembler breakdown for every major CPU architecture.

  • readthenotes1 4 years ago

    It was a Samsung patent. Only the document was hosted by Google

  • ijidak 4 years ago

    I had to lol when I saw there was a patent for that.

    Divide both operands by 2 was my first idea before loading the page. (I like to try that sometimes before reading the articles.)

    I didn't think about the carry bit, but it seems like that would be a logical solution after 5 minutes of extra thinking.

    I'm not sure how that's patentable.

    That's insane to me.

    But maybe there is more too it.

    I didn't read the patent itself.

    • staticassertion 4 years ago

      https://patents.google.com/patent/US6007232A/en

      The patent is for a circuit design to perform that algorithm in a single cycle. The algorithm was never patented, nor could it be.

      • not2b 4 years ago

        In practice it makes no difference, because digital logic designers haven't used schematic capture in a very long time. They most commonly write Verilog (or SystemVerilog, which is a superset), and it looks a lot like C:

        logic [31:0] a, b, average; assign average = (a >> 1) + (b >> 1) + (a & b & 32'd1);

        • WillSlim95 4 years ago

          Sure but any HW engineer before writing HDL will always draw up a circuit before writing the description. And given that the patent was issued in 1996 it was only like 6-7 years after Verilog became an open standard.

        • staticassertion 4 years ago

          Sorry, I don't understand what you mean. It makes no difference to whom? It definitely makes a difference to someone writing that code, since that code is not patented.

          • not2b 4 years ago

            What I mean is that the circuit diagram is a way to sneak it past a patent examiner, but the software/hardware distinction is completely artificial as both are designed in much the same way.

      • tialaramex 4 years ago

        Several of these patent claims are for "an apparatus" because you're not allowed to patent ideas - but any realisation of the algorithm will necessarily be "an apparatus" so the effect is that in fact you can claim algorithms and that's exactly what this is doing.

        • staticassertion 4 years ago

          I'm not a lawyer, so I'm confused. If you patent a hardware implementation of a software algorithm, on the basis that the implementation is novel, how would you stop me from writing that algorithm irrespective of how it executes?

          • tialaramex 4 years ago

            Where in each claim do you see their proposed "hardware implementation" - the novel device you suppose they've invented - that we might distinguish from, for example, every modern microcomputer ? What I see is a description of the algorithm because of course the intent is to patent the algorithm and not some special hardware implementation nobody will manufacture because it's patented.

            If you have some novel arrangement of gears and levers to propose, or some recipe of doping material A with chemical B and then doing so-and-such - that's a classical hardware patent, but the whole point of these "an apparatus" patents is that they aren't trying to describe a novel device anybody invented, they're describing the algorithm the patent is for the algorithm and phrases like "an apparatus" are just a wafer thin excuse for this to be waved through by people whose salary is paid for by the endless flood of such software patents.

            As a parallel comment is explaining there hasn't for many years been any substantial difference between what seems to be very much "hardware" and what common folk think of as "software".

            The patent application even spells it out, explaining that "A general purpose computer [...] can execute the invention in a single instruction cycle [...]". What "invention" do you suppose is being "executed" here? Is the general purpose computer... making a special circuit? Maybe your laptop is now selling DVD players to Best Buy? No. Their "invention" is just the algorithm, they successfully patented the algorithm.

            Nobody would "stop you writing that algorithm", but if they wanted to, and if you've got enough money for it to be worth taking your money, while that patent was valid they could sue you for infringing on their invention and unless you've got deep pockets and proof you invented it first, you are screwed.

            • staticassertion 4 years ago

              You don't have to invent a device to file a patent for it. They describe the device, the patent is for the device, the novelty is the construction of the device.

              I honestly doubt that they could sue you for infringing on this.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection