Settings

Theme

Why is x & -x equal to the largest power of 2 that divides x?

arunmani.in

176 points by arun-mani-j 2 years ago · 66 comments

Reader

dataflow 2 years ago

Short explanation:

1. The largest power of 2 that divides x is just 2^(number of trailing zeros in x)

2. Crucial observation: -x == ~x + 1

3. ~x flips all the bits of x bits, so none of the bits of ~x match those of x. (i.e. (x & ~x) == 0)

4. When you do +1, all the trailing 1's flip AGAIN, becoming zero like they were in x. The next highest 0 (say it was the n'th) also flips, becoming 1... like it was in x.

5. Crucial observation: The n'th 0 did NOT match the corresponding bit in x prior to the increment, therefore it MUST match after the increment. All higher bits stay as-is.

6. This leaves only the n'th bits matching in x and ~x + 1, isolating the highest power-of-2 divisor of x when you AND them.

  • monktastic1 2 years ago

    Another, more "visual" description:

    Without loss of generality, let the number be xx...xx100...000, where x stands for any bit and there are n trailing zeros.

    1. The largest power of 2 that divides it is 2^n.

    2. The twos-complement representation is xx...xx011...111, where the x's are inverted from before, plus one.

    3. After adding one, we get xx...xx100...000. Notice that the carryover stopped exactly where the zero was (which is where our first 1 from the right originally was).

    4. All the "x" bits are now inverted from their original values, so their & is zero. All the original trailing zero bits are still zero, so their & is zero. The middle 1 has stayed 1, so its & is 1.

    5. Thus, the overall & is 00...00100...000 = 2^n, as required.

    • _proofs 2 years ago

      this is actually a fantastic comment and as someone who thinks more visually, was greatly appreciated. thank you.

    • dataflow 2 years ago

      > Without loss of generality, let the number be xx...xx100...000,

      Note that this is with loss of generality, since there is no guarantee of a 1 bit anywhere in the number ;) but you can handle the zero case specially.

  • mabster 2 years ago

    Thanks! I've used the trick and derived why it works a few times, but always forget.

    I battled to understand the post. Your explanation got me there very quickly.

  • pbsd 2 years ago

    An alternative, more arithmetic, argument:

    - x is 2^b*k for some odd k < 2^(n-b)

    - -x is 2^n - 2^b*k = 2^b*(2^(n-b)-k)

    - k is odd by definition, and (2^(n-b)-k) + k = 0 (mod 2^(n-b)). This means that the LSB must be 1 in both operands, which results in a carry out, and in the following ith bits we have that the sum is 0 mod 2^i if and only if the ith bits of (2^(n-b)-k) and k are distinct. Thus x & -x = 2^b.

    • Droobfest 2 years ago

      This immediately reads like gobbledygook to me, while the parent is easy to follow.

      I’m not trying to dunk on you, but I can’t help to note that the denseness of math is too much for an idiot like me.

      • schoen 2 years ago

        It might be nicer with exponents rendered as superscripts, because it could make it more apparent how the exponents are being used.

    • chx 2 years ago

      > -x is 2^n - 2^b * k

      Erm no. -x is - 2^b * k and it's the "Two's complement" notation which is 2^n - 2 * b * k

      This is because the way we generate the "Two's complement" number. Given A in binary representation, let's call A1 the number you get by flipping every bit in A also called one's complement. Observe how every bit in A+A1 is 1 because if the bit was 0 in A then the bit is 1 in A1 and vica versa. So then that's 2^N-1. Two's complement is created by adding 1 to A1 so then A + A1 is 2^N or A1 = 2^N-A. But you do need to prove this before you can use it.

      • bonzini 2 years ago

        The point of two's complement is that it's the same as modulo-2^n arithmetic. In two's complement -x is defined such that x + -x = 0 (mod 2^n); and because x + ~x = 2^n - 1 (mod 2^n), then -x = ~x + 1.

        Therefore x & -x works only in two's complement, and it's fair to assume it as the parent comment did.

  • jb1991 2 years ago

    I thought there was just a sign bit. If not, how does a system know if a number should be interpreted as positive or negative?

    • cvz 2 years ago

      In the "two's complement" representation, there is a sign bit, but the meaning isn't "invert this number", it's "subtract a large, fixed power of 2 from this number".

      The reason this has become the most common representation for signed numbers in computer hardware is that it makes the difference between signed and unsigned numbers basically irrelevant as far as the hardware is concerned. When the difference does matter (which it does in division, conversions between different word sizes, conversions to/from text, and sometimes multiplication), there are two different instructions for signed and unsigned, and it's up to the programmer or compiler to pick the right one.

      • thaumasiotes 2 years ago

        > n the "two's complement" representation, there is a sign bit, but the meaning isn't "invert this number", it's "subtract a large, fixed power of 2 from this number".

        This is only true if the size of your modulus is fixed. In fact, there is a "sign extension" command allowing you to produce, for example, signed 128-bit values from signed 64-bit values, and this basically requires interpreting two's complement values the same way they represent themselves: a three-bit -1 is just the sum of the positive values +1, +2, and +4. To extend that to six bits, you add the positive values +8, +16, and +32.

        The meaning of the high bit in our six-bit scheme shouldn't be viewed as "when this bit is set, subtract 32 from the value instead of adding it". It is "whether this bit is set or not, all higher bits in the number, the ones that don't exist in the hardware, are equal to it"; under this interpretation, the 8-, 16-, and 32- bits were already set in the three-bit value, and that's why they continue to be set when we do a sign extension.

        Every place value is still positive, but as long as there is no highest set bit, the number itself will be negative, and equal to the value you'd calculate from its representation as a geometric series.

        • kbolino 2 years ago

          This is true mathematically but simple to implement in practice. If the number is signed, shifts to the right and widening casts fill the new bits with a copy of the sign bit. If it's unsigned, shifts to the right and widening casts fill the new bits with zeroes. Shifts to the left and narrowing casts work the same for signed and unsigned.

    • eslaught 2 years ago

      You're thinking about a sign and magnitude representation, which is not how integers are represented in a modern computer.

      The modern version is two's complement. It still has a sign bit, but negating a number involves more than just changing the sign bit since the representation is modular.

      https://en.wikipedia.org/wiki/Two%27s_complement

      https://en.wikipedia.org/wiki/Modular_arithmetic

      • gus_massa 2 years ago

        I agree, but I just want to add that floating point numbers use sign and magnitude representation, so the GP may be confused by that.

    • pests 2 years ago

      2s compliment helps with the problem of otherwise having a +0/-0 and lets you subtract numbers just with adders.

    • dataflow 2 years ago

      Signedness doesn't affect anything here, as long as the representation is in two's complement. The negation of x (i.e. -x) in two's complement is ~x + 1. The interpretation of the bits as signed or unsigned doesn't change any of the following steps.

      • MaxBarraclough 2 years ago

        I think strictly speaking signedness does affect things, assuming we're talking C/C++. If x has type int and were to take the value of INT_MIN, then evaluating -x would result in a signed arithmetic overflow, which is undefined behaviour. (Related: the C standard now insists on use of 2's complement, where it used to permit other schemes.)

        If x had type unsigned int, this wouldn't introduce undefined behaviour.

    • elevatedastalt 2 years ago

      There's still a sign bit. The sign bit tells you whether to interpret the rest of the number as positive or treat as negative and undo the 2s complement operation.

    • umanwizard 2 years ago

      Assuming 4-byte integers,

      0000 is 0

      0001 is 1

      0010 is 2

      0011 is 3

      0100 is 4

      0101 is 5

      0110 is 6

      0111 is 7

      1111 is -1

      1110 is -2

      1101 is -3

      1100 is -4

      1011 is -5

      1010 is -6

      1001 is -7

      1000 is -8

      The highest bit is always 1 when the sign is negative.

  • StevenXC 2 years ago

    Algebraically, where | denotes concatenation, and z is a string of 0s:

    x = y | 1 | z

    -x = ~x + 1

    -x = (~y | 0 | ~z) + 1

    -x = ~y | 1 | z

    x & -x = (y & ~y) | (1 & 1) | (z & z)

    x & -x = 0 | 1 | z

  • arun-mani-jOP 2 years ago

    Thanks for the nice summary. I found myself difficult to explain it so I took the help of visual explanation using the bits structure. :)

  • oldgradstudent 2 years ago

    > 2. Crucial observation: -x == ~x + 1

    To be annoyingly pedantic, this is not simply an observation, this is essentially the definition of 2's complement arithmetic.

    • dataflow 2 years ago

      Some people do see it that way, and I think it's fine if you do, but I think I disagree. To me, the definition of N-bit two's complement is that -x (i.e. 0 - x) is represented by 2^N - x. The fact that that is equal to ~x + 1 is not obvious, or (as I see it) part of the definition.

  • baryphonic 2 years ago

    When I read the title, my immediate thought was "because 2s complement!"

    Your explanation is more thorough.

o11c 2 years ago

Or briefly, copied from my StackOverflow answer[1]:

  v    000...00010100
  ~v   111...11101011 (not used directly, all bits opposite)
  -v   111...11101100 (-v == ~v + 1; this causes all low 1 bits to overflow and carry)
  v&-v 000...00000100 (has a single 1 bit, from the carry)
The linked article is wrong in only mentioning 2 signed integer representations. Old versions of C allowed 3 representations for integers, floats use one of those (sign-magnitude, also used widedly by bignum libraries) and one other (offset binary), and base negative-2 is also possible in theory (not sure if practical for anything), for a total of 5.

[1]: https://stackoverflow.com/a/63552117/1405588

  • arun-mani-jOP 2 years ago

    I tried to explain only one's complement and two's complement because they were only relevant to the explanation.

    TIL we have 5 such representations! :)

    • unconed 2 years ago

      I find this a pretty confusing explanation because the actual mechanism is buried in the middle and you can't understand the first diagram without it.

      The goal is to produce the number `0...010...0` with the same number of trailing zeroes as x.

      If you flip all the bits in x, then `...10...0` turns into `...01...1`. Adding one cascades the tail into `...10...0`.

      You can do this entirely in unsigned math and you don't need to drag two's complement into it. The fact that it corresponds to `x & -x` is a curiosity, and then you have to explain why it works for e.g. negative numbers, where it counts trailing ones instead but still produces a positive.

      There are plenty of other tricks like this, e.g. `x & (x - 1) == 0` tells you if a (non-zero) x is a power of two, but it doesn't work for negatives.

  • lifthrasiir 2 years ago

    And then we have a zig-zag bijection for too many serialization formats.

omnicognate 2 years ago

Grammar nazi correction, but one I think is both interesting and a useful way to remember how these work: ones' complement has the apostrophe at the end (but two's complement is correct).

Ones' complement is the "complement of ones (plural)" in that if you take an n-bit number and the representation of its additive inverse in that scheme and add them as ordinary binary numbers you get a result that is n ones. Eg. using 4 bits, 6 is 0110, -6 is 1001, adding them with ordinary binary addition gives 1111.

Two's complement is the "complement of two (to the n)" in that if you do the same thing you get 2^n. Eg. 6 is 0110, -6 is 1010, adding them with ordinary binary addition gives 10000, or 2^4.

  • layer8 2 years ago

    From https://en.wikipedia.org/wiki/Method_of_complements:

    “Some people, notably Donald Knuth, recommend using the placement of the apostrophe to distinguish between the radix complement and the diminished radix complement. In this usage, the four's complement refers to the radix complement of a number in base four while fours' complement is the diminished radix complement of a number in base 5. However, the distinction is not important when the radix is apparent (nearly always), and the subtle difference in apostrophe placement is not common practice. Most writers use one's and nine's complement, and many style manuals leave out the apostrophe, recommending ones and nines complement.”

    I.e., the distinction isn’t as well established as some claim, and in the context of binary presentations it doesn’t really matter.

tzs 2 years ago

Note that this gives 0 when x == O. For applications where you need the largest power of 2 that divides x because you are going to actually divide x by that you may want to check for 0 first.

amelius 2 years ago

It is a simple trick, but in practice you probably want to know the position of the last nonzero bit instead of the 2^n pattern. And most CPUs have instructions for that, which are more efficient and as a bonus also document better what is happening.

ChuckMcM 2 years ago

Nice article, which is putatively about the question posed, but really is about how fun number theory can be. This trick and the "reverse the order of bits with XOR" kinds of things are gateways into the properties of numbers and their bases. Base 2 happens to be "easy" because you can use simple boolean functions to move it around but once you get the hang of it you can do this with other bases as well.

I really didn't appreciate number theory until I started writing cryptography code and trying to understand some of the 'tricks' used to optimize the algorithms. Fun stuff!

lynx23 2 years ago

This was a real joy to read. Especially kudos for the (I guess already oldschool?) plain text figures. I was reading this with lynx (as I stll frequently use it) and totally enjoyed the flawless accessibility of this article. Whatever the motivation, thanks for putting out content like this, it is totally appreciated!

  • arun-mani-jOP 2 years ago

    Aww thanks! Actually we both should thank Hugo for converting the MarkDown into a very simple HTML structure. I also try to follow semantic web whenever possible (article, header, footer etc. instead of divsoup).

    Very glad to know the site works well with Lynx!

casey2 2 years ago

It's easy to see when a number is divisible by 2 in binary,

just like in decimal: 70 is divisible by 10, 500 is divisible by 100,

so in binary 10 is divisible by 2 10100 is divisible by 4 101010111000 is divisible by 8 etc. (We also know 101010111001 can't be divisible by 8, cause it's only 1 more than a number divisible by 8)

Knowing this we can look at the question backwards take 24 and -24, because 24 is divisible by 8 it must end with a 1 and 3 0s ...0011000

when we take the compliment we get 1100111 which for the same reason must end in a 0 and a run of 1s,

now when we add one to this, all the ones will roll over and we end up with the same tail "1000" as in 24, and since this anything to the left of the tail is a compliment of the corresponding bit in x a bitwise and will just be the tail.

wruza 2 years ago

Because to negate you invert (so & = 0) and add one, which overflows former zeroes until it meets a former one, which flips, so & gives 1 there. Former zeroes are how even a number is.

  01100 (12, 2 bit even)
  10011 (inv 12, & = 0)
  10100 (inv 12 +1 aka -12)
  00100 &, 2 bit even
xjay 2 years ago

See also: Bit Twiddling Hacks

https://graphics.stanford.edu/~seander/bithacks.html

  • FabHK 2 years ago

    Or the delightful Hacker's Delight by Henry S. Warren, Jr.

    https://en.wikipedia.org/wiki/Hacker%27s_Delight

  • mike_hock 2 years ago

    > Compute the sign of an integer

    > On the other hand, if you prefer the result be either -1, 0, or +1, then use:

    > sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));

    > // Or, for more speed but less portability:

    > sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); // -1, 0, or +1

    > // Or, for portability, brevity, and (perhaps) speed:

    > sign = (v > 0) - (v < 0); // -1, 0, or +1

    Or if you prefer portability, speed, and readability at the same time:

        int sign(int i) {
            if(i > 0)
                return 1;
            else if(i < 0)
                return -1;
            else
                return 0;
        }
    
    gets compiled to

            mov     ecx, edi
            sar     ecx, 31
            test    edi, edi
            mov     eax, 1
            cmovle  eax, ecx
            ret
    
    which is the bit shift hack.

    (still useful as a reference for implementing an optimizer, to be read as pseudo code)

    • jimbobthrowawy 2 years ago

      I think a bunch of compilers nowadays are smart enough to see these kinds of hacks and compile them to an equivalent on the target that runs best. Favourite example of this is writing a duff's device to unroll a loop, and then gcc completely ignores it and replaces it with a vectorised loop.

  • anthk 2 years ago

    Similar but for general math and optimizations:

    http://www.hakmem.org/

  • richrichie 2 years ago

    I was looking for this link. You made my morning!

umanwizard 2 years ago

One reason why this is useful is it lets you iterate through the set bits of a number.

For example:

    #include <stdlib.h>
    #include <stdio.h>
    
    void decompose_bits(unsigned x) {
        while (x) {
            unsigned bit = x & -x;
            printf("0x%x (%u)\n", bit, bit);
            x -= bit;
        }
    }
    
    int main(int argc, char *argv[]) {
        unsigned x = atoi(argv[1]);
        decompose_bits(x);
    }
mistercow 2 years ago

This is a fun puzzle to try to figure out before reading the article.

rokicki 2 years ago

To take this to the next level: what does [(a^b) & (-(a^b)) & a] compute? (Assume unsigned arithmetic.)

And then after that: what use can this be put to?

  • rokicki 2 years ago

    This expression is nonzero iff reverse(a) > reverse(b) (where reverse is the bitreversal of an unsigned number).

    It (using the address of the nodes as arguments) can serve as a tiebreaker in a Cartesian tree (such as one implementing a first-fit memory allocator) or even to replace the random priority value in a treap (meaning you need neither storage nor computation for the priority node of the treap).

  • wruza 2 years ago

    Iow, we flip some bits in `a`, then do subj, then mask it back to `a`. It’s unclear what it computes in general, but if `b` disturbs the 2-powerness of `a` then I guess we learn that fact by seeing zero. Not sure where to use it.

    • rokicki 2 years ago

      There's a nice elegant description of what it does, mathematically, and a significant use in Computer Science.

JoeAltmaier 2 years ago

Carry. -x is ~x (not x) PLUS ONE

It's that carry that ripples through, making bits flip until it gets to that 'largest power of 2'

analognoise 2 years ago

Does the site actually mention the book it was in?

ww520 2 years ago

Good explanation. Always good to read up on some bit tweaking once a while.

water-your-self 2 years ago

This meeting could have been a post it note.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection