Settings

Theme

Numbering Should Start at Zero

cs.utexas.edu

31 points by miqkt 3 years ago · 57 comments

Reader

goto11 3 years ago

Dijkstra have a certain way of writing as if his argument follows by logical necessity. But when you dig into the chain of reasoning, it hinges on the assertion that starting with 0 is "nicer". Which is of course a valid opinion, but starting with 1 also have nice properties, for example that the numbering of elements corresponds to the ordinal numbers. Having the first element be element 1 is pretty nice IMHO.

He ends with the assertion "In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right." This sounds profound and edgy but is just an appeal to contrarians which can be used to justify absolutely anything. You criticize my theory X, that just shows how right I am! And anyway, isn't the "corporate religion" these days to start with 0, since this as what the big mainstream languages does?

  • andromaton 3 years ago

    Yes, I also had found that paper to be surprisingly and disappointedly hand wavey. "Don't meet your heroes" type of thing. Math does not care about pointer arithmetic and i in summation and induction starts at 1. Same in Julia. It's an higher level accommodation to machine code primitives, which are indeed less gate expensive if indexes start at zero.

    • kevinventullo 3 years ago

      In addition to code, I think 0-indexing is also a nice convention for floor numbering. That way the Nth floor is N floors up from the ground level. This is already how basement levels work.

      • Yizahi 3 years ago

        0-based floor numbering is infuriating. Telling someone that you live on the third floor which is actually numbered 2 is honestly dumb. I had to emigrate from the 1-based floor country a year ago and first floor numbered 0 is still throwing me off sometimes. And every single time I want to raise attention to a window somewhere.

      • goto11 3 years ago

        If we admit negative numbers for basements, then were are not using the natural numbers anymore. This is a completely different question since the integers does not have a beginning.

      • andromaton 3 years ago

        I agree. Touche!

        • andromaton 3 years ago

          This is not a 6-hour slow risposte; it's my 6 hour anti procrastinate filter I swear. Anyway, just like all animals of habit, I immediately I agreed with floor zero because I come from a place where the lowest floor is not floor 1 (unlike the US). However, I can see somebody saying floor is the height of the ceiling, so floor 1 = 10 feet; floor 0 (underground) 0 feet. One thing we can all agree is that the calendar, without a zero, it's pretty bad.

    • threatofrain 3 years ago

      I’m not sure about this. I’ve seen more index from 0 for induction and summation in Analysis.

  • Yizahi 3 years ago

    "And anyway, isn't the "corporate religion" these days to start with 0, since this as what the big mainstream languages does? "

    Exactly this. Our firm had been forced to change numbering of real-life items from 1-based to 0-based, because some cargo-cultist somewhere was amazed by the offsets and addresses and wanted to force this numbering scheme everywhere. Now there is a code inside that converts 1-based items to 0-based for legacy stuff, and we are still finding bugs caused by that. Every day to day operations are artificially harder because understanding what is actually unit number 1 or port number 2 is hard now, just like counting total number of anything and remembering to add 1 afterwards. But now we get to have a hype 0-based everything system, which nobody asked for. In some cases this can lead to such awesome combos like physical ports numbered 1-based, then 3rd party switch with 0-based, then Linux interface on top 1-based, then our config on top 0-based. Amazing and convenient, isn't it?:)

jacknews 3 years ago

So the first item in my array of 13 items is at position '0'. The 3rd item is at position '2'.

And if I want to count the items in my array I start at '0' and keep going until I get to '12'. Even though there are 13 items in my array?

I see.

zero-indexing is good for system-level stuff, for dealing with actual memory and so on, but I'm still not convinced it's the right way. It's certainly not the one true way, 1-based indexing is completely valid for some use-cases like maths.

Either way, there will be off-by-one difficulties, either in ordinals or count or whatever, so it's a trade-off.

  • ajuc 3 years ago

    By now we've been doing

        for (int i=0; i<13; i++) {
           whatever(array[i]);
        }
    
        and
    
        def printHello():
           print("hello world"[0:5])
    
    
    for decades. No point vastly messing up consistency for minimal gain. BTW notice how in both cases we count to n elements despite using 0..n-1 indexing. You don't, in fact - need to put 12 anywhere if you mean 13 elements.
    • jacknews 3 years ago

      But

        whatever[1]
      
      gives the second item, and the last item, of 13, is

        whatever[12]
      
      fortran has been 1-based since well before these examples.

      The point is there is no one true way, it's a trade-off.

FullyFunctional 3 years ago

Once you flatten multidimensional arrays the folly of starting at 1 becomes apparent (much like mathematics got so much simpler when we switched to logarithms with the "obvious" log a + log b = log ab which was a breakthrough at the time).

If you have a two dimensional array A (1..N X 1..M), then flattening it becomes A[i][j] = flat[i + (j-1) * N]

Continuing to higher dimensions it only gets uglier.

0-based arrays are far simpler. When flattening A (0..N-1 X 0..M-1) A[i][j] becomes flat[i + N * j].

It's completely analogous to how our base-10 positional system works. Each digit doesn't start at 1, they start at 0, which is why 237 is just 2*10^2 + 3*10^1 + 7*10^0.

ADD: TIL HN supports escaping the astrix.

  • drbaba 3 years ago

    > If you have a two dimensional array A (1..N X 1..M), then flattening it becomes A[i][j] = flat[i + (j-1) * N]

    > When flattening A (0..N-1 X 0..M-1) A[i][j] becomes flat[i + N * j].

    To play the devil’s advocate a bit: Both cases above has the same number of “-1” offsets; the question is whether it is in the indexing or in the bounds. In the 0-indexed case, i goes from 0 to N-1 instead of from 1 to N, and this happens for every array dimension.

  • wruza 3 years ago

    Or we can make programming languages that do not require flattening by hand. Off-by-ones are easy to make in both systems. Also it’s annoying that standard libraries do not include functions like n_ints_inclusive(from, to), range_of_last_n_or_less(arr_size, n), n_elems_before(range, n), index_after_range(range), flatten2(i, j, stride) and so on. As if PL designers wanted us to draw ranges on paper and visually check our reasoning every time we have to deal with indices.

  • magila 3 years ago

    In fact the vast majority of algorithms which involve doing computation on indexes beyond increment/decrement are naturally implemented with zero-based indexing. When using ones based indexing you end up having to convert to zero based, do the computation, then convert back to one based.

    Another common example of this is circular buffers. Computing the bounded index from an arbitrary index is simply i % c with zero based indexing, but becomes (i-1) % c + 1 with one based indexing.

    • drbaba 3 years ago

      > Computing the bounded index from an arbitrary index is simply i % c with zero based indexing, but becomes (i-1) % c + 1 with one based indexing.

      Arguably, this is because the common definition of modular arithmetic is itself zero-based: “modulo N” maps all integers into the numbers [0,N-1]. It is fully possible to define a “%” operator that instead mapped the integers into the range [1,N], which might be more natural in 1-based languages?

      • magila 3 years ago

        Great, now you've messed up all the other math that uses the modulo operator. The mathematical operators behave the way they do for well established reasons that long predate the invention of computers. It's going to be a tough sell to get everyone to adopt a wholesale refactoring of modulo arithmetic (and likely number theory in general) just for the "convenience" of one based indexing.

photochemsyn 3 years ago

I understand this as follows:

If directly accessing byte-addressable memory, the command, 'go to address 0xff and get me the data stored in the ten bytes beginning at 0xff' will return data at 0xff + 0, 0xff + 1, ..., 0xff + 9. Hence counting should start at zero, with the zeroth byte. Describing this sequence as range(0, 10) is convenient since 10 - 0 gives us the count of bytes in the sequence.

If accessing objects in an linear data structure via an interface, at a higher level of abstraction, then a natural choice is to label the objects starting with one, i.e. get the 1st object's data, get the 2nd object's data, ..., get the 10th objects's data. Note, range(1, 11) still works as 11 - 1 gives us the total object count. (Python seems to get this right in both cases, i.e (inclusive, exclusive)).

However, if one is also using low-level byte-addressable memory at the same time, this can get unnecessarily confusing, so it's probably simpler to just always count starting at zero, for consistency's sake - unless you're writing a user interface for people who've been trained to start with one?

wmeredith 3 years ago

If I have some apples on a table and pick single apple up, I don’t have zero apples in my hand. I think that’s why numbering doesn’t start at zero.

  • threatofrain 3 years ago

    Hmm but this example only makes me think about how 0 is useful. Otherwise I wouldn't be able to describe your hand prior to picking up the apple.

  • KnobbleMcKnees 3 years ago

    You're describing counting, not numbering. Like how an array with a count of 1 has one item that is numbered by it's index, 0.

    • CharlieDigital 3 years ago

      As a counterpoint to that, wouldn't it be kinda nice if an array index equaled the array length and index 0 was equivalent to an empty array.

      Think of a basket of apples. If you take the zeroth apple (empty basket), you have no apples. If you take the first apple, you have 1 apple and now the basket is empty again (0).

      One also wouldn't have to do [length - 1] to access the last index; it would naturally be [length].

  • Aaargh20318 3 years ago

    Counting starts at 1, indexing starts at 0.

  • ajuc 3 years ago

    before picking up apple

        table = {apple0, apple1, apple2, apple3}; //length == 4
        hand = {}; //length == 0
    
    after

        table = {apple0, apple1, apple2}; length == 3
        hand = {apple3}; //length == 1
    
    You count apples in hand by looking at how many items there are, not by what indexes they have.
  • jxf 3 years ago

    But how many apples do you have just prior to the scenario starting?

maxloh 3 years ago

I always found array index confusing.

For example, given a array with size N in Python, we can iterate it from 0 to N-1 (onwards) and from -1 to -N (backwards), which is not consistent at all.

Programming language is meant for human eyes. It would be better if array index being 1 to N, and let the compiler substract that 1 for us.

  • bakuninsbart 3 years ago

    Obviously comes from a time when you were either not using a compiler, or you were writing a compiler. Compared to many other foundational oddities in software engineering this is a really minor one, and at this point it is impossible to change. Any new language starting arrays at index 1 would feel unattractive to me.

  • afiori 3 years ago

    0 = -N mod N and N-1 = -1 mod N

    The consistency is that if i < 0 and l[i] exists then l[i] = l[i+N]

  • mixmastamyk 3 years ago

    1-based indexing has confusing aspects as well when manipulating memory locations. 0-based is slightly less confusing over all, and why it is used more often in programming.

  • abdullahkhalids 3 years ago

    One idea is a programming language that allows both, with different syntax. Say A[i] for 0 based and A{i} for 1 based.

    • majewsky 3 years ago

      This sounds like a really good idea... if you want to hide subtle backdoors in innocuous-looking code.

      • abdullahkhalids 3 years ago

        It would be okay for scientific programming languages like Mathematica or Matlab which are usually not used within adversarial security scenarios.

  • nullindividual 3 years ago

    VB.NET starts indexing at 1 if you want a dead language as an example.

  • KnobbleMcKnees 3 years ago

    This is surely an inconsistency with Python, not with array indexing at large.

heisenzombie 3 years ago

He’s not wrong, but ask any normal person “What’s the first thing on this list” and see what happens…

  • thaliaarchi 3 years ago

    This reminds me of when I used zeroeth, oneth, twoth, and threeth, instead of first, second, and third, for referring to indices 0, 1, 2, and 3 as a TA. It's less verbose than “at index zero”, but feels a little clumsy.

    • afiori 3 years ago

      Zeroth is an actual word, I heard it many times in conversation between methematicians.

      • jacknews 3 years ago

        Sure but then what do you call the subsequent items?

        • afiori 3 years ago

          first, second and so on

          • jacknews 3 years ago

            So the 'first' item in the list is actually the second?

            The problem is you're trying to redefine English (and probably most other languages). The first item on a menu is, well the first item, like first place in a marathon, the first day of the month.

            Surely a definition of the zeroth item would be something like an item that does not exist, the item that's left when you take away all the other items, etc.

            Ordinals are 1-indexed. Language is 1-indexed; The #1 player is the best player, and so on.

            • afiori 3 years ago

              zeroth is the ordinal associated with the cardinal zero.

              For example christmas is the zeroth day after christmas.

              A common advice with regards to user inputs is that if you do not do ordering or arithmetic on a piece of data (eg a phone numbers) then it should be a string even if it is numeric.

              Similarly n-indexed conventions should be considered in terms of practical pros and cons.

              Linguistic similarity is not a convincing argument to me.

              • jacknews 3 years ago

                'associated'? Sounds nice, but not very convincing, except perhaps the zeroth item of an ordered set should be just that, 0.

                I agree indexing is a trade-off, sometimes 0 is best, sometimes 1 makes more sense.

                But it's not linguistic 'similarity'. You have to name things. If your names are off-by-one (the element named 'first' is actually the second) you're just sowing confusion.

                • afiori 3 years ago

                  sometime ordinals are more natural sometimes cardinals are more natural.

                  In the case of vectors (especially of C-style arrays) I would say that A[0] is the first element in the array. I agree that things should have names, but names serve us, not us them.

  • afiori 3 years ago

    This is the same argument as saying that object->method(arg) is superior to funtion(self, args) because it maps to the SVO english grammar.

  • threatripper 3 years ago

    For me first and one are synonyms. It's even abbreviated as 1st instead of 0st.

alex_muscar 3 years ago

The part where he asserts indexing should start at zero is this:

> Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N. So let us let our ordinals start at zero: an element's ordinal (subscript) equals the number of elements preceding it in the sequence.

"A nicer range." This is just Dijkstra's preference.

Interestingly, he links ordinals (the subscript) to cardinals (how many numbers there are before the element).

The off by one is just as likely in both schemes with half open intervals.

The nice property of the length of the sequence falling "for free" from the indices is not essential if you track the length separately. While it was important in the age of machines with severely constrained memories, the downsides outweigh the merits.

SAI_Peregrinus 3 years ago

I think about it as a difference between indexes and offsets.

C uses pointer offsets. Base pointer + offset = destination address. C "array" syntax is sugar for this addition. There are no array indices in C, just pointer offsets. Thus, they start at 0.

Some languages have arrays that aren't just a pointer to the first element. In those, indices are a better option. There, starting at 1 usually makes more sense.

The first number used should match the use of that number.

ajuc 3 years ago

I don't know about arrays, but numbering floors starting from 1 is absurd.

In most of Europe ground floor is 0, underground floors are negative, floors above the ground are positive numbers.

Integers were invented for a reason, use them :)

  • Supermancho 3 years ago

    Measuring distance starts at 0 (usually). It's .1 meters, not 1.1 meters from the wall.

    Counting things (measuring how many you have) starts at 1 out of convenience. It makes no sense to talk about all the 0 things (that you don't have ) to start. Communication tends to be as succinct as possible.

    In software, we often are tracking multiple things, so being explicit about how many offset or what an int holds, causes 0 to appear more frequently and "feels better" because the explicit communication is comprehensive.

bionhoward 3 years ago

Wouldn’t 2..12 => 2 <= i <= 12 be the most readable version? Why “< 13”?

klysm 3 years ago

Unfortunately most of the world disagrees

gnufx 3 years ago

Obligatory remark when this is referenced that it was specifically wrong about FORTRAN when written. As of FORTRAN77 you could index arrays as you like -- bounds of any integer -- potentially differently in different parts of the program.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection