My favorite programming problem to teach: Digit length (2019)

jstrieb.github.io

31 points by equilibrium 2 years ago · 59 comments

Reader

Am I the only one who, in response to: "dont use loops" thought of the following:

   int numDigits(int num){
     if (num < 0){
       return numDigits(-1 * num);
     }
     if (num < 10){
       return 1;
     }
     if (num < 100){
       return 2;
     }
     if (num < 1000){
       return 3;
     }
     if (num < 10000){
       return 4;
     }
     if (num < 100000){
       return 5;
     }
     if (num < 1000000){
       return 6;
     }
     if (num < 10000000){
       return 7;
     }
     if (num < 100000000){
       return 8;
     }
     if (num < 1000000000){
       return 9;
     }
     return 10; // cant be more than 10 since sizeof(int) == 4, otherwise extend to 19
   }

lelanthran 2 years ago
Might not work (I think Python supports bignums, not too sure).
I did this which will work with any length of number[1], and appears to work for all edge cases, including numbers that start with zero and doesn't use loops:
```
     (defun num-digits (n)
       (if (< n 10)
         1
         (+ 1 (num-digits (floor n 10)))))
```
[1] Millions of digits, if your computer has the RAM for it.
kristopolous 2 years ago
The statements will duck type to numbers so you could just be really silly and do something like:
```
   return 1 + (n >= 10) + (n >= 100) + (n >= 1e3) + (n >= 1e4) + (n >= 1e5) + (n >= 1e6) + (n >= 1e7) + (n >= 1e8) + (n >= 1e9) ...
```
- schoen 2 years ago
  Heh, that gives this nice expression
  sum(n>=10**i for i in range(100))
  for (for example) inputs up to a googol. Or with your fix for the base case,
  1 + sum(n>=10**i for i in range(1, 100))
  Another cute way that hides the loop
  from itertools import count, takewhile 1 + len(list(takewhile(lambda x: 10**x <= n, count(1))))
  - kristopolous 2 years ago
    
    a way that follows the rules and also does what you're trying to do:
    def noloops(n): x = lambda n, digits: (n / 1e9, digits + (n >= 10) + (n >= 100) + (n >= 1e3) + (n >= 1e4) + (n >= 1e5) + (n >= 1e6) + (n >= 1e7) + (n >= 1e8) + (n >= 1e9)) y = lambda n, digits: x(*x(*x(*x(*x(*x(n, digits)))))) return y(*y(*y(*y(*y(*y(n, 1))))))[1]
    Now we have it up to 324 digits without any loops.
  - psychoslave 2 years ago
    
    How is that hiding the loop? I would expect takewhile to be a loop although I never used this Python facility.
    
    schoen 2 years ago
    
    All of the itertools functions are implemented using loops, but they heavily abstract over them so that users can think in terms of "streams" (or officially "iterators") without writing loop-oriented code themselves.
MrJohz 2 years ago

Unfortunately in Python, integers have no fixed maximum size and can grow indefinitely, so your algorithm would need to be long enough to cover all the numbers that can conceivably fit in a computer's memory. I guess there's still a finite size there, but I'm not sure how the memory usage scales with digit length, so I don't know what the largest number would be.
- chacham15 2 years ago
  
  Good point, I was thinking in Python 2 (where there are integer limits)
  - Doxin 2 years ago
    
    Not since python 2.5 if I recall correctly. Python has had unlimited size integers for a long time now.

dmurray 2 years ago

I would prefer a solution based on multiplication than division.

If the number is less than 10, return one, otherwise is it less than 100, ...

You should prefer the simplest tool that can do the job. Multiplication is simpler than division. You could digress about floating point errors (Python has infinite-precision integers, but not infinite- precision floats) or performance (division may be in some contexts meaningfully slower) but those are just digressions, the important thing is that it's more complicated.

The edge cases also seem completely clear to me in this approach, you won't accidentally write <= 100, and 0 is handled automatically.

Of course, the division and log and string methods are all fine in practice, but I'm surprised not to see this alternative mentioned.

jonahx 2 years ago

Nice insight. Similarly, dynamic programming solutions that "build up" from the bottom can often be clearer than those "go down" recursively from the top, even though they're equivalent.

Strilanc 2 years ago

It's very strange to me that the teacher would push the students from the correct solution using a loop, towards an incorrect solution using a logarithm. A logarithm could work in a language like C where ints can't get too large, but Python has arbitrary precision integers so any solution using floating point numbers is doomed. For example, the code given in the post returns 16 instead of 15 for 999_999_999_999_999.

smarklefunf 2 years ago

I've come to the retrospective conclusion decades after educational abuse that professors like this are more interested in showboating their mathematics knowledge than drive students to find good pragmatic solutions.
i.e. I know this super complex answer in which is it happens to be the only purely accurate thing. Can you guess it? vs what I would like to see which is: here are the foundational concepts, bring them all together and arrive at the naturally correct conclusion demonstrating your knowledge and understanding
- romwell 2 years ago
  
  Professors?
  This post was written by an undergraduate student.
  A professor would use that as a teaching opportunity to discuss the folly of relying on floating point numbers.
romwell 2 years ago

>It's very strange to me that the teacher would push the students from the correct solution using a loop, towards an incorrect solution using a logarithm
That's because the teacher here is an undergraduate student who has yet to learn from painful experience :)

quirino 2 years ago

Interesting note is that the clever solution fails pretty early: 999999999999999 (15 nines) outputs 16.

I hope the author warned the students about floating point imprecision :P

schoen 2 years ago

I'm belatedly warming up to the idea that the digit length of the number zero is actually zero in various senses. (A footnote in the article points out that we could define it this way and then some students' simpler solutions would actually be considered correct.)

Yes, we obviously normally use a single digit to write zero, but we could logically write it as the empty string "" if we had a way to indicate that a specific number was meant. (Relative to that, writing 0 is just as unnecessary as writing 00 or 000, because we can have a rule that all leading zeroes need not be written.)

As another example of this intuition, suppose that we wrote all integers with a trailing decimal point. In that case the numbers from 1 to 10 would be "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9.", and "10.", while zero could quite logically just be "." with no digits (as it has only leading zeroes, or we could say "nothing in the ones' place").

Quite a lot of arithmetic algorithms would probably work just fine with these conventions! In fact, they might have fewer exceptions or special cases than they do when they expect an explicit 0.

For instance, using the "zero is written as empty string" convention, Python int() (here simplified to assume input strings of zero or more decimal digits) and str() could look like

  def int(s):
      n = 0
      for c in s:
          n *= 10
          n += "0123456789".find(c)
      return n

  def str(n):
      s = ""
      while n:
          n, d = divmod(n, 10)
          s = "0123456789"[d] + s
      return s

Seems pretty general! Yes, it's annoying or confusing if the output is going to be viewed by a human user in a larger string context without delimiters, as we probably don't want to see things like "I ate tomatoes" instead of "I ate 0 tomatoes".

psychoslave 2 years ago

Aren’t you mixing up 0 the digit and 0 the number here?
Certainly we can use the empty string to denote the number zero, but it doesn’t mean that we have a straight forward convention for how we denote any number which have some null value in some power lower than the most significant one.
Not that it’s impossible to come with some convention that ditches 0 as intermediary digit. For example 302009==3e5+2e3+9 will evaluate to true in many languages out there. In Ruby we even have `302009 === 3e5+2e3+9+''.to_i` that is evaluated to true.
But this notation loses the conveniences afforded by a positional fixed base numeral system provides.
- schoen 2 years ago
  
  I mean that the number 0 specifically can be written by empty string, so that we would count like
  "", "1", "2", "3", "4", ..., "9", "10", "11", "12", ..., "99", "100", "101", ...
  The only change is the empty string being a valid name for a number (the number zero).
  I don't mean to suggest getting rid of place value or the digit 0! This is more like making the place value system more consistent with respect to a corner case.
  And the practical reason that we can't make this change is that we don't have a way to distinguish in writing between the absence of any string and the presence of the empty string.

vsnf 2 years ago

Never would I ever have come up with using logarithms. But I am also a math idiot. My solution used recursion.

js2 2 years ago

(2019). Original submission was discussed at the time:

https://news.ycombinator.com/item?id=21500434

dang 2 years ago

Thanks! Macroexpanded:
My Favorite Programming Problem to Teach: Digit Length - https://news.ycombinator.com/item?id=21500434 - Nov 2019 (106 comments)

t-3 2 years ago

For base 2/16 this is pretty easy if you use assembly, because most ISAs have instructions to count the leading zeros, so for eg. aarch64 to get length of hex integer (assuming you're not trying to do anything weird like signed hex - negative numbers will always be max width this way):

  hlen:
        // int hlen(uint64_t x)
        // takes an integer, returns length of hex string
        mov     x9,     #16             // max width
        clz     x0,     x0              // count binary leading zeros
        cmp     x0,     #64
        lsr     x0,     x0,     #2      // 4 bits per hex digit
        sub     x0,     x9,     x0
        cinc    x0,     x0,     eq      // if num = 0, add 1
        ret

Decimal requires looping (well, only ~20 comparisons needed for 64-bit so maybe that could be unrolled but same thing either way), so it's simpler to just use high level.

_xnmw 2 years ago

This is a mathy solution to a CS problem. I would’ve preferred the route to a more CS-ey solution using bit shifts and recursion. Teaches an understanding of low level representations.

nicoty 2 years ago

I asked a similar question to stack overflow some time ago (to calculate the number of digits in a range) and received a performant solution using only integer arithmetic https://stackoverflow.com/questions/68908632/is-there-a-comp...

benmmurphy 2 years ago

i feel like using the log operator for avoiding a loop is also cheating as well because under the hood there is likely to be a loop. i would have expected the non-looping solution to use either recursion or some abuse of itertools which is really just using a loop as well.

  import itertools
  def digitLength(n):
    if n == 0:
      return 1
    *_, last = itertools.takewhile(lambda acc: acc[0] != 0, itertools.accumulate(itertools.repeat(None), lambda acc, x: (acc[0]//10, acc[1] + 1), initial=(n, 0)))
    return last[1]

the problem is interesting in python because i think the looping solution is not optimal because it performs N divisions for an arbitrarily large integer whereas I think there should be a solution that performs O(log(N)) multiplications for an arbitrarily large integer. in other languages with fixed integers its not really an issue how many operations you do since its effectively constant.

jcynix 2 years ago

Hmm, my solution would be to convert the number to a string and return its length. Easily done in Perl:

  perl -e 'print length 987654'
  6

Similarly easy in Lisp with either of write-to-string, prin1-to-string, and princ-to-string. I expect python to have some such built-in function too.

quirino 2 years ago

It is mentioned at the end of the article. In Python it's as simple as len(str(x)).
- lelanthran 2 years ago
  
  > In Python it's as simple as len(str(x)).
  I consider that to be wrong - leading zeros are not part of the number and should be ignored. Using `len(str(x))` results in `2` for the input `"01"`.
  - Doxin 2 years ago
    
    It's assumed the input is an int. If not you can do len(str(int(x))) to make sure it is which strips leading zeroes off in the process.
    
    lelanthran 2 years ago
    
    That's really my beef with languages like Python - typing is an afterthought. I literally get more safety from C than I do from Python:
    def numDigits(num): return len(str(num)) print (numDigits(1)) # Returns the correct answer print (numDigits("1")) # Returns the correct answer print (numDigits("01")) # Returns the incorrect answer
    The "returns the wrong answer" is the problem. If the input is the wrong type, I expect there to be an error raised, not silently give me wrong answers.
    
    Doxin 2 years ago
    
    Fair enough I suppose. Though I should mention that python is much better at these than a lot of dynamically typed languages. All of these give errors in python but return nonsense in e.g. javascript:
    1+"1" []+{} ({}).foo []/2
    Of course there's the type checkers for python like mypy and pyright but in my experience the type systems these provide are wildly underpowered for statically typing even slightly more complex idiomatic python.
    Re the num digits example: One way to go about this is to add asserts for checking the input types, though generally in python it's considered more idiomatic to check how an object behaves than what type it is.
    In any case in practice I find that by far most type systems, bolted on or built in or otherwise, are more of a hassle than they are worth. One of the few exceptions I've found so far is D. When you rewrite the num digits function in D such that it accepts any type like the python example does, it should be of no surprise that it has the exact same bug as the python version:
    int numDigits(T)(T num){ return num.to!string.length }
    I'm not entirely sure what my point is other than to point out that typing in python isn't an afterthought. it just uses a different type system to what you're used to which allows for bugs you're not used to.
  - Thorrez 2 years ago
    
    The input "01" fails all the other solutions as well. You can't divide or take the log of a string.
    
    lelanthran 2 years ago
    
    > The input "01" fails all the other solutions as well. You can't divide or take the log of a string.
    The other solutions fail differently - they generates an error and stop processing.
    The `len(str())` failure doesn't generate an error, and doesn't stop processing, it simply returns the wrong answer.
    So, the `len` approach is wrong, the other approaches are right.
  - katzenversteher 2 years ago
    
    str(x) converts the argument to a string. Since the argument is an integer, there won't be any leading zeroes. More problematic are negative values though.

nicbou 2 years ago

Wouldn't len(str(num)) be adequate here? This is a quite literal translation of what the code should be doing: measuring the length of the text representation of a number. The mathematical approach seems a little convoluted, although it serves the purpose of teaching a lesson.

qwerty1793 2 years ago

Be careful, doing str(num) requires python to convert the binary representation it has num stored as to a decimal. (C)Python implements a quadratic time change of basis algorithm. This is slow enough that now for very large inputs python will raise "ValueError: Exceeds the limit (4300 digits) for integer string conversion"
- smarklefunf 2 years ago
  
  assuming python
bittumenEntity 2 years ago

At the bottom of the article they mention that this was discouraged because they hadn't covered strings in the course yet
- MrJohz 2 years ago
  
  And more importantly, because it sidesteps the interesting pedagogy around edge cases and testing that the instructor is interested in.
  - vsnf 2 years ago
    
    The correct solution here is to give credit for the problem to acknowledge genuine clever problem solving, and then offer extra credit for doing it the pedagogical way.
    
    MrJohz 2 years ago
    
    There is no correct solution here. A classroom is not a test environment.
    The goal is to learn, and the point of the exercises is to teach a specific concept. If a student finds a different way around the problem, that may show that they're already proficient in other skills, but they haven't necessarily learned the concept being taught in this class yet. A good instructor would probably acknowledge the solution, but add extra boundaries to the task to get the student to explore the problem in a way that lets them encounter the testing difficulties discussed here.
    It's like smuggling a calculator into a class about mental maths strategies: you'll probably do very well in the final test, but you won't have learned anything!
_xnmw 2 years ago

Len() and str() are a loop under the hood.
- emidln 2 years ago
  
  len() is an O(1) lookup of the stored length on the unicode/bytes object in Python.

psychoslave 2 years ago

Looks like LLMs can shine on such a topic: https://chatgpt.com/share/e1facf96-a34f-4d07-b802-f9745a27bb...

purple-leafy 2 years ago

Nice read. Can see that it’s easy to take for granted that this would be a bit challenging for a real beginner to programming or syntax.

Though powers of 10 is a boundary condition so would be one of the first things I test against

Robin_Message 2 years ago

In the final log10 version, can't you just solve both problems at once by adding 1 to the input?

    math.ceil(math.log10(x+1))

david_allison 2 years ago

Was asked this one years back in a university interview. Fantastic problem

chrismorgan 2 years ago

> a clever, unintuitive solution to a difficult problem

If you approach from the mathematics side of things, building on log₁₀ is the completely obvious approach to use. If it seems unintuitive, that’s just because you don’t understand logarithms.

> the autograding test cases did not include a test using a power of 10.

That’s a pretty glaring oversight. Boundary cases are probably the most important things to cover in such tests. I’d be wanting to test things like 0, 1, 9, 10, 11, 99, 100, 101, and so forth. (Also negative numbers, unless the problem explicitly constrained it to {0, 1, 2, …}.)

We often talk about edge cases in testing, but this reveals that the edges can be not just at the extremes, but also at intermediate value changes. Put another way (still fuzzy!), these are the interesting values. You test interesting values.

This would also be a good place to use property testing; instead of hard-coding a bunch of tests (or perhaps as well as), compare against a large number of generated values, comparing with a known-good implementation, probably just len(str(number)).

MrJohz 2 years ago

I'm not sure about the property-based testing thing. In this example, yes, you can easily write a test case that compares against the string-based algorithm. But in general, this is one of the biggest weaknesses of PBT, which is that you need to find a good property to test.
In a lot of cases, the most useful property is "for all inputs, the result is the right answer", but we don't know the right answer until we've written the algorithm, and there's not usually much point writing two algorithms unless, like here, one of those algorithms is trivial correct but inappropriate for some reason.
More generally, I feel like the more trivial a property is to check, the simpler the code to implement it is, and the less useful the test is in general. In the extreme case, a getter/setter pair is very easy to test with PBT, but you rarely need to be testing your getters and setters.
- fanf2 2 years ago
  
  Counting digits is very closely adjacent to the general category of parsing and printing (or serialization/deserialization), which are both tricky to get right and have simple correctness tests. They are perfect candidates for property-based testing.
  - MrJohz 2 years ago
    
    But how do you do PBT here without having a known-correct algorithm available? Because that scenario is seldom the case in practice.
dingaling 2 years ago

Why would mathematics be "completely obvious" when one is dealing with how a computer stores and represents data?
The 'clever' solution fails miserably on value zero and needs hard-coding to handle it. That looks like using the wrong tool.
- chrismorgan 2 years ago
  
  You’re misunderstanding me. I said that if you approach from the mathematics side of things, building on log₁₀ is the completely obvious approach to use. That is, if you’re already a mathematician, of course you’ll see if you can use logarithms, because that will be the way you’ll think—because you’ll view it as a mathematical and algorithmic task.
  As for the handling of zero, well, that’s not about mathematics or about how a computer stores or represents data—that’s about how humans represent data. We choose to special-case zero, allowing it to have a leading zero which we never allow for anything else. All solutions in software will need to special-case zero.
equilibriumOP 2 years ago

> Write a function digitLength(n) that takes a natural number as input (i.e., 0, 1, 2, … – a nonnegative integer)
On another note, you have a very interesting website, I will be in touch at some point when I decide to tackle Rust

abpavel 2 years ago

This is a fantastic example of how seemingly simple programming tasks can teach deep concepts. His methodical approach to uncovering edge cases and alternative solutions demonstrates the importance of critical thinking and comprehensive testing in software development, a valuable lesson for all developers.

dang 2 years ago

Could you please stop posting like this and these?
https://news.ycombinator.com/item?id=40593805
https://news.ycombinator.com/item?id=40593787
https://news.ycombinator.com/item?id=40593761
https://news.ycombinator.com/item?id=40593688
https://news.ycombinator.com/item?id=40593664
We don't want generated comments on HN, or even human-generated summaries. They are too generic to produce interesting conversation, besides which we want people to actually look at, and perhaps even read, articles.
Fortunately your earlier comments look fine so this should be easy to fix!

Settings

My favorite programming problem to teach: Digit length (2019)

Keyboard Shortcuts