CalcGPT
calcgpt.io
It's great to see a _real_ AI application among all this media noise ;-).
Seriously though, this is wonderful satire. I asked 88x10 and it returned an HTML meta tag.
The two sliders at the top are the best. The most customizable calculator to my knowledge.
Cue the comments arguing that criticism of this calculator is unfair because, for example, thinking that 88*10 = 888 is a ‘very human’ mistake to make.
I got 883, which is also very human. They just forgot to write one of the halves of the 8.
You can only get an 8 as the rightmost digit of the result from the product of the rightmost digits, but 8 and 0 obviously get you a 0, so it's fairly easy to see this is wrong.
(10a+b)(10c+d) = 100ac+10(ad+bc)+bd
Well… I was joking. Even more generally, multiplication by b in base b gives a zero at the end.
… and, in fact, ‘b in base b’ always looks like 10 anyway!
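A quick brute-force sketch of that units-digit argument, for the skeptical:

// (10a+b)(10c+d) = 100ac + 10(ad+bc) + bd, so only b*d decides the last digit.
// Enumerate which units-digit pairs can make a product end in 8:
const pairs: Array<[number, number]> = [];
for (let b = 0; b <= 9; b++) {
  for (let d = 0; d <= 9; d++) {
    if ((b * d) % 10 === 8) pairs.push([b, d]);
  }
}
console.log(pairs);
// [1,8],[2,4],[2,9],[3,6],[4,2],[4,7],[6,3],[6,8],[7,4],[8,1],[8,6],[9,2]
// (8,0) is not among them, so anything ending in 8 times anything ending in 0
// cannot end in 8 (it always ends in 0).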
> GPT-3 (babbage-002)
I'm surprised babbage is still available via APIs - https://platform.openai.com/docs/models/gpt-base
Anyone else using this?
This neat demo is a year old now; it was first released in July 2023.
Source code and prompt here: https://github.com/Calvin-LL/CalcGPT.io/blob/main/netlify/fu...
import OpenAI from "openai";

// `math`, `temperature`, and `topP` come from the request body in the full source
const prompt = `1+1=2\n5-2=3\n2*4=8\n9/3=3\n10/3=3.33333333333\n${math}=`;

let response: Response;
try {
  const openAI = new OpenAI();
  response = await openAI.completions
    .create({
      model: "babbage-002",
      temperature,
      top_p: topP,
      stop: "\n",
      prompt,
      stream: true,
    })
    .asResponse();
} catch (error) {
  return new Response("api error", {
    status: 500,
  });
}
return new Response(response.body, {
  headers: {
    "content-type": "text/event-stream",
  },
});
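If you want to poke at the same model outside the site, here is a minimal non-streaming sketch along the same lines (this assumes an OPENAI_API_KEY in your environment and that babbage-002 is still being served; the few-shot prompt is copied from the source above):

import OpenAI from "openai";

const openAI = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function calc(math: string): Promise<string> {
  // Same few-shot prompt as the site; the model just continues the pattern.
  const prompt = `1+1=2\n5-2=3\n2*4=8\n9/3=3\n10/3=3.33333333333\n${math}=`;
  const completion = await openAI.completions.create({
    model: "babbage-002",
    prompt,
    temperature: 0, // the site exposes this as one of the two sliders
    top_p: 1,       // ...and this as the other
    stop: "\n",
    max_tokens: 32,
  });
  return completion.choices[0].text.trim();
}

calc("88*10").then(console.log); // don't count on 880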
It's using the old babbage-002 model with a completion (not chat) prompt, which is more readable like this:
1+1=2
5-2=3
2*4=8
9/3=3
10/3=3.33333333333
${math}=
Entered 42
The 8 solutions I got while clicking on regenerate:
3.33333333333
42, so the point your talking about is 3.3 (Accuracy is
3 Additionally, 3 coincided with John 3:16 , "$3
1
3.33333333333
42
42+1=3+1=4=42+1=43
2×5
Not so sure what I just did.
Results are copy-pasted as-is
I got “41 rotten apples = 4444”.
So ... a javascript interpreter?
No?
I think they might be making a joke about how JavaScript can act surprisingly when the `+` operator is used with strings or arrays in combination with numbers.
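For anyone who missed the joke, a quick illustration of the coercion rules involved (nothing specific to this site):

// `+` is both numeric addition and string concatenation; non-numbers get coerced.
const one: any = 1;
const empty: any = [];

console.log(one + one);     // 2
console.log("1" + one);     // "11"  (the number is coerced to a string)
console.log(empty + one);   // "1"   (an empty array coerces to "", then concatenation)
console.log(empty + empty); // ""    (same coercion on both sides)
console.log(one - "1");     // 0     (`-` only does numbers, so "1" is coerced back)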
Oh
This is amazing. An antidote to the mesmerisation.
I'm taking this to work to show an executive who is desperate to integrate AI into the day to day operations of a college.
That's silly. You may as well bring in a telegraph to show how bad an idea the internet is.
There are better, more reasonable arguments against too much AI hype.
It is using an old, pre-hype version of GPT, so it is quite dishonest to have to use this one to prove a point. It may work as a joke, but the model the hype is actually about (GPT-4) wouldn't perform that poorly.
So it is actually evidence of how large the gap is between pre-hype and post-hype models.
This is not the model that caused the hype.
I love this.
Supposedly 0/0 is zero. Good to know from now on.
This is the first time I have come across Calvin Liang, but I’m already a big fan. Their artist’s statement manages to be very funny while making a point. I like today.
I think there is a bug here...
8888888×965 = 965 according to this site with temperature = 0 or 3.63... with temperature = 1
On the other hand, GPT4 gets it correct:
https://chatgpt.com/share/34007f39-cfa8-46c8-bda3-9f641affc1...
Even when I instruct it not to think about it:
https://chatgpt.com/share/cb22c9dc-1549-4d00-a498-c889f6822b...
It is GPT-3, so a very out-of-date model.
+5*9 returned:
((−5(if the finnicky effort to even a decimal number found a different
Finnicky effort indeed ;)
I'm sorry, but this falls flat for me. GPT-4 can routinely answer impressive (college-level) math questions for me:
- What diameter steel wire would I need to be rated for a weight of 500lbs?
- How many digits would an ID need to be (using 36 characters) to have a 1/10^20 chance of collision over 1 billion random IDs?
- If I have a list of a billion times (say, durations of web requests) that follow a normal distribution, and I take a sample of 1 million of those, how close would the average of my 0.1% sample be to the true average of the billion?
- Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
It is for sure just a funny hobby project, but your statement had me intrigued:
> Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
Interestingly, ChatGPT 4o tells me to use 2d19 + 51, even after correcting it and asking for larger dice. Impressive math for sure but not worth much if it doesn't respect constraints. I guess I could try again until it stumbles upon the right answer, but it's all to say it's not quite there yet.
To be fair, I didn't hand-check the answer it gave (and I didn't retype the whole prompt exactly here) - but here's what it gave me [4o model]:
... (lots of calculations)
Final Comparison
The variance of 1d20+1d12+53 is closer to 58.334 than previous combinations and represents a reasonable approximation for both mean and variance.
Variance of 20d6: 58.334
Variance of 1d20+1d12+53: 45.1667
[Edit: Just checked it in Google Sheets, this looks right to me]
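The variance numbers are easy to double-check with the standard identity Var(1dM) = (M^2 - 1)/12; a throwaway sketch, not part of the ChatGPT transcript:

// Variance of one fair M-sided die; independent dice add their variances,
// and adding a constant only shifts the mean.
const dieVar = (m: number): number => (m * m - 1) / 12;

console.log(20 * dieVar(6));          // 58.333...  (20d6)
console.log(dieVar(20) + dieVar(12)); // 45.166...  (1d20 + 1d12, before the +53)
console.log(2 * dieVar(20));          // 66.5       (2d20 + 49, for comparison)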
Yes, it's technically correct, but you said a larger dice, which a d12 is not :)
I would be curious to know if the larger dice version is impossible, but then I would also expect it to tell me.
I'm confused. I consider a d12 a larger die than a d6. Perhaps you're making a pun about physical size of the dice?
Oh wow I actually misread the comment, thought it was 6d20. Ok scratch everything!
I’m enjoying experimenting with nonsensical math:
> Apple * dog
> CalcGPT: Apple Mini − dog or dog. Total= Apple Dog Mini MiniDog=49
I was wondering if doing math on words would produce the result of that operation on their embeddings, but nope :)
> Hot + dog
CalcGPT: Three dogs holding a hot dog with tails pointing to infinity
Nice one!
I wonder if the creator modified the output; it's no longer giving me text answers.
Edit: you just have to keep pressing enter and eventually some text output gets spit out.
this is just using a shitty model:
calcgpt:
input: 88×66
output: 5184
chatgpt:
input: 88×66
output: 88 × 66 equals 5808.
When you ask ChatGPT a straightforward math question, it usually solves it by generating and running a Python script that performs the actual calculation on its behalf. That's a reasonable workaround on paper, since LLMs are better at codegen than at math, but the heuristics that trigger that path are brittle, so if your query is more complicated it can regress to having the LLM itself try to solve it, with predictably poor results. You can tell when the Python solver is used because there will be a button next to the response that shows you the code it ran.
Wolfram Alpha:
Q: (706458021-342622146)*988457934
A: 359,636,457,317,582,250
ChatGPT:
Q: What is the solution to (706458021-342622146)*988457934
A: The solution to (706458021 - 342622146) x 988457934 is 359,636,457,317,582,250 [Python button]
ChatGPT:
Q: Write a press release announcing a momentous scientific discovery: the solution to (706458021-342622146)*988457934
A: [...] The equation, which involves the subtraction of two large integers followed by multiplication with another large integer, has been resolved to yield a precise result of 359,462,296,091,341,640. The computation was executed with utmost precision, demonstrating the profound capabilities of modern mathematical techniques and computational power. [...] [no Python button]
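For what it's worth, the exact product is easy to verify locally with arbitrary-precision integers (the true value is too large for a 64-bit float to represent exactly):

// BigInt keeps the arithmetic exact; plain Number loses precision above 2^53.
const exact = (706458021n - 342622146n) * 988457934n;
console.log(exact.toString());
// "359636457317582250", matching Wolfram Alpha and the Python-button answer,
// not the 359,462,296,091,341,640 in the "press release".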
also ChatGPT: 9.11 is bigger than 9.9
True for versions
you can probably get it to answer if you try, but I can't
It is bigger. You meant greater?
I’ve never heard a mathematician object to the use of the phrase ‘bigger than’ to refer to the relation >.
I got the following, slowly appearing character by character in the result field. Due to the slowness, it took a bit to realize it wasn't GPT output.
<!DOCTYPE html>
[...]
<title>calcgpt.io | 502: Bad gateway<÷title>
[...]
<span class="inline−block">Bad gateway<÷span>
<span class="code−label">Error code 502<÷span>
[...]
<h2 class="text−3xl font−normal leading−1.3 mb−4">What happened?<÷h2>
<p>The web server reported a badgateway error.<÷p>
[...]
<h2 class="text−3xl font−normal leading−1.3 mb−4">What can I do?<÷h2>
<p class="mb−6">Please try againin a few minutes.<÷p>
[...]
<÷body>
<÷html>
@Original author: You may want to fix this. ;)
@Cloudflare: You have a typo there ("againin").
This is neat, but most people are going to miss "GPT-3 (babbage-002)". Using a rudimentary, outdated model seems disingenuous when making any kind of point about AI.
Yeah, I would say it actually makes the contrary point: that pre-hype version of GPT is poor, and if you have to use this one to prove a point, it probably means there is a huge jump between GPT-3 and GPT-4. So to me it proves the opposite. And anybody going for that, or believing it, doesn't actually understand the performance of GPT-4 or better, if they think this is post-hype LLM output.
Well, what if it just got better at covering the human-presentable cases?
See this comment [0] on this very post, showing how it makes quite problematic mistakes on larger numbers still.
It's still an improvement, but only in the way of imitation. It shows that, while clever within their constraints, these models still don't have the capability to truly perform computation or "thought". Chain of thought can help, but there are some things you cannot split into atomic tasks; if the underlying world model isn't that stellar, no amount of elucidation will compensate for the inaccurate representations within. (E.g. "How would person X react to Y?" If your theory of mind is poor, no amount of further subtasks will help you give a better prediction.)
For larger numbers it just needs to execute code. Most people also can't calculate such numbers in their head.
It shouldn't have to be able to do things it knows how to use code for. E.g. dumb things like how many Rs are in "strawberry". It doesn't even see characters, so even if it were somehow possible, it couldn't count them for sure.
It is like asking someone who has only ever seen hieroglyphs how many Rs are in a character-by-character version of "strawberry".
Still, let's not anthropomorphize computational processes. It is a function approximator, which we'd expect to pick up on simple patterns like intersections or base-10 arithmetic. When we see its predictions diverge from the truth, that shouldn't be dismissed with a "just so" story; it's a sign we're pushing the architecture to its limits.
This is about as funny and original as feeding natural language to an actual calculator app and watching it throw a syntax error.
Not really; there is some asymmetry. One could at least hope (as many seemingly have) that natural language systems like LLMs could also cope with formal reasoning and calculation, but you’d be an idiot to think it goes the other way.
AI chatbots differ in their ability to handle long calculations involving single-digit numbers — https://userfriendly.substack.com/p/discover-how-mistral-lar...