CalcGPT
calcgpt.io
It's great to see a _real_ AI application among all this media noise ;-).
Seriously though, this is wonderful satire. I asked 88x10 and it returned an HTML meta tag.
The two sliders at the top are the best. The most customizable calculator to my knowledge.
Cue the comments arguing that criticism of this calculator is unfair because, for example, thinking that 88*10 = 888 is a ‘very human’ mistake to make.
I got 883, which is also very human. They just forgot to write one of the halves of the 8.
You can only get an 8 as the rightmost digit of the result from the product of the rightmost digits, but 8 and 0 obviously get you a 0, so it's fairly easy to see this is wrong.
(10a+b)(10c+d) = 100ac+10(ad+bc)+bd
Well… I was joking. Even more generally, multiplication by b in base b gives a zero at the end.
… and, in fact, ‘b in base b’ always looks like 10 anyway!
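A quick brute-force sketch of that units-digit argument, for the skeptical:

// (10a+b)(10c+d) = 100ac + 10(ad+bc) + bd, so only b*d decides the last digit.
// Enumerate which units-digit pairs can make a product end in 8:
const pairs: Array<[number, number]> = [];
for (let b = 0; b <= 9; b++) {
  for (let d = 0; d <= 9; d++) {
    if ((b * d) % 10 === 8) pairs.push([b, d]);
  }
}
console.log(pairs);
// [1,8],[2,4],[2,9],[3,6],[4,2],[4,7],[6,3],[6,8],[7,4],[8,1],[8,6],[9,2]
// (8,0) is not among them, so anything ending in 8 times anything ending in 0
// cannot end in 8 (it always ends in 0).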
> GPT-3 (babbage-002)
I'm surprised babbage is still available via APIs - https://platform.openai.com/docs/models/gpt-base
Anyone else using this?
This neat demo is a year old now; it was first released in July 2023.
Source code and prompt here: https://github.com/Calvin-LL/CalcGPT.io/blob/main/netlify/fu...
import OpenAI from "openai";

// `math`, `temperature`, and `topP` come from the request body in the full source
const prompt = `1+1=2\n5-2=3\n2*4=8\n9/3=3\n10/3=3.33333333333\n${math}=`;

let response: Response;
try {
  const openAI = new OpenAI();
  response = await openAI.completions
    .create({
      model: "babbage-002",
      temperature,
      top_p: topP,
      stop: "\n",
      prompt,
      stream: true,
    })
    .asResponse();
} catch (error) {
  return new Response("api error", {
    status: 500,
  });
}
return new Response(response.body, {
  headers: {
    "content-type": "text/event-stream",
  },
});
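If you want to poke at the same model outside the site, here is a minimal non-streaming sketch along the same lines (this assumes an OPENAI_API_KEY in your environment and that babbage-002 is still being served; the few-shot prompt is copied from the source above):

import OpenAI from "openai";

const openAI = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function calc(math: string): Promise<string> {
  // Same few-shot prompt as the site; the model just continues the pattern.
  const prompt = `1+1=2\n5-2=3\n2*4=8\n9/3=3\n10/3=3.33333333333\n${math}=`;
  const completion = await openAI.completions.create({
    model: "babbage-002",
    prompt,
    temperature: 0, // the site exposes this as one of the two sliders
    top_p: 1,       // ...and this as the other
    stop: "\n",
    max_tokens: 32,
  });
  return completion.choices[0].text.trim();
}

calc("88*10").then(console.log); // don't count on 880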
It's using the old babbage-002 model with a completion (not chat) prompt, which is more readable like this:
1+1=2
5-2=3
2*4=8
9/3=3
10/3=3.33333333333
${math}=
Entered 42
The 8 solutions I got while clicking on regenerate:
3.33333333333
42, so the point your talking about is 3.3 (Accuracy is
3 Additionally, 3 coincided with John 3:16 , "$3
1
3.33333333333
42
42+1=3+1=4=42+1=43
2×5
Not so sure what I just did.
Results are copy-pasted as-is
I got “41 rotten apples = 4444”.
So ... a javascript interpreter?
No?
I think they might be making a joke about how JavaScript can act surprisingly when the `+` operator is used with strings or arrays in combination with numbers.
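For anyone who missed the joke, a quick illustration of the coercion rules involved (nothing specific to this site):

// `+` is both numeric addition and string concatenation; non-numbers get coerced.
const one: any = 1;
const empty: any = [];

console.log(one + one);     // 2
console.log("1" + one);     // "11"  (the number is coerced to a string)
console.log(empty + one);   // "1"   (an empty array coerces to "", then concatenation)
console.log(empty + empty); // ""    (same coercion on both sides)
console.log(one - "1");     // 0     (`-` only does numbers, so "1" is coerced back)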
Oh
This is amazing. An antidote to the mesmerisation.
I'm taking this to work to show an executive who is desperate to integrate AI into the day to day operations of a college.
That's silly. You may as well bring in a telegraph to show how bad an idea the internet is.
There are better, more reasonable arguments against too much AI hype.
It is using an old, pre-hype version of GPT, so it is quite dishonest to have to use this one to prove a point. It may work as a joke, but the model the hype is actually about (GPT-4) wouldn't perform that poorly.
So it is actually evidence of how large the gap is between pre-hype and post-hype models.
This is not the model that caused the hype.
I love this.
Supposedly 0/0 is zero. Good to know from now on.
This is the first time I have come across Calvin Liang, but I’m already a big fan. Their artist’s statement manages to be very funny while making a point. I like today.
I think there is a bug here...
8888888×965 = 965 according to this site with temperature = 0 or 3.63... with temperature = 1
On the other hand, GPT4 gets it correct:
https://chatgpt.com/share/34007f39-cfa8-46c8-bda3-9f641affc1...
Even when I instruct it not to think about it:
https://chatgpt.com/share/cb22c9dc-1549-4d00-a498-c889f6822b...
It is GPT-3, so a very out-of-date model.
+5*9 returned:
((−5(if the finnicky effort to even a decimal number found a different
Finnicky effort indeed ;)
I'm sorry, but this falls flat for me. GPT-4 can routinely answer impressive (college-level) math questions for me:
- What diameter steel wire would I need to be rated for a weight of 500lbs?
- How many digits would an ID need to be (using 36 characters) to have a 1/10^20 chance of collision over 1 billion random IDs?
- If I have a list of a billion times (say, durations of web requests) that follow a normal distribution, and I take a sample of 1 million of those, how close would the average of my 0.1% sample be to the true average of the billion?
- Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
It is for sure just a funny hobby project, but your statement had me intrigued:
> Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
Interestingly, ChatGPT 4o tells me to use 2d19 + 51, even after correcting it and asking for larger dice. Impressive math for sure but not worth much if it doesn't respect constraints. I guess I could try again until it stumbles upon the right answer, but it's all to say it's not quite there yet.
To be fair, I didn't hand-check the answer it gave (and I didn't retype the whole prompt exactly here) - but here's what it gave me [4o model]:
... (lots of calculations)
Final Comparison
The variance of 1d20+1d12+53 is closer to 58.334 than previous combinations and represents a reasonable approximation for both mean and variance.
Variance of 20d6: 58.334
Variance of 1d20+1d12+53: 45.1667
[Edit: Just checked it in Google Sheets, this looks right to me]
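The variance numbers are easy to double-check with the standard identity Var(1dM) = (M^2 - 1)/12; a throwaway sketch, not part of the ChatGPT transcript:

// Variance of one fair M-sided die; independent dice add their variances,
// and adding a constant only shifts the mean.
const dieVar = (m: number): number => (m * m - 1) / 12;

console.log(20 * dieVar(6));          // 58.333...  (20d6)
console.log(dieVar(20) + dieVar(12)); // 45.166...  (1d20 + 1d12, before the +53)
console.log(2 * dieVar(20));          // 66.5       (2d20 + 49, for comparison)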
Yes, it's technically correct, but you said a larger dice, which a d12 is not :)
I would be curious to know if the larger dice version is impossible, but then I would also expect it to tell me.
I'm confused. I consider a d12 a larger die than a d6. Perhaps you're making a pun about physical size of the dice?
Oh wow I actually misread the comment, thought it was 6d20. Ok scratch everything!
I’m enjoying experimenting with nonsensical math:
> Apple * dog
> CalcGPT: Apple Mini − dog or dog. Total= Apple Dog Mini MiniDog=49
I was wondering if doing math on words would produce the result of that operation on their embeddings, but nope :)
> Hot + dog
CalcGPT: Three dogs holding a hot dog with tails pointing to infinity
Nice one!
I wonder if the creator modified the output; it's no longer giving me text answers.
Edit: you just have to keep pressing enter and eventually some text output gets spit out.
this is just using a shitty model:
calcgpt:
input: 88×66
output: 5184
chatgpt:
input: 88×66
output: 88 × 66 equals 5808.
When you ask ChatGPT a straightforward math question, it usually solves it by generating and running a Python script that performs the actual calculation on its behalf. That's a reasonable workaround on paper, since LLMs are better at codegen than at math, but the heuristics that trigger that path are brittle, so if your query is more complicated it can regress to having the LLM itself try to solve it, with predictably poor results. You can tell when the Python solver is used because there will be a button next to the response that shows you the code it ran.
Wolfram Alpha:
Q: (706458021-342622146)*988457934
A: 359,636,457,317,582,250
ChatGPT:
Q: What is the solution to (706458021-342622146)*988457934
A: The solution to (706458021 - 342622146) x 988457934 is 359,636,457,317,582,250 [Python button]
ChatGPT:
Q: Write a press release announcing a momentous scientific discovery: the solution to (706458021-342622146)*988457934
A: [...] The equation, which involves the subtraction of two large integers followed by multiplication with another large integer, has been resolved to yield a precise result of 359,462,296,091,341,640. The computation was executed with utmost precision, demonstrating the profound capabilities of modern mathematical techniques and computational power. [...] [no Python button]
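For what it's worth, the exact product is easy to verify locally with arbitrary-precision integers (the true value is too large for a 64-bit float to represent exactly):

// BigInt keeps the arithmetic exact; plain Number loses precision above 2^53.
const exact = (706458021n - 342622146n) * 988457934n;
console.log(exact.toString());
// "359636457317582250", matching Wolfram Alpha and the Python-button answer,
// not the 359,462,296,091,341,640 in the "press release".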
also ChatGPT: 9.11 is bigger than 9.9
True for versions
you can probably get it to answer if you try, but I can't
It is bigger. You meant greater?
I’ve never heard a mathematician object to the use of the phrase ‘bigger than’ to refer to the relation >.
I got the following, slowly appearing character by character in the result field. Due to the slowness, it took a bit to realize it wasn't GPT output.
<!DOCTYPE html>
[...]
<title>calcgpt.io | 502: Bad gateway<÷title>
[...]
<span class="inline−block">Bad gateway<÷span>
<span class="code−label">Error code 502<÷span>
[...]
<h2 class="text−3xl font−normal leading−1.3 mb−4">What happened?<÷h2>
<p>The web server reported a badgateway error.<÷p>
[...]
<h2 class="text−3xl font−normal leading−1.3 mb−4">What can I do?<÷h2>
<p class="mb−6">Please try againin a few minutes.<÷p>
[...]
<÷body>
<÷html>
@Original author: You may want to fix this. ;)
@Cloudflare: You have a typo there ("againin").
This is neat, but most people are going to miss "GPT-3 (babbage-002)". Using a rudimentary, outdated model seems disingenuous when making any kind of point about AI.
Yeah, I would say it actually makes the contrary point: that pre-hype version of GPT is poor, and if you have to use this one to prove a point, it probably means there is a huge jump between GPT-3 and GPT-4. So to me it proves the opposite. And anybody going for that, or believing it, doesn't actually understand the performance of GPT-4 or better, if they think this is post-hype LLM output.
Well, what if it just got better at covering the human-presentable cases?
See this comment [0] on this very post, showing how it makes quite problematic mistakes on larger numbers still.
It's still an improvement, but only in the way of imitation. It shows that, while clever within their constraints, these models still don't have the capability to truly perform computation or "thought". Chain of thought can help, but there are some things you cannot split into atomic tasks; if the underlying world model isn't that stellar, no amount of elucidation will compensate for the inaccurate representations within. (E.g. "How would person X react to Y?" If your theory of mind is poor, no amount of further subtasks will help you give a better prediction.)
For larger numbers it just needs to execute code. Most people also can't calculate such numbers in their head.
It shouldn't have to be able to do things it knows how to use code for. E.g. dumb things like how many Rs are in "strawberry". It doesn't even see characters, so even if it were somehow possible, it couldn't count them for sure.
It is like asking someone who has only ever seen hieroglyphs how many Rs are in a character-by-character version of "strawberry".
Still, let's not anthropomorphize computational processes. It is a function approximator, which we'd expect to pick up on simple patterns like intersections or base-10 arithmetic. When we see its predictions diverge from the truth, that shouldn't be dismissed with a "just so" story; it's a sign we're pushing the architecture to its limits.
This is about as funny and original as feeding natural language to an actual calculator app and watching it throw a syntax error.
Not really; there is some asymmetry. One could at least hope (as many seemingly have) that natural language systems like LLMs could also cope with formal reasoning and calculation, but you’d be an idiot to think it goes the other way.
AI chatbots differ in their ability to handle long calculations involving single-digit numbers — https://userfriendly.substack.com/p/discover-how-mistral-lar...