GPT-4o with scheduled tasks (jawbone) is available in beta

chatgpt.com

141 points by TheJCDenton a year ago · 140 comments

imsotiredspacex a year ago

This is the prompt describing the function call parameters:

When calling the automation, you need to provide three main parameters:

1. Title (title): A brief, descriptive name for the automation. This helps identify it at a glance. For example, "Check for recent news headlines".

2. Prompt (prompt): The detailed instruction or request you want the automation to follow. For example: "Search for the top 10 headlines from multiple sources, ensuring they are published within the last 48 hours, and provide a summary of any recent Russian military strikes in the Lviv Oblast."

3. Schedule (schedule): This uses the iCalendar (iCal) VEVENT format to specify when the automation should run. For example, if you want it to run every day at 8:30 AM, you might provide:

BEGIN:VEVENT
RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
END:VEVENT

Optionally, you can also include:

• DTSTART (start time): If you have a specific starting point, you can include it. For example:

BEGIN:VEVENT
DTSTART:20250115T083000
RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0
END:VEVENT

In summary, the call typically includes:

• title (string): A short name.
• prompt (string): What you want the automation to do.
• schedule (string): The iCal VEVENT defining when it should run.
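Put together, a call matching that description might look like the sketch below. Only the three field names (title, prompt, schedule) come from the quoted prompt; the surrounding dict shape is an assumption.

```python
# Sketch of an automation call with the three parameters described
# above. Field names come from the quoted system prompt; the dict
# structure itself is a guess, not OpenAI's actual payload format.
task = {
    "title": "Check for recent news headlines",
    "prompt": (
        "Search for the top 10 headlines from multiple sources, "
        "ensuring they are published within the last 48 hours."
    ),
    "schedule": (
        "BEGIN:VEVENT\n"
        "DTSTART:20250115T083000\n"
        "RRULE:FREQ=DAILY;BYHOUR=8;BYMINUTE=30;BYSECOND=0\n"
        "END:VEVENT"
    ),
}
```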

dmadisetti a year ago

The beta is inconsistently showing (required a few refreshes to get something to show up), but my limited usage of it showed a plethora of issues:

- Assumed UTC instead of EST. Corrected it and it still continued to bork

- Added random time deltas to my asked times (+2, -10 min).

- A couple of notifications didn't go off at all

- The one that did go off didn't provide a push notification.

---

On top of that, it's only usable without search mode. In search mode, it was totally confused and gave me a Forbes article.

Seems half baked to me.

Doing scheduled research behind the scenes or sending a push notification to my phone would be cool, but I'm surprised they thought this was OK for a public beta.

  • gukov a year ago

You'd think OpenAI's dev velocity and quality would be off the charts, since they live and breathe "AI." If the company building ChatGPT itself often delivers buggy features, it doesn't bode well for this whole 'AI will eat the world' notion.

    • practice9 a year ago

Well, none of the labs have good frontend, mobile, or even infra engineers

      Anthropic is ahead in this because they keep their UIs simplistic so the failure modes are also simple (bad connection)

      OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

I find it hilarious and sad that o1-pro just times out thinking on very long or image-heavy chats. You need to reload the page multiple times after it fails to reply, and maybe the answer will appear (or not? Or in five minutes?). It kinda shows they're not testing enough and not "eating their own dog food", and it feels like the ChatGPT 3.5 UI before the redesign

      • lolinder a year ago

        > Anthropic is ahead in this because they keep their UIs simplistic ... OpenAI is just pushing half baked stuff to prod and moving on (GPTs, Canvas).

        What's funny is that OpenAI's Canvas was their attempt to copy Anthropic's Artifacts! So it's not like Anthropic is stagnant and OpenAI is at least shipping, Anthropic is shipping and OpenAI can't even copy them right.

      • jeffgreco a year ago

It's a good point: Anthropic is being VERY choosy and winds up knocking it out of the park with stuff like Artifacts. Meanwhile their macOS app is junk, but obviously not a priority.

      • cma a year ago

        > because they keep their UIs simplistic

        How do I edit a sent message in the Claude Android app? It's so simplistic I can't find it.

    • golergka a year ago

So far, I've found AI to be a great force multiplier in green-field, small projects. In a huge corporate codebase, it has the power of an advanced refactoring tool (which doesn't touch more than a handful of files at a time) and a CSS wizard.

    • cruffle_duffle a year ago

      According to all the magazines I've been reading, all that is required is to just prompt it with "please fix all of these issues" and give it a bulleted list with a single sentence describing each issue. I mean, it's AI powered and therefore much better than overpaid prima-donna engineers, so obviously it should "just work" and all the problems will get fixed. I'm sure most of the bugs were the result of humans meddling in the AI's brilliant output.

Right now, in fact, my understanding is OpenAI is using their current LLMs to write the next-generation ones, which will far surpass anything a developer can currently do. Obviously we'll need to keep management around to tell these things what to do, but the days of being a paid software engineer are numbered.

  • ineedasername a year ago

When I have it do a search, I have to tell it to just gather all the info it can in the search but wait for the next request. Then I explicitly tell it we're done searching and to treat the next prompt as a new request, but using the new info it found.

    That’s the only way I get it to have a halfway decent brain after a web search. Something about that mode makes it more like a PR drone version of whatever I asked it to search, repeating things verbatim even when I ask for more specifics in follow-up.

  • imsotiredspacex a year ago

I posted the part of the system prompt describing the function call; if you read it and adjust the prompt you use to create the task, it works much better.

  • potatoman22 a year ago

    I'd rather have buggy things now than perfect things in a year.

  • arthurcolle a year ago

    DateTime stuff is generally super annoying to debug. Can't fault them too badly. Adding a scheduler is a key enabling idea for a ton of use cases

    • sensanaty a year ago

      > Can't fault them too badly

The same company that touts their super hyper advanced AI tool that can do everyone's (except the C-level's, apparently) jobs to the world can't figure out how to make a functional cron job happen? And we're giving them a pass, despite the bajillions of dollars that M$ and VCs are funneling their way?

      Quite interesting they wouldn't just throw the "proven to be AGI cause it passes some IQ tests sometimes" tooling at it and be done with it.

      • arthurcolle a year ago

        it would explain the bugs if they used the AI to make the datetime implementation though

    • cbeach a year ago

      Agreed on date/time being a frustrating area of software development.

      But wouldn't a company like OpenAI use a tick-based system in this architecture? i.e. there's an event emitter that ticks every second (or maybe minute), and consumers that operate based on these events in realtime? Obviously things get complicated due to the time consumed by inference models, but if OpenAI knows the task upfront it could make an allowance for the inference time?

      If the logic is event driven and deterministic, it's easy to test and debug, right?

      • singron a year ago

        The original cron was programmed this way, but it has to examine every job every tick to check if it should run, which doesn't scale well. Instead, you predict when the next run for a job will be and insert that into an indexed schedule. Then each tick it checks the front of the schedule in ascending order of timestamps until the remaining jobs are in the future.
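The indexed-schedule idea can be sketched with a min-heap (my illustration, not anyone's actual implementation): each tick only inspects the front of the heap instead of scanning every job.

```python
import heapq
from datetime import datetime, timedelta

# Min-heap keyed on next-run time; entries are (next_run, name, interval).
schedule = []

def add_job(now, name, interval):
    """Register a job whose first run is one interval from now."""
    heapq.heappush(schedule, (now + interval, name, interval))

def tick(now):
    """Run (here: just collect) every job whose next_run has passed,
    then reinsert each one at its next scheduled time."""
    ran = []
    while schedule and schedule[0][0] <= now:
        next_run, name, interval = heapq.heappop(schedule)
        ran.append(name)
        heapq.heappush(schedule, (next_run + interval, name, interval))
    return ran
```

Each tick costs O(k log n) for the k jobs that are due, rather than O(n) over all registered jobs.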

This is also a bad case in terms of queueing theory. Looking at Kingman's formula, the arrival variance is very high (a ton of jobs will run at 00:00 and far fewer at 00:01), and the service time also has pretty high variance. That combo forces either high queue-delay variance, low utilization (i.e. over-provisioning), or a sophisticated auto-scaler that aggressively starts and stops instances to anticipate the schedule. Most of the time it's OK to let jobs queue, since most use cases don't care if a daily or weekly job is 5 minutes late.
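For reference, a rough sketch of the Kingman approximation for mean wait in a G/G/1 queue (variable names are mine):

```python
# Kingman's approximation for mean queueing delay:
#   E[W] ~= (rho / (1 - rho)) * ((ca2 + cs2) / 2) * mean_service
# rho: utilization; ca2, cs2: squared coefficients of variation of
# interarrival and service times; mean_service: mean service time.
def kingman_wait(rho, ca2, cs2, mean_service):
    return (rho / (1 - rho)) * ((ca2 + cs2) / 2) * mean_service

# Bursty cron-style arrivals (everything fires at 00:00) vs. smooth
# arrivals, at the same 80% utilization: the wait blows up with variance.
bursty = kingman_wait(0.8, ca2=9.0, cs2=1.0, mean_service=2.0)
smooth = kingman_wait(0.8, ca2=1.0, cs2=1.0, mean_service=2.0)
```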

    • dmadisetti a year ago

Yeah, they're not exactly a scrappy startup; I'd be surprised if they had zero QA.

Makes me wonder if they internally track "press releases per quarter" as a metric to keep up the hype.

ttul a year ago

Amazon had an insane number of people working on just the alarms feature in Alexa when they interviewed me for a position years ago. They had entire teams devoted to the tiniest edge case within the realm of scheduling things with Alexa. This is no doubt one of the biggest use cases in computing: getting your computer to tell you what to do at a given time.

  • qgin a year ago

    Recurring schedules across time zones is an unbelievably maddening thing to implement. At first glance it seems simple, but it gets very weird very quickly.
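A minimal illustration of the weirdness, using Python's zoneinfo: "every day at 8:30" in New York, straddling the 2025 US spring-forward transition (March 9), gives runs that are 47 hours apart in absolute time, not 48.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("America/New_York")

# "8:30 every morning" on the days around the DST switch.
sat = datetime(2025, 3, 8, 8, 30, tzinfo=tz)   # EST, UTC-5
mon = datetime(2025, 3, 10, 8, 30, tzinfo=tz)  # EDT, UTC-4

# Same local wall-clock time, but the UTC instants drift by an hour:
gap = mon.astimezone(timezone.utc) - sat.astimezone(timezone.utc)
```

And that's before accounting for zones that switch on different dates, in different directions, or not at all.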

    • uncomplexity_ a year ago

This.

Some people can't even wrap their heads around it, taking hours and hours of discussion. Still my favourite problem though.

    • wkat4242 a year ago

Yeah, summer time switching in different countries on different days, and often in a different direction (other hemisphere). I used to work on such matters, and those weeks were the toughest.

      • ethbr1 a year ago

        Developers when they first start working with time across timezones: "This is a technical problem."

        Developers after more research: "Oh... this is a political problem."

  • echeese a year ago

    Considering my iPhone alarm still sometimes fails to go off (it just shows the alarm screen silently), I'd be inclined to believe you.

    • ineedasername a year ago

Thanks for that; I thought I was going crazy (well, I still could be, I guess) or had some strange habit or gesture I didn't realize was silencing the alarm somehow.

    • emptiestplace a year ago
      • yakz a year ago

        Whenever I have to wake for something that I absolutely can’t miss, I set 2-3 extra reminders 5 minutes apart precisely because of this “silent alarm” bug. It’s only happened to me a couple of times but twice was enough to completely destroy my trust in the alarm. The first time I thought I just did something in my sleep to cause it, but the UI shows it as if the alarm worked. I’m lucky to have the privilege that if I oversleep an hour or so it’s no big deal, otherwise ye olde tabletop alarm clock would be back.

        • emptiestplace a year ago

          I love the questioning my sanity before I've completely opened my eyes part. It's like a jump start to my day.

        • pedroslopez a year ago

          Hah - I also just assumed that I was turning the alarm off in my sleep without noticing. I started doubting it and really wish there was a log of when you tapped snooze or stopped the alarm...

This is too much of a dev feature for Apple to implement, and there are probably third-party apps that do this, but meh

  • paul7986 a year ago

OpenAI just needs to create & release their own phone with Microsoft's help: the phone from the movie "Her".

Apple has not innovated in years, and a GPT Phone where your lock screen is a FaceTime-call-like UI/UX with your AI agent who does everything for you would give Apple a run for its money! Pick up your phone & see your agent waiting to assist; it could even be skinned to look like a deceased loved one (mom still guiding you through life).

To get things done it would interface with the AI agents of businesses, companies, your doctor, friends & family to schedule things, and be used as a knowledge base.

    Maybe this is their step towards creating said agents?

    • elicksaur a year ago

      > your lock screen is a FaceTime call like UI/UX with your AI Agent who does everything for you

      I just… don’t want this. I don’t think anyone I know wants this.

      • paul7986 a year ago

        Cool, thanks for the comment!

I use ChatGPT now for almost everything, and when in the car I have full back-and-forth conversations with it to get things done (or use it as a knowledge base) there too. Recently I was discussing with it how to properly get rid of (junk) my old car in Pennsylvania. It provided all the steps and gave me local businesses. Though it didn't call or interface with them to find their available times/costs, tell me such details, and let me instruct it to schedule my preferred choice. I wish it did, and it prompted thoughts on how it could, since the technology that gets adopted is mostly tech that simplifies our lives.

I think my concept above is similar to what was seen in the movie "Her" (Joaquin Phoenix & Scarlett Johansson starred), so it's not that crazy or odd. Throwing in skinning it to be whoever, like a deceased loved one, goes from "might be" to "probably is".

        • elicksaur a year ago

          I wouldn’t want my grandmother “skinned” as a bland amalgamation of all of the internet’s text. That’s fucking horrifying.

          H.E.R. is a deeply unsettling movie.

  • android521 a year ago

And Gmail's scheduled delivery just won't work if you want to email yourself a month later.

cbeach a year ago

I'm sure it's brilliant, but I have no idea what it's capable of. What will it do? Send me a push notification? Have an answer waiting for me when I come back to it in a while?

I switched over to the "GPT-4o with scheduled tasks" model and there were no UI hints as to how I might use the feature. So I asked it, "What can you follow up on later, and how?"

It replied "Could you clarify what specifically you’d like me to follow up on later?"

This is a truly awful way to launch a new product.

  • benaduggan a year ago

After asking it to schedule something, it prompted me to allow or block notifications, so it sounds like this is just ChatGPT scheduling push notifications? We'll see!

    • jerpint a year ago

So basically cannibalizing Siri?

      • 1propionyl a year ago

        Siri has access to a wealth of private existing and future on-device APIs to fuel context sensitive responses to queries on vendor locked devices used all day long. (Which Apple has apparently decided to just not use yet.)

        OpenAI doesn't, they just have a ton of funding and (up to recently) a good mass media story, and the best natural language responses.

        The moat around Siri is much deeper, and I don't really see any evidence OpenAI has any special sauce that can't be reproduced by others.

        My prediction is that OpenAI's reliance on AI doomerism to generate a regulatory moat falters as they become unable to produce step changes in new models, while Apple's efforts despite being halting and incomplete become ubiquitous thanks to market share and access to on device context.

        I wouldn't (and don't) put my money in OpenAI anymore. I don't see a future for them beyond being the first mover in an "LLM as a service" space in which they have no moat. On top of that they've managed to absorb the worst of criticism as a sort of LLM lightning rod. Worst of all, it may turn out that off-device isn't even really necessary for most consumer applications in which case they'll start to have to make most of their money on corporate contracts.

        Maybe something will change, but right now OpenAI is looking like a bubble company with no guarantee to its dominant position. Because it is what it is: simply the largest pooling of money to try to corner this market. What else do they have?

        • secfirstmd a year ago

I think there is an argument that Google Gemini is currently best placed to tie everything together, assuming Google executes on it well.

Most people use Gmail, Docs, Google Maps, and Google Calendar over Apple's alternatives. Gemini could really tie them together well.

          • 1propionyl a year ago

            The counter argument is that Google doesn't maintain any of those services beyond the bare minimum for customer facing interactions, and exchanges between their services are even more poorly supported if they even exist at all.

Remember Google Sheets (already the Tonka Toys of spreadsheets) adding named tables?

You can't use them in any of the Apps Script APIs. You have to fall back to manually searching for strings and index arithmetic.

            Google Drive still barely supports anything like moving an entire folder to another folder.

            They have failed at least a half dozen times now to deliver a functional chat/VOIP app after they already had one in Google Talk.

            They regularly sunset products that actually have devoted and zealous user bases for indiscernible reasons.

            Android is just chugging along doing nothing interesting and still carrying the same baggage it did before. It's a painful platform to develop for and the Jetpack Compose/Kotlin shift hasn't ameliorated much of that at all.

            Their search offering is now worse than Bing, worse than Kagi, and worse than some of the LLM based tools that use their index. It's increasingly common that you can't even find a single link that you know an entire verbatim sentence from via Google search for inexplicable reasons. Exact keyword or phrase searches no longer work. You can't require a keyword in results.

            I don't trust Google to deliver a single functional software product at this point, let alone a compelling integration of many different ones developed in different siloes.

            About the only thing going for them is how many people still have Gmail accounts from that initial invite only and generous limits campaign... 20 years ago?

            Google is not a healthy company. I don't invest in them anymore, and barring some major change I probably won't again. It's a dying blue chip which is a terrible position to have your money in.

            P.S. oh, and Gemini is awful by comparison in both price and quality to competitors. It isn't saving them. It's just a "me too".

            P.P.S. I'm personally just waiting for their next "game changing" announcement bound to fail to get in at the top floor on shorting what stock I have. It's one of those cases where finance has rose coloured glasses based on brand name that anyone who's used Google products for years would be thoroughly disabused of.

            • esafak a year ago

              Gemini 2.0 is not bad in quality, and great in terms of speed.

          • jerpint a year ago

            There are so many opportunities for google to improve their services.

            For example, I found myself asking Claude about places to see in a city I’m visiting while switching back and forth to gmaps. This would have been a much better experience integrated directly with gmaps knowledge graph

  • siva7 a year ago

    Yep, this is a truly bad feature launch. I have no clue what this model does. Did they somehow lose their competent product people?

  • cbeach a year ago

    Ah, I've just stumbled on some hints after clicking around.. click on your avatar image (top right) and then click "Tasks"

    Then there are some UI hints.

    "Remind me of your mom's birthday on [X] date"

    Wow, really maximising that $10bn GPU investment!

    • danpalmer a year ago

      Glad to see that the thriving 2010 market of TODO list apps will see a resurgence in the AI era.

      • delichon a year ago

        A todo app that you can write and modify by editing a natural language prompt, and that can parse inputs from the whole web with flexibility and nuance, is not a small thing.

        • danpalmer a year ago

          That also seems to not get timezones right, has a confusing search function...?

          More seriously, todo apps are about productivity, not just about becoming a huge bucket of tasks. I've always found that the productivity comes from getting context out of my head and scheduled for the right time. This release appears to be more about that big bag of tasks and less about productivity. I'm all for AI in products, I think it can be powerful, but I've not had a use-case for it in my todo app.

        • frontalier a year ago

          > a todo app that you can write and modify by editing a natural language prompt

          no.

          "a todo app that you can interact with by writing natural language input?"

          okay.

          > nuance

          really?!

          • delichon a year ago

            I've got about six apps written by Claude from prompts, all quite simple but useful. If you don't believe it I get it, because I didn't either until I tried it.

As for nuance, I've seen an astounding amount of divergent context incorporated into LLM responses. Not always, but far more than I've ever been able to encode into a parsing script, which handles exactly nothing that isn't explicitly programmed.

  • prettyblocks a year ago

    It could get really interesting if they allow webhooks and structured output

  • sandspar a year ago

    Maybe it's effective at hitting a goal which you do not see.

PittleyDunkin a year ago

Where are the release notes?

Edit: I suppose they'll be here at some point: https://help.openai.com/en/articles/9624314-model-release-no...

These seem like extremely shitty release notes. I have no clue why anybody pays for this model.

sky2224 a year ago

Pretty useless so far. I'm not sure what the intended application of this is so far, but I wanted it to schedule some work for me.

It only scheduled the first thing, and that was after I had to be specific by saying "7:30pm-11pm". I wanted to say "from now to 11pm" but it couldn't process "now".

  • sandspar a year ago

    If you find a tool useless then it's likely that you lack imagination.

    • sky2224 a year ago

      Okay, let's say I do lack imagination: please enlighten me after you've had a chance to actually use this half-baked feature.

mulmboy a year ago

https://www.theverge.com/2025/1/14/24343528/openai-chatgpt-r...

phgn a year ago

What am I supposed to see at the link?

elyase a year ago

There is more information in these twitter threads:

https://x.com/karinanguyen_/status/1879270529066262733 https://x.com/OpenAI/status/1879267276291203329

encoderer a year ago

Founder of Cronitor.io here — if you’re a developer considering using this, would it be valuable for you to be able to report in to Cronitor when it runs so we can keep an eye and alert you if your tasks are late, skipped or accidentally deleted?

We support just about every other job platform but I’d love to hear from potential users before I hack something together.

simple10 a year ago

The UI is different in the desktop app for macOS. The ability to edit a scheduled task is only available in the web UI for me.

I got the best results by not enabling Search the Web when creating tasks; it confuses the model. But the scheduled tasks themselves can successfully search the web.

It's flaky, but looks promising!

  • throwaway314155 a year ago

    Less relevant but why isn't canvas available in the desktop app? I thought they had feature parity but it seems not.

nycdatasci a year ago

Lots of complaints mentioned here. If you have a legitimate need for a product like Tasks that is more fully baked, I’d encourage you to check out lindy.ai (no affiliation). I’ve been using it to send periodic email updates on specific topics and it works flawlessly.

luke-stanley a year ago

Whoops! Might have been built wrong? I'm seeing a source map error:

Source map error: Error: request failed with status 404
Stack in the worker: networkRequest@resource://devtools/client/shared/source-map-loader/utils/network-request.js:43:9
Resource URL: https://cdn.oaistatic.com/assets/jbl0aowda306m4s1.js
Source Map URL: jbl0aowda306m4s1.js.map

Also, I am getting "Unable to display this message due to an error." a lot.

kgeist a year ago

So I opened "gpt4o with scheduled tasks" in the mobile app and there was no hint in the UI how to use it. I asked, "what's a scheduled task" and it answered with a generic response about scheduled tasks in general. Then I tried my luck and said, "remind me to pet my cat in 5 minutes," and it seemed to work. I then closed the mobile app, but no push notification came after 5 minutes, however I got an email, which I didn't expect (I expected push notifications). Clearly the feature needs more polish.

abrichr a year ago

It seems there are some issues with the rollout.

Me:

> Give me positive feedback every hour

ChatGPT:

> Provide positive feedback

> Next run Jan 15, 2025

> Got it! I’ll send you positive feedback every hour.

An hour later, I received the following email:

```

Your scheduled task couldn't be completed

ChatGPT tried to complete Provide positive feedback multiple times, but it encountered an error and wasn't able to send. It will try again the next time this task is scheduled.

Open chat If you have any questions, please contact through the help center.

All the best, ChatGPT

```

halamadrid a year ago

This is interesting, although I am a little confused about the purpose of ChatGPT with this feature.

We already have many implementations where, on a cron interval, one can call the GPT APIs for stuff. And it's nice to monitor it and see how things are working, etc.

So I'm curious what the use case is for embedding a scheduler inside the ChatGPT infrastructure. Seems a little off from its true purpose?

  • runeblaze a year ago

I think we all agree this feature seems broadly useful, and given that presumably a professional full-time first-party team was behind it, I'm gonna use this product over the other implementations

  • sandspar a year ago

    It's for normies.

TheJCDentonOP a year ago

There is an editable tasks list and in the settings menu you can choose to receive notifications via push and/or email.

sandspar a year ago

It's a tech demo to get normies used to the idea of agents. HackerNews "20 years in industry" guys are flabbergasted because it defaults to UTC and is therefore totally useless, clearly. Perhaps you live in a bubble?

serjester a year ago

This seems like such a strange product decision - why clutter the interface with such a niche use case? I’m trying to imagine OpenAI’s reasoning - a new angle on long term memory maybe? Or a potential interface for their agents?

reversethread a year ago

Does the world need another reminder/todo app?

Many existing apps (like Todoist) have already had LLM integrations for a while now, and have more features like calendars and syncing.

Or do I completely not understand what this product is trying to be?

  • bogdan a year ago

Why not? I already pay for ChatGPT but I don't pay for Todoist, so that doesn't help me.

frontalier a year ago

How does it handle timezones?

I saw no mention of them in the help article or the UI.

If I ask for a daily early-morning news summary, will it show up in the middle of the night or around lunchtime? Will it get updated when I travel? Seems interesting only if what you're looking for is a reminder that is not time-relevant, just a thing that should happen at some point with a time precision of about one day.

https://help.openai.com/en/articles/10291617-scheduled-tasks...

zb3 a year ago

The link doesn't work, presumably because I won't pay OpenAI which stole my API credits by making them have an "expiration date".

throwaway314155 a year ago

This is shaping up to be as bad as the Sora release.

krishadi a year ago

For those unable to find this, you can find it as a new model in the model drop-down menu.

rfdearborn a year ago

These are best understood as scheduled tasks for the AI instead of tasks for the user.

krishadi a year ago

The biggest outcome here is that now the app has memory.

picografix a year ago

why are they trying to be a model provider as well as service provider

  • rlt a year ago

    Why wouldn’t they? Most big tech cos offer products at multiple layers of the stack.

golergka a year ago

Couldn't you do the same with giving an LLM access to your shell and a cron command?
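The cron-based alternative could be sketched like this (the `ask-llm` CLI name and paths are placeholders, not real tools):

```shell
# Hypothetical crontab line: every day at 08:30, pipe a prompt to an
# LLM CLI (placeholder name) and mail the result to yourself.
CRON_LINE='30 8 * * * ask-llm "Summarize the top headlines from the last 48 hours" | mail -s "Daily brief" me@example.com'

# Installing it would look like this (commented out so nothing is modified):
# ( crontab -l; echo "$CRON_LINE" ) | crontab -
echo "$CRON_LINE"
```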

reverseblade2 a year ago

lol. I was going to build this. I even purchased the domain alert.now. I have an active news-based implementation at alarms.global; if you install it on your phone as a PWA, you get push notifications when something important happens in your region, and it can notify you before public holidays.

I even have an automated x account @alarmsglobal

geepytee a year ago

Imagine being an engineer on the Siri team, must be so demoralizing.

sagarpatil a year ago

A glorified reminder? Really?

mempko a year ago

Sorry, I simply cannot use OpenAI because its leadership is kissing the ring of Trump.

  • ablation a year ago

    Friend, I've got some news about the leadership of the majority of tech services you will use over the next 4-8 years...

ldjkfkdsjnv a year ago

This is going to eat software, and is the beginning of agents. The orchestrator of these tasks will come, and OpenAI will turn into a general purpose compute system, the endgame of workflow software. Soon there will be a database, and your prompts will be able to directly read and write to an openai hosted postgres instance. And your CRUD app will begin to disappear. Programming will feel pointless

  • rglover a year ago

Possibly, but that's going to require 100% consistent, accurate outputs (tricky, as that's not the nature of LLMs).

    Otherwise, you'll have a lot of systems dependent on these orchestrators creating hard-to-debug mistakes up and down the pipeline. With software, you can reach a state where it does what you tell it to without having to worry if some model adjustment or API change is going to break the output.

    If they solve that, then yes. Otherwise, what I personally expect is a lot of businesses rushing into implementing "agents" only to backpedal later when they start to have negative material effects on bottom lines.

    • ldjkfkdsjnv a year ago

It's inevitable. You can argue about what's possible right now, but I'm not looking at it from that angle. I think these issues will be solved with time.

      • ryan93 a year ago

They are using infinite compute and can't do simple notifications. How will changing the architecture slightly or ingesting more data change that?

      • rglover a year ago

        That belief is at odds with the mechanics of how LLMs work. It's not a question of more effort/investment/compute/whatever, it's just a reality of how the underlying systems work (non-deterministic). If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.

        People want us to be at "Her" levels of AI, but we're at a far earlier stage. We can fake certain aspects of that (using TTS), but blindly trusting an AI to run everything is going to be a big mistake in the short-term. And in order for the inevitability of what you describe to take place, the predecessor(s) to that have to work in a way that doesn't scare people and businesses away.

        The plowing of money and hype into the current forms of AI (not to mention the gaslighting about their ability) makes me think the real inevitability is a meltdown in the next 5-10 years which leads to AI-hesitancy on a mass scale.

        • ldjkfkdsjnv a year ago

Have you tried o1 pro? I find the people making these assertions are not deeply using the models on a daily basis. With each new release I can see the increase in capability. I have written software in the last year that is at a level of complexity beyond my skill set, and I have 15 years of SWE experience, most at FAANG. You just aren't close enough to the metal to see what's coming. It's not about what we have now; it's about scaling and a reliable march of model improvements. The code has been cracked: given sufficient data, anything can be learned. Neural networks are generalized learners.

          • rglover a year ago

            Yes, I use LLMs every day, primarily for coding (a mix of Claude and OAI). I was trying to add a simple CSS optimization step to my JS framework's build system last night, and both kept hallucinating (literally inventing non-existent APIs and config patterns) to the point where I gave up and just did it by hand with Google and the docs.

            The problem with your "close to the metal" assertion is that this has been parroted about every iteration of LLMs thus far. They've certainly gotten better (impressively so), but again, it doesn't matter. By their very nature (whether today or ten years from now), they're a big risk at the business level which is ultimately where the rubber has to hit the road.

            • wkat4242 a year ago

              Yeah, I don't think we're going to come closer to a real AGI until we manage to make a model that can actually understand and think. An LLM sounds smart, but it just picks the most likely response from the echoes of a billion human voices. I'm sure we'll get there, but not with this tech. I'm pretty sure even OpenAI said this with their 5 steps to AGI: LLMs were only step 1, and probably the part that will do the talking in the final AI, but not the thinking.

              At the moment people are so wooed by the confidence of current LLMs that they forget that there's all sorts of types of AI models. I think the key is going to be to have them work together, each doing the part they're good at.

              • rlt a year ago

                > An LLM sounds smart but it just picks the most likely response from the echoes of a billion human voices.

                This is where reasoning models come in. Train models on many logical statements, then give them enough time to produce a chain of thought that's indistinguishable from “understanding” and “thinking”.

                I’m not sure why this leap is so hard for some people to make.

                • wkat4242 a year ago

                  I personally don't think that will go very far. It's just a way of extracting a little bit more out of a technology that's the wrong one for the purpose.

          • harvodex a year ago

            We just are not on your level of genius to understand these things.

            So obviously completely full of shit.

        • ben_w a year ago

          Broadly I agree with your position, but:

          > If you can find a way to make the context window on the scale of the human brain, you may be able to mostly mitigate this.

          Human brains have a much smaller context window than AIs do. We can't pay attention to the last 128,000 concepts that filtered past our sensory systems — we can consciously hold only about seven things at once.

          There's a lot of stuff that we don't yet understand well enough to reproduce with AI, but context length is the wrong criticism for these models.

          • rglover a year ago

            > context length is the wrong criticism for these models

            You're right. What I'm getting at is the overall speed, efficiency, and accuracy of the storage, retrieval, and processing capability of the human brain.

            • wkat4242 a year ago

              It's kinda crazy that it can run on a few slices of bread when LLMs need kilowatts of power to write a simple paragraph :)

      • potatoman22 a year ago

        Why? Past progress =/= an equal rate of future progress.

  • worldsayshi a year ago

    Sure but do they have a moat here? Anyone that can connect to an LLM could make that app.

    • zb3 a year ago

      Yes, they have the name "ChatGPT". For non-technical people this appears to be the most important thing.

      • nozzlegear a year ago

        Is it a household name? Anecdotally, only two of my five millennial/gen-z siblings use an AI app at all, and one of them calls hers "Gary" instead of ChatGPT. I'd be interested in seeing some actual data showing how much ChatGPT is an actual household name versus one that us technical people assume is a household name due to its ubiquity in our space.

        • ben_w a year ago

          > Is it a household name?

          I think it is, yes.

          It was interviewed under that name on one of the UK's main news broadcasts almost immediately after it came out. Few hundred million users. Anecdotes about teachers whose students use it to cheat.

          But who knows. I was surprising people about the existence of Wikipedia as late as 2004, and Google Translate's augmented reality mode some time around the start of the pandemic.

    • ldjkfkdsjnv a year ago

      Does AWS have a moat on cloud computing?

      • scarface_74 a year ago

        Yes, it would take tens of billions of dollars to recreate the server infrastructure, and AWS has its own cables running under the oceans.

        Then you have to recreate all of the services on top of the AWS.

        Then you have to deal with regulations and certifications.

        Then you have to convince decision makers to go against their own interests. “No one ever got fired for Amazon”.

        Then you have to convince corporations to spend money to migrate.

      • worldsayshi a year ago

        Yes, that requires huge infrastructure investments. Creating an LLM requires huge investments. Running an LLM requires medium to big investments, but using one remotely requires very little investment.

  • daveguy a year ago

    This significantly overestimates the reliability of LLMs -- both their output integrity and their ability to understand context.

  • throwaway314155 a year ago

    Bit of advice: you might want to actually use an offering before claiming it is revolutionary.

    • ldjkfkdsjnv a year ago

      I've got 15 years of engineering experience, worked on some of the largest distributed systems at FAANG. It's coming.

      • scarface_74 a year ago

        > worked on some of the largest distributed systems at FAANG.

        As have tens of thousands of other people who could invert a btree on the whiteboard….

      • throwaway314155 a year ago

        Oh wow good for you! Didn't realize you were a prodigy or that this was a contest. I take it all back. /s

        Maybe try some humility. You're not helping yourself with the bragging about frankly underwhelming and common (here) experience.
