Show HN: YakGPT – A locally running, hands-free ChatGPT UI

yakgpt.vercel.app

287 points by kami8845 3 years ago · 122 comments

Greetings!

YakGPT is a simple, frontend-only ChatGPT UI you can use to chat normally or, more excitingly, to chat hands-free using your mic and OpenAI's Whisper API.
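
For the curious, the hands-free part is just the browser talking to OpenAI's transcription endpoint. A minimal sketch of that kind of call (illustrative only, not necessarily the exact code in the repo):

    // Send a recorded audio Blob straight to OpenAI's transcription endpoint.
    async function transcribe(audio: Blob, apiKey: string): Promise<string> {
      const form = new FormData();
      form.append("file", audio, "recording.webm");
      form.append("model", "whisper-1");
      const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
        method: "POST",
        headers: { Authorization: `Bearer ${apiKey}` },
        body: form,
      });
      const data = (await res.json()) as { text: string };
      return data.text;
    }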

Some features:

* A few fun characters pre-installed

* No tracking or analytics; OpenAI is the only service it calls out to

* Optimized for mobile use via hands-free mode and cross-platform compressed audio recording

* Your API key and chat history are stored in browser local storage only

* Open-source, you can either use the deployed version at Vercel, or run it locally

Planned features:

* Integrate Eleven Labs & other TTS services to enable full hands-free conversation

* Implement LangChain and/or plugins

* Integrate more ASR services that allow for streaming

Source code: https://github.com/yakGPT/yakGPT

I’d love for you to try it out and hear your feedback!

jwarden 3 years ago

Nice. It took about a minute to clone it, run it, enter my API key, and get started. The speech-to-text worked flawlessly.

Most people can talk faster than they can type, but they can read faster than other people can talk. So an interface where I speak but read the response is an ideal way of interfacing with ChatGPT.

What would be nice is if I didn't have to press the mic button to speak -- if it could just tell when I was speaking (perhaps by saying "hey YakGPT"). But I see how that might be hard to implement.

Would love to hook this up to some smart glasses with a heads-up display where I could speak and read the response.

  • anonzzzies 3 years ago

    > Most people can talk faster than they can type

    Most people I know type faster than they talk, and more accurately too. I find talking a horrible interface to a computer while sitting down. On the move it is another story entirely, of course.

    By the way, ChatGPT is not very fast either, so usually I type something in the chat and continue working while it generates the response.

    > smart glasses

    I just tried that; it works quite well, however, pressing the mic button kind of messes up that experience.

    • chenxi9649 3 years ago

      Normal/average talking is ~150 WPM. Average typing speed is about 60-70. Is typing 150+ WPM a requirement to become anonzzzies' friend?

      • MikePlacid 3 years ago

        The only person I've met in real life who could type as fast as I can speak was an immigration officer taking my naturalization interview. The sound of a keyboard going: trrr-trrr! And he was amazingly accurate too: all the unnecessary things I said for conversation's sake were there, exactly as I said them. I think my wife would beat him easily, though…

      • quickthrower2 3 years ago

        Or a really slow talker?

        High WPM might be achievable with shorthand though.

    • thelittleone 3 years ago

      The advantage of course is you're not tied to a keyboard/desk. So one could potentially be doing internet research while hiking.

  • xupybd 3 years ago

    It wasn't so smooth for me.

    I gave up at

        Creating an optimized production build ...TypeError: Cannot read properties of null (reading 'useRef')

    • johnchristopher 3 years ago

      Oh, my install failed at:

          Failed to compile.
      
          pages/index.tsx
          `next/font` error:
          Failed to fetch `Inter` from Google Fonts.
      
      
          > Build failed because of webpack errors
      
      Apparently because it can't fetch a font from Google. A yarn build should distinguish between assets that are critical (JS/TS code, templates, CSS) and assets that are not (freaking fonts).

      edit: hacketyfixey, let's punch the thing in the face until it works:

          ./pages/index.tsx:
          2:  // import { Inter } from "next/font/google";
          12: // const inter = Inter({ subsets: ["latin"] });
      
      (I am sorry)
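
      Another hedged workaround, if you'd rather keep the font than comment it out: vendor the font file and use next/font/local so the build never has to reach Google (this assumes you add e.g. ./fonts/Inter-Variable.woff2 to the repo yourself):

          // pages/index.tsx
          import localFont from "next/font/local";
          // Served from the repo instead of fetched from Google Fonts at build time.
          const inter = localFont({ src: "./fonts/Inter-Variable.woff2" });
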
      • kami8845OP 3 years ago

        Haha, I'll set up a Docker image that people can pull down!

        • johnchristopher 3 years ago

          Thanks, but FWIW I'd also be interested in why it doesn't build. Shouldn't yarn/npm/gulp/whatever manage dependencies?

          • xupybd 3 years ago

            I've not found a dependency manager that works reliably across multiple operating systems and operating system versions.

JimmyRuska 3 years ago

I tried it, it looks good! I had to modify the code to accept 8000 tokens for ChatGPT. It would be good if it saved the JSON payload of the responses as well.

It uses two external calls to a JavaScript CDN for the microphone package and something else. It would probably be best if it made localhost-only calls, since it handles an API key.

FriedPickles 3 years ago

I love the concept of this and other alternate ChatGPT UIs, but I hesitate to use them and pay for my calls when I could use chat.openai.com for free.

Any chance you could integrate the backend-api, and let me paste in my Bearer token from there?

  • kami8845OP 3 years ago

    Hey! I definitely understand the reservation; that's me as well. My reasons for using the UI at this point:

    * GPT-4 is decently faster when talking straight to the API

    * The API is so stupidly cheap that it's basically a rounding error for me. Half an hour of chatting to GPT-3.5 costs me $0.02.

    Would be curious what you mean by integrating the backend-api?

    • qwertox 3 years ago

      GPT-3.5 is really cheap (prompt and completion = $0.002 / 1K tokens), but GPT-4 is around 20 times more expensive (prompt = $0.03 / 1K tokens + completion = $0.06 / 1K tokens).

      But the benefit of using the API is that you can change the model on the fly: you chat with 3.5 until you notice it's not responding well, and then, with all the history you have (probably stored in your database), you can resend the conversation once with GPT-4 as the selected model for a probably better response.

      I really wish the interface on chat.openai.com would let me switch between models within the same conversation, in order to 1) not use up the quota of GPT-4 interactions per 3 hours as quickly and 2) not strain the backend unnecessarily when starting the conversation with GPT-3.5 is good enough until you notice you'd better switch models.

      OpenAI already has this implemented: When you use up your quota of GPT-4 chats, it offers you to drop down into GPT-3.5 in that same conversation.

    • robopsychology 3 years ago

      How is it that cheap?! I ran three queries on LangChain yesterday with two constitutional prompts and it cost $0.22, which made me realize deploying my project for cheap could get expensive quickly.

      • kami8845OP 3 years ago

        GPT-3.5 Turbo pricing is 10k tokens, or ~7500 words, for $0.02. Note, though, that every API request includes the entire chat context and is charged for both input & output tokens. https://openai.com/pricing
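
        A rough cost model for intuition, using the $0.002 / 1K-token figure quoted in this thread (illustrative only; check the pricing page for current numbers):

            // Each request is billed for the whole accumulated context plus the new tokens.
            const PRICE_PER_1K = 0.002; // gpt-3.5-turbo, prompt + completion

            function conversationCost(tokensPerTurn: number[]): number {
              let context = 0;
              let cost = 0;
              for (const turn of tokensPerTurn) {
                context += turn;                          // context grows every turn
                cost += (context / 1000) * PRICE_PER_1K;
              }
              return cost;
            }

            // Ten turns of ~300 tokens each comes out to roughly $0.03.
            console.log(conversationCost(Array(10).fill(300)).toFixed(3));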

      • monkmartinez 3 years ago

        You need to check which model you are using. Also, LangChain runs through the model several times, with an increased token count on each successive call.

        • robopsychology 3 years ago

          Yeah, I assumed it would be making several calls, but it's still more expensive than OP mentioned. I think the issue is that I'm using davinci-003.

          • drusepth 3 years ago

            Yeah, davinci-003 is gonna be gpt3, which is more expensive than 3.5.

            One more anecdote: I've been running a half dozen gpt3.5 IRC bots for a few weeks and their total cost was less than a dollar. A few hours of playing around with LangChain on gpt3 cost me almost $4 before I realized I needed to switch to 3.5, though even then it still uses a ton of tokens every chain.

    • agotterer 3 years ago

      I’d love to see a comparison of the average cost of using this with the OpenAI API versus subscribing to ChatGPT Plus.

      Maybe I’ll have to try this for a month and see if it ends up costing more than $20. Thanks for creating it!

    • joenot443 3 years ago

      Wow! Is it really that cheap? GPT4 is much more expensive, I imagine?

      • kami8845OP 3 years ago

        GPT-4 is decently more expensive -- I personally really like & use the therapist character a lot. In this scenario the session would cost me less than $1 which is still much cheaper than any therapist I've used previously :)

  • 1xdevloper 3 years ago

    You can try the extension I built [0] which uses your existing ChatGPT session to send requests.

    [0] https://sublimegpt.com

    • unitg 3 years ago

      The overlay option is great. Any chance of a Firefox version?

  • Karunamon 3 years ago

    Remember that using the API comes with privacy guarantees that using the ChatGPT site does not. tl;dr: anything sent through the API won't be used to train the model and will be deleted after a month.

    https://help.openai.com/en/articles/5722486-how-your-data-is...

teawrecks 3 years ago

> Run locally on browser – no need to install any applications

That's not what "run locally" means. This isn't any more "local" than talking to chatgpt directly, which is never running locally.

  • kami8845OP 3 years ago

    Hey, run locally in this case means: YakGPT has no backend. Whether you use the React app through https://yakgpt.vercel.app/ or run it on your own machine, I store none of your data. I will try to make this wording clearer!
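
    In practice the whole "backend" is just the browser's localStorage. A hypothetical sketch of that storage model (key names made up for illustration):

        // Everything stays client-side; nothing is sent to a YakGPT-owned server.
        function saveTurn(apiKey: string, message: { role: string; content: string }) {
          localStorage.setItem("openai-api-key", apiKey);
          const history = JSON.parse(localStorage.getItem("chat-history") ?? "[]");
          localStorage.setItem("chat-history", JSON.stringify([...history, message]));
        }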

    • NBJack 3 years ago

      In that case you're basically offering a browser-based client. 'Locally' strongly suggests this is running entirely on the machine (vs. making API calls). Going to break a lot of hearts out there with the wording as it is.

  • rafael09ed 3 years ago

    It is more local than talking to ChatGPT directly. OpenAI stores all your requests on their servers; this saves them on your computer. The title also only claims it's a UI, which (for now) always runs locally.

blairanderson 3 years ago

Honestly your "idea generator" blew my mind. Would love to see a section that includes a larger catalog of prefilled prompts.

I'm thinking: What would a GPT project manager do? What would a GPT money manager do? What would a GPT logistics manager do? GPT Data Analyst, Etc.

meghan_rain 3 years ago

> Run locally on browser – no need to install any applications

> Please enter your OpenAI key

...

Do people just not get it?

I would in fact rather give all my company secrets to this random dude than OpenAI.

  • iib 3 years ago

    There are instructions on how to run the GUI from localhost, and the title and even the sentence that links to their own hosting tell you that you can run it locally first.

    It seems they are genuine, and they phrase it exactly as it is. The only thing I would have maybe wanted to see in the title is "open-source" or free software.

asow92 3 years ago

Love the idea of prompt dictation. Taking that idea a step further, would it be possible to have a feature where ChatGPT responses are spoken back to the user?

smusamashah 3 years ago

This is fast. And talking to it is a nice touch. Consider adding text-to-speech too :)

One feature I am missing from all these frontends is the ability to edit your text and generate a new response from that point. The official ChatGPT UI is the only one that seems to do that.

  • danielbln 3 years ago

    Chat-with-gpt has that; we use it in our org as an alternative ChatGPT interface: https://github.com/cogentapps/chat-with-gpt

    • smusamashah 3 years ago

      In the official UI, if you edit a message and get a new response, you can still always go back to any of your previous messages and continue from there. Basically, the history is a tree in the official UI. History in all other frontends, including this one, is linear.

    • ilovepuppies 3 years ago

      I've never seen this one before. It has several features I've been looking for. Has it been working well for your organization?

      • danielbln 3 years ago

        It has. We don't want to go through the accounting nightmare of buying everyone ChatGPT Plus accounts, so just inviting everyone to the OpenAI org and giving out API keys to be used in tools like this one has been good.

    • tluyben2 3 years ago

      I added whisper to that (was merged) so you can talk to it as well.

      • smusamashah 3 years ago

        In the official UI the chat history is like a tree. If you edit a message, it branches the conversation off from that point. You can always go back to any message in the tree and see the conversation from there on. Can you do that in your UI? No other UI has done that so far.

        • tluyben2 3 years ago

          I am not the author, just a contributor, but it would not be very hard to add.

  • kami8845OP 3 years ago

    Hey! You can edit past messages you've submitted and they will generate a new response that overwrites whatever happened in the conversation previously. If you're talking about a tree-like struct where you can have different branches, then true, only the official UI has it AFAIK :)
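
    For what it's worth, a branching history doesn't need much more than a parent pointer per message. A hypothetical shape, not the actual data model here:

        // Editing a message creates a sibling branch under the same parent; the visible
        // conversation is the path from the root to whichever leaf is selected.
        interface MessageNode {
          id: string;
          parentId: string | null;   // null for the root / system message
          role: "system" | "user" | "assistant";
          content: string;
          childIds: string[];        // more than one child = more than one branch
        }

        // Rebuild the linear context to send to the API by walking up from a leaf.
        function pathToRoot(nodes: Map<string, MessageNode>, leafId: string): MessageNode[] {
          const path: MessageNode[] = [];
          let cur = nodes.get(leafId);
          while (cur) {
            path.unshift(cur);
            cur = cur.parentId ? nodes.get(cur.parentId) : undefined;
          }
          return path;
        }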

Tiberium 3 years ago

Looks cool! Are you planning on adding more customization to be able to influence the AI? See https://bettergpt.chat/ (it's also open source and uses API in the browser). Basically with that frontend you can control the role of all messages (e.g. add system messages) and also edit them all to better influence the AI in some cases.

  • kami8845OP 3 years ago

    Editing the prompts (which are currently submitted via the system message similar to your linked app) is a great idea. I'll add it to the to-do list :)

computershit 3 years ago

BRO. Your transcription is SO fast. I've hacked on a similar project that passes audio to the Whisper API, and honestly I was already blown away by its speed and accuracy (as was anyone I showed it to), but your implementation is so much faster, both in transcription and in the response from their API. I will absolutely use this.

ilovepuppies 3 years ago

Very cool. I use a custom local UI as well, based on a fork of a similar project called ChatPad (https://github.com/deiucanta/chatpad). That also uses Mantine UI, and lets you create and save prompts just like chats. Data is stored locally using IndexedDB. I embedded it in an Electron app, which lets me run it from my dock rather than a terminal. But what's missing is speech-to-text, so it's great to see this project has that.

There are a few drawbacks to local, I've discovered. For example, I doubt the new plugins can be extended beyond ChatGPT's web UI. Also, it doesn't stream response tokens as they're generated, which is a pain. I haven't looked into whether OpenAI lets you do that though.
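
(For what it's worth, the chat completions API does support streaming: pass stream: true and read the response as server-sent events. A rough sketch of consuming it from the browser, with error handling and SSE parsing left out:)

    async function streamChat(apiKey: string, messages: { role: string; content: string }[]) {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({ model: "gpt-3.5-turbo", messages, stream: true }),
      });
      const reader = res.body!.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Each chunk holds one or more "data: {...}" lines; "data: [DONE]" ends the stream.
        console.log(decoder.decode(value));
      }
    }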

Nice work!

ezzato 3 years ago

Looks great. Super interesting to browse other people's code. I'm working on a desktop app for ChatGPT.

https://github.com/EzzatOmar/delegate

throwaway675309 3 years ago

Given that Vocode (realtime audio, LLMs, etc.) came out a few days ago, could you speak to how yours compares to it?

https://github.com/vocodedev/vocode-python

  • MikePlacid 3 years ago

    So, is it finally time to entertain spam callers with nice, polite, _long_ conversations? About my credit card numbers and passwords to my accounts? My personal record is 40 minutes: some nice guys were trying to install a remote-controlled backdoor on my MacBook and thought they were very close to success. There are existing services, like https://jollyrogertelephone.com/ - but they are not as good as me. Still, using myself to entertain the robocallers is fun but expensive; it would be interesting to see if AI is ready to help here…

user- 3 years ago

Cool! I tried out the speech-to-text and it was instant and accurate; I had no idea Whisper was that good.

Do you know their privacy policy for our voices? Do they train on them, listen to them, etc.?

  • dilek 3 years ago

    If you're running Whisper locally, they don't and cannot.

    If you're using the hosted Whisper API, they can. However, they don't specifically talk about it.

Karunamon 3 years ago

I absolutely love this! The UI is nice and responsive, and this is the first ChatGPT UI I've seen with voice recognition that works outside of Chrome!

I kind of want to throw this up on a server for my housemates to use. I am currently the only person with an OpenAI account, so I would like the ability to embed my API key. Minor feature request :-)

einpoklum 3 years ago

Hi ChatGPT! Let me register using my personal information, then tell you what my tasks are at work, what I'm interested in, what I'm struggling with in life, and a bunch of other sensitive personal information. I trust you completely, and am sure a nice AI such as yourself would never use my personal data for anything.

illuminated 3 years ago

The only thing I'd suggest considering is adding some sort of authentication. If I deploy this on a server so I can reach it from my mobile, on the go, and it has my API credentials, I wouldn't want anyone who stumbles upon the page to be able to use ChatGPT at my expense.
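
One hedged way to do it on a self-hosted deployment (just a sketch, not something YakGPT ships): put HTTP Basic Auth in front of the app with a Next.js middleware, along these lines:

    // middleware.ts -- sketch only; credentials come from env vars on the server.
    import { NextRequest, NextResponse } from "next/server";

    const USER = process.env.BASIC_AUTH_USER ?? "admin";
    const PASS = process.env.BASIC_AUTH_PASS ?? "change-me";

    export function middleware(req: NextRequest) {
      const header = req.headers.get("authorization") ?? "";
      const [scheme, encoded] = header.split(" ");
      if (scheme === "Basic" && encoded) {
        const [user, pass] = atob(encoded).split(":");
        if (user === USER && pass === PASS) return NextResponse.next();
      }
      return new NextResponse("Authentication required", {
        status: 401,
        headers: { "WWW-Authenticate": 'Basic realm="YakGPT"' },
      });
    }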

Otherwise, it really looks good.

fudged71 3 years ago

I've been playing around with your Idea Generator persona for the last 15 minutes and have been absolutely blown away. Excellent prompt engineering.

As mentioned by others, it would be great to customize or write new personas/prompts.

Also, could you add a voice chatbot as well, using Vocode? It could be an alternative UI for each of the personas.

diversionfactor 3 years ago

So if you add audio output so I can talk to my computer like in Star Trek, I'll Venmo you $100. Then I want a command-line module so I can ask it to write files to the local disk and run them, letting me deploy code it's just written straight to AWS; that's worth at least another $100.

  • yosito 3 years ago

    It's not that hard to do, but I think this is lowballing. If you want a talented programmer to do something for you, you should be willing to pay them $150/hr. And I'm assuming this is more than an hour of work.

dingclancy 3 years ago

It would be great if I could just press the space bar in the app and have it let me talk to it. Keyboard shortcuts!
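
A minimal sketch of that kind of push-to-talk shortcut: hold Space to record, release to stop (the two handlers are placeholders for whatever the app's real mic hooks are):

    declare function startRecording(): void;  // placeholder for the real mic hook
    declare function stopRecording(): void;   // placeholder for the real mic hook

    window.addEventListener("keydown", (e) => {
      const typing = (document.activeElement as HTMLElement | null)?.tagName === "TEXTAREA";
      if (e.code === "Space" && !e.repeat && !typing) {
        e.preventDefault();  // keep the page from scrolling
        startRecording();
      }
    });
    window.addEventListener("keyup", (e) => {
      if (e.code === "Space") stopRecording();
    });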

BTW I have a lot of these ChatGPT UI apps installed, mostly free and open-source. Perhaps this is really the era of going back to just talking to a chat interface like the old times.

chenxi9649 3 years ago

This is very well made and designed. I will likely use this instead of the actual ChatGPT UI, since the API is a lot cheaper than the $20/month pricing for my usage amount.

Interesting note: I tried speaking Mandarin Chinese into the mic and it auto-translated what I said into English.

donpark 3 years ago

Just tried this in both English and Korean. Fumbled a bit with the voice control, but it worked well once I got it going. Very nice. Korean prompts got translated to English, so I had to tell ChatGPT to respond in Korean to get a fully non-English UX.

Well done.

  • tough 3 years ago

    It sounds like a nice modifier to add a one-liner to the prompt: "Return your response in $user.language".
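
    Something like prepending a system message would do it; a hypothetical sketch using the chat completions payload shape (the variable names are made up):

        // Pin the reply language, defaulting to the browser locale.
        const userLanguage = navigator.language;             // e.g. "ko-KR", "fr"
        const transcribedText = "안녕하세요, 오늘 날씨 어때?";     // stand-in for the Whisper output

        const messages = [
          { role: "system", content: `Always respond in ${userLanguage}, even if the user writes in another language.` },
          { role: "user", content: transcribedText },
        ];
        // ...then POST { model, messages } to /v1/chat/completions as usual.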

oriettaxx 3 years ago

It's pretty bad to ask people to enter a private secret key into a website (any website, I mean).

  • thangngoc89 3 years ago

    They provide an option to build it locally and run it yourself. But yeah, I wish there were a common proxy protocol that would let a website access private resources without exposing private keys.

    • titaniczero 3 years ago

      OpenAI should implement an OAuth authorization server and allow developers to add "Log in with OpenAI" to their apps.

      • butterfly771 3 years ago

        I agree, this is the best solution. I'm sick of countless projects with key input fields where I have to Ctrl-C/Ctrl-V every time.

        • blazespin 3 years ago

          Not to mention the ludicrously gaping security issue that this is. My guess is they want to push people to the plugins, though.

    • andag 3 years ago

      Maybe a small video demo would be an OK alternative?

  • Veen 3 years ago

    What alternative would you suggest for a free service that depends on OpenAI APIs? It's easy enough to generate an API key for this service and delete it afterwards.

  • gkbrk 3 years ago

    Why? OpenAI keys can be revoked at any time, and OpenAI allows you to set soft and hard limits for billing as well.

    You can also generate multiple keys, so if one app misbehaves, you don't need to rotate all the keys, just the one that misbehaves.

    This is assuming the API keys can only do generation. If it can access billing details or something it's very different of course.

    • balls187 3 years ago

      > Why?

      Because it's bad practice to provide sensitive information to untrusted sources, and if you are an ethical developer, it's an anti-pattern to write software that encourages bad practices.

      Your credit card company will reverse any unauthorized charges. Will you email me all your credit card info?

  • oriettaxx 3 years ago

    > It's pretty bad to ask people to enter a private secret key into a website (any website, I mean)

    Answering myself: I misunderstood, since the developer's idea is to run it locally (http://localhost:3000); I just got scared by the demo.

    Congrats to the developer!

terran57 3 years ago

I installed it locally about an hour ago and have been running it through some paces. Nice work! (In addition to the predefined prompts, I like the API usage meter at the top).

(Now I just need OpenAI to take me off the waitlist for GPT-4.)

psychoslave 3 years ago

I'm a bit confused: I tried to utter some queries in Esperanto and French, and it transcribed English (fine) translations. Can I disable this behavior and have the text transcribed in the language spoken?

andymac4182 3 years ago

I might be missing it, but do we have an idea of the prompt that ChatGPT uses, so we can replicate the experience?

I haven't played with the OpenAI API yet. Are there examples of good prompts to use to get good responses?

noobcoder 3 years ago

Love this. A few things we could add:

- Search feature

- Way to import/export chats

- Star/favourite replies by ChatGPT

- For GPT-4, provide 8k/32k model variations

- Prompt dictionary

victorantos 3 years ago

I get a 404 error in the browser console for http://localhost:3000/encoderWorker.umd.js

afro88 3 years ago

This is exactly what I need, thank you for building this! We're using Azure Cognitive Services for API access to OpenAI models, though. With any luck, expect a PR today for basic Azure support :)

LanternLight83 3 years ago

Could I hook this up to one of text-generation-webui's API formats?

aryamaan 3 years ago

Would be so fun if you could fork a project on Vercel, i.e. this project has a button to fork which:

- forks its GitHub repo

- makes a new project on your Vercel, since it's connected to your GitHub

- opens a new tab with your project running

kulikalov 3 years ago

Isn’t GPT a trademark owned by OpenAI? Is it legal to use it?

  • sebzim4500 3 years ago

    Looks like they've recently applied for the trademark, but they haven't got it yet. I have no idea whether they will get it or not; it is just an acronym, but they did come up with it.

    • syrrim 3 years ago

      They did seemingly position it as a generic name for this style of AI model, and other people have been using it in that fashion (e.g. "gpt-j"). It's usually recommended to contrast a brand name for your product with its generic name, so that the two don't become confused. Hence why Scrabble is always subtitled "crossword game".

      • matemp 3 years ago

        Agreed. I doubt that OpenAI's recent application seeking to trademark "GPT" will be approved. Maybe specific models/products, but not just "GPT" by itself...

        To be able to register a trademark in the U.S., the applicant has to show that the proposed trademark is in fact "distinctive" of their company. The more generic a term is in its field, whether to begin with (i.e., by not becoming distinctive in the first place...), or over time (i.e., by failing to maintain its distinctiveness), the less likely it is to be registerable. And, such "distinctiveness" is notably harder to achieve and/or maintain for terms that are more generic/descriptive rather than truly unique…

        In the case of "GPT," in the context of software (specifically A.I.), those letters -- particularly in that combination -- are understood to stand for things that refer to a kind of A.I. language model having certain characteristics, even though OpenAI was first to produce a (g)enerative (p)retrained (t)ransformer and they're still the most notable provider of such technologies.

ushakov 3 years ago

What's the use-case for this instead of the default UI?

itsthecourier 3 years ago

Cross-platform compressed audio recording!? How!?

yosito 3 years ago

> Run locally on browser – no need to install any applications

This seems to be a contradiction. Am I running it locally, or is it running on someone else's server?

Obertr 3 years ago

Speech-to-text didn't transcribe anything after a minute. The recording was 5s long (((

thelittleone 3 years ago

All your prompts are belong to us

kristopolous 3 years ago

Make it easier to try

  • kami8845OP 3 years ago

    Hey! I would love to. I seriously considered adding my own key into the app and implementing some rate limiting to, e.g., allow you to send 3 messages for free. But unfortunately that would require me to store some backend data about you that I don't want to: I want this to be a completely "private", FE-only application that stores no data on anyone.

    • connorgutman 3 years ago

      Testing YakGPT right now, excellent work! I would recommend adding some screenshots to the GitHub README so that people can get an idea of how it looks before entering their API key.

  • avindroth 3 years ago

    Before you comment something like this, ask yourself "How would I make this easier to try?" The only reasonable answer is providing the OP's own API key, which is undesirable.

    • kristopolous 3 years ago

      A video demonstration, a cleaner example of what it is, etc. You can experience it by observation.

jerrygoyal 3 years ago

Could you please add some screenshots of how it looks?
