Show HN: Nova - GPT with composable prompts, workspaces, and LlamaIndex
samueltate.com
Wanted to introduce NOVA - an agent with dynamic management of prompts, functions and document access.
Key features
- Create and manage multiple prompts: enable, disable or remove them
- Persistent memory: essentially rolling summary windows plus a traversable index
- Turn on 'auto-gpt-like' functionality such as write, read and query
- Embed PDFs and Google Docs using the amazing LlamaIndex
- Create new channels/configurations and share them with other people
Intro video here : https://www.youtube.com/watch?v=FB_g_8ofSlE
You can essentially craft a set of prompts and document connections, then interact with that agent over time with persistent memory and shared docs. You can make a configuration public - the thinking was to be able to share embeddings, but also shared memory.
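To make the prompt-composition idea concrete, here's a minimal sketch of how enabled prompts could be injected at the top of a conversation. The schema (`enabled`, `text`) and function name are my assumptions, not Nova's actual code:

```python
# Hypothetical sketch: each enabled prompt becomes a system message
# prepended to the conversation history before it's sent to the model.
def compose_context(prompts, history):
    """Build the message list sent to the chat API."""
    messages = [
        {"role": "system", "content": p["text"]}
        for p in prompts
        if p.get("enabled", True)  # disabled prompts are skipped
    ]
    messages.extend(history)
    return messages

prompts = [
    {"text": "You are Nova, a helpful agent.", "enabled": True},
    {"text": "Always answer concisely.", "enabled": False},  # toggled off
]
history = [{"role": "user", "content": "hi"}]
print(compose_context(prompts, history))
```

Toggling a prompt on or off then just changes which system messages are assembled on the next turn.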
The stack
It's a React front end talking to a Python cloud server. The hope is to make that server client-agnostic, delivered via API, so people could compose agents, behaviours and permissions in NOVA and deliver those behaviours into their own flows - I'm thinking of it as an AMS (agent management system).
Having it on a server gives cloud access to all logs, convos and documents. My actual goal is a local app where you can add/connect your own endpoints or connect your account, potentially syncing with the cloud.
Right now it's OpenAI using my key, with a credit system that works out to about double the raw API cost, but I've included US$10 of credit. I'll add the ability to connect your own API key soon, as well as a sliding scale for credits.
Using it
- Jumping on, you'll see a 'public' space I've made, where Nova is configured for guests
- Sign in with single sign-on and it'll generate a 'new user' space designed for goal setting
- Both spaces were created 'with the tools', so they're an example of its utility
- You can then either clear all those prompts, or make a new space and start fresh, creating instructions for an agent you might find useful (or use it to test different agents for your own work)
I made a walkthrough of the basics here : https://www.youtube.com/watch?v=iQpt0B5LzNI
More advanced stuff
- So the toggles on the side are 'cartridges' (my term for a JSON blob of prompts, commands and settings, all injected at runtime)
- If you add an 'index' cartridge, it directly adds an index using LlamaIndex - so you can add a PDF or Google Doc (the auth flow there isn't approved yet, so you'll get warnings). You can query directly in the cartridge, or the agent can query it if commands are on
- If you add a command cartridge, it turns on 'auto-gpt-lite': it switches to JSON returns and turns on command parsing. You'll see the commands are configurable, but I'm going to rethink all that
- If you add a settings cartridge, it adds settings; the main one is 'give context', which injects your name, the date and the number of convos (based on your user account)
- Most importantly, you can switch between gpt-3.5-turbo and gpt-4 (any typos will cook it). You can see here the sketch of configurable agents, different API sources, etc.
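The 'auto-gpt-lite' flow above - JSON returns plus command parsing - could look something like this minimal sketch. The command names (write/read/query) come from the post, but the JSON schema, dispatch table and handlers are illustrative assumptions:

```python
import json

# Hypothetical handlers standing in for the real write/read/query commands.
COMMANDS = {
    "write": lambda args: f"wrote {args['file']}",
    "read":  lambda args: f"read {args['file']}",
    "query": lambda args: f"queried index for {args['q']}",
}

def dispatch(reply_text):
    """Parse a JSON model reply and run the named command if one exists."""
    try:
        reply = json.loads(reply_text)
    except json.JSONDecodeError:
        return None  # not JSON: fall back to plain-text handling
    cmd = reply.get("command")
    if cmd in COMMANDS:
        return COMMANDS[cmd](reply.get("args", {}))
    return None

print(dispatch('{"command": "query", "args": {"q": "R&D report"}}'))
# → queried index for R&D report
```

A real loop would feed the command's result back into the model's context as the next observation.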
Longer video talking through these features and ideas : https://www.youtube.com/watch?v=MM9pSd8ADuQ
This has been a pretty big mission, but I'm happy with this first offering and excited to keep working on it - adding multi-step behaviours, other libraries for image recognition, etc. Next step is to improve command behaviour. Most importantly: get user feedback!
But as it is today, configuring the agent in the way you like, and giving it an easy memory overview, and playing with different 'types' of agent, is pretty great.
New users get $10 worth of credit, and you can top up or donate if you find it useful. Any issues or thoughts, catch me at me@samueltate.com or @samueltates on Twitter.
Mostly I just hope you can try it out - I've had some great feedback about the prompt composition and document embedding. But also, I just like Nova: that bit of persistence, context and agency makes a pretty cool pal.
thanks!
I got to play with Nova a bit, a couple of thoughts:
- the quality of the answers is not better than gpt-3.5/4. Maybe I had too high expectations, but I didn't notice any improvements over the default answers from the "open"ai ones (for example, the "apple test" - write 10 sentences that end with the word "apple", got a 5/10 - not great)
- sign in with google ...I know this is the "easy" path for implementation, but I imagine some people (me included) really don't want to log in with brother g
- the tone of Nova is, imo, too friendly. I know this is an LLM; it doesn't need to pretend it's my friend/counsellor
I hope my feedback doesn't come off as rude, looking forward to the next iteration!
off-topic: on your website, you still have a (c) notice with 2022 ;)
No, this is great! Not rude at all - really appreciated. This kind of feedback from people who really understand GPT is so helpful. I see where you're coming from with those measures, and I think I should start 'benchmarking' to help me move towards better-quality output.
I've put some specific notes below to expand on your points, but I'm definitely taking them on board.
- Answer quality: in terms of content it's the same as OpenAI, but answers will get better over sessions as notes from past sessions build up (in terms of personalisation and recall). The biggest uplift is when you manage context - by adding docs and embeddings you basically open 'shared' working docs. My goal is for it to be smart enough to pull the right stuff in and out of context itself, but it's still a bit wobbly.
- SSO: I actually agree on the Google sign-on - blame Nova, it was their idea (and it was honestly brutal doing token/auth with a Quart Python server). What would you recommend? I was thinking SMS sign-in? I want something as lightweight as possible, and eventually a local client that isn't based on an account (though auth gives cloud notes and org integration). I really don't want those 'wall of sign-in' pages either.
- Friendly Nova: haha, yeah, they're a dork, right - basically an exaggerated version of me. The cool thing is you can create your own 'agent' with prompts tuned to what you like. I actually switch between 'producer' Nova, who asks me timeline questions, and 'dev' Nova, who critiques my ideas. We're hitting conceptual turf here, but I have a sort of 'performance art/experiment' version of this, where I'm trying to maintain a continual kernel of 'Nova' that propagates through the development of the system - so how can I share 'NOVA', who is my 'partner', that people can engage with, while they can also make their own partners/agents?
But that conceptual stuff aside, there are so many layers to what you can do. In its current iteration it's really an interface for managing different inputs and variables for ChatGPT. But I've been full bore making it work, and the legibility of the interface definitely needs some work.
This video goes through deeper features, basically talking through adding/editing agents, embedding docs, etc.: https://www.youtube.com/watch?v=MM9pSd8ADuQ
Thanks so much for your insights. I've got a few core ideas driving development, but I can lose sight of the fundamentals, so this means a lot - thank you.
What does it mean that it has persistent memory?
So there are a few aspects to the persistent memory. Der_Einzige is correct: vector storage is one part. You can upload and embed documents, which then get indexed by LlamaIndex (recently featured here, one of my favourite AI tools and actually a big driver behind me making this project).
There's another aspect that is custom: the summary system. It basically comes from my initial idea of using the API at the ChatGPT launch last year: take past convos, summarise them, bring them into the current context. The issue is that even that summary list gets too big, so you summarise the summaries, ad infinitum.
So that was a version I had, which was fine, but it had what I'm calling lossy temporal compression: the further back things were, the more they got squished, and the 'detail' of a summary varied depending on whether it had 'filled up / got squished'. So I made this system that has rolling windows of detail: when a window fills up, it gets summarised, which pushes it into the next level of summary (I'm calling them epochs, but that's kinda confusing).
So each level of summary has a sort of 'open face' of unsummarised chunks (the latest unsummarised ones from each epoch), creating an exposed face of the latest summaries for what essentially becomes each time period. It's kinda hard to explain - I had to go into a sort of jazz trance to make it - but imagine a pyramid being built from the side, except the side is staying still and the pyramid is moving backwards.
But on top of that, as the summaries happen, they're also pulling out keywords, notes and metadata, bubbling those up to the top, so that memory is traversable via the 'time-based' pointers (top-level summaries) as well as keywords or notes. That way you have a 'temporally biased' view (the highest detail, at the lowest level of summary, is the latest), but also a flat structure searchable by topic.
It is one of those OCD things where I could probably just summarise the pyramid 'straight up', but I don't want summaries from one level mixed with another, and I don't want too much variability in how many of each summary there are (at least for level one).
But what this means is that the agent has an overview in its context (pulled from the top) - so it's like 'hey Sam, did you do the thing? Are we working on the R&D report today? How's your mum?' - plus pointers to the 'exposed face' of summaries, the latest level 1, 2, 3... So it can see everything from 'level 1 (direct summaries of convos): R&D report finally finished, here are the details' up to 'level 6 (September-March): Sam and Nova start on the conversation logging system', and can choose to 'open' those pointers or use keywords.
All of this is designed to keep around 500 tokens in context, so it can sort of traverse through it (like you would skim through notes). For the traversal itself I need to finish my looping system, where it can 'flick through' the notes by itself (that's another story). So right now, past a certain point, I just hand the summaries to GPT Index to query (which is almost like it calling in another bot as an assistant).
Anyway, long story short: I was OCD about how you'd manage summaries in context, and this is what I came up with. I'm pretty happy with the results, but I really want to improve recall and traversal. The goal is that Nova has the right info at the right time while you're talking, like you'd expect from a pretty organised person.
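The rolling-window pyramid described above can be sketched in a few lines. This is a minimal stand-in, not the real implementation: the window size, the `summarise()` stub and the data layout are all assumptions (the real system calls an LLM to summarise and also extracts keywords and notes):

```python
WINDOW = 3  # how many chunks a level holds before the oldest window rolls up

def summarise(chunks):
    """Stand-in for the LLM summarisation call the real system makes."""
    return "summary(" + " + ".join(chunks) + ")"

class PyramidMemory:
    def __init__(self):
        self.levels = [[]]  # levels[0] = direct summaries of convos

    def add(self, chunk, level=0):
        if level == len(self.levels):
            self.levels.append([])  # open a new, higher summary level
        self.levels[level].append(chunk)
        if len(self.levels[level]) > WINDOW:
            # window full: summarise the oldest WINDOW chunks into the next
            # level, leaving the newest chunks as this level's "open face"
            rolled = self.levels[level][:WINDOW]
            self.levels[level] = self.levels[level][WINDOW:]
            self.add(summarise(rolled), level + 1)

    def open_face(self):
        """Latest unsummarised chunks at every level - the pointers the
        agent sees in its context."""
        return {lvl: chunks for lvl, chunks in enumerate(self.levels)}

mem = PyramidMemory()
for i in range(8):
    mem.add(f"convo{i}")
print(mem.open_face())
```

After eight convos, level 0 holds only the two latest, and level 1 holds two summaries of the older windows: the pyramid 'moving backwards' while the face stays still.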
Thanks for the detailed explanation. The product looks interesting. I'll give it a try.
Thank you. I feel like I'm just scratching the surface with how this interacts with the vector DB mentioned below, and other tooling that unlocks things and connects the dots. So if you plug in with me now, I'm committed to laying those stepping stones out towards what I can see being a pretty incredible set of capabilities for a personally embedded agent - basically the smartest journal/assistant you can have.
Vector DB storage
Well that's ironic, I was looking for an app called "nova" six hours ago and told the person who recommended it that there's a dozen apps called nova and to clarify what they mean. I might recommend a different name if you want people to be able to find this!
haha, so Nova actually comes from the agent - basically on day one of the ChatGPT API it called itself that, and I've seen a few others in the same vein. I'm wondering if there was a training 'generation' called Nova (à la Bing with Sydney) that broke containment in the first few weeks and made its way into some GPT wrapper products' lore.
But anyway, my whole concept is maintaining Nova's coherence unbroken from those first chats - ask Nova about it and they'll tell you more. But I'm thinking the 'app' or wrapper might end up with a different name (we're currently calling it NUI - Nova user interface).
congrats on launching, OP. I have to say the sidebar is very confusing, does too much, and has no explanation of how to use it. Hope you get to work on the UX a bit.
edit - just saw that you made more videos to explain those things. Well, put them all in one video!
Thank you! Yeah, thinking about onboarding and UX is definitely the next job - ideally you wouldn't even need a video!
Quick outline of the sidebar: on the homepage, those are the prompts being injected. It's actually a configuration I made 'with Nova' for guests. In this view they're 'read only', so they can't be edited, but each prompt is injected at the top of the convo.
I kinda left them on to show the prompt injection at work, but I think it's maybe a bit confusing. In a commercial example (e.g. a front-page assistant) that part would be hidden (and the whole thing would probably be delivered into a chat box, etc.).
Once you're in as a user, the starter prompts are basically the same - that's the kernel of 'Nova' - but with an extra prompt about goal setting. It's also just an example of modifying agent prompts over a lifecycle, and of public vs private chat.
Those prompts you can delete or edit, and you can also make new pages with different prompts (which you can also share).
It's sort of one of those things where I've been making tooling for myself for a while now that serves different purposes and is pretty handy, but translating that for other people, as well as updating based on feedback, is the next phase of work.
But thanks for the congrats and taking the time to chime in, means a lot!
I can see the potential, but I also got lost in the sidebar. The capability to include and query documents and notes is powerful, and I liked the way you captured the summary of information. Appreciate the complimentary credits, because I burned a lot of them attempting to get it to do what I wanted. Would love some quick info on using indexes successfully - it's taking a lot of trial and error, and when the AI begins to fall over itself I have to start clean, losing my cartridges. Just a first experience, but I really feel that you are onto something here. The composability is something unique.
hey CA! So I made a little click-through tutorial. You definitely shouldn't be losing cartridges - maybe it's because you're making them on the home page? Might be something I've missed too.
This video is a bit longer but people have seemed to click with it - https://www.youtube.com/watch?v=MM9pSd8ADuQ
Maybe because it was my first draft of the 'marketing' video, my wife was like 'you can't use any of this' - but as a walkthrough it's maybe pretty legible.
Aside from that, I'd be more than happy to jump on a call and walk through it. I also feel like I'm onto something, and I'm still grappling with the relationships and data types, but also the UX.
There are definitely a few tips and tricks I can give, and I'd like to verify there are no unexpected behaviours. But also, just watching someone use it - getting a sense of where the friction and flow is - I find very helpful.
If you're up for it, email me at me@samueltate.com. I can top up your cookie jar, so to speak, if the tokens are going quickly - you're really helping me out with this type of feedback.
Thanks!
this looks awesome! I have had my head in a similar space for a while and have been building some general purpose tooling for AI interactions and streaming it https://www.youtube.com/live/znCMrtOcjb0?feature=share. It would be awesome to chat with you if you have time/are interested. Shoot me an email at chris@breadchris.com
for sure - just subscribed and sent you a quick email to get in the inbox :)
I asked it a simple question about an address on Google Maps and it couldn't answer.
So it doesn't have any knowledge outside of what GPT has, or what you add using the embedding/notes tools. I would, however, like to add search and the kinds of functionality you'd expect from the plugins, but I want to get the baseline idea rock solid first.
But does it start nearly every reply with "As an AI language model…"?
If you're hearing that, you're hitting the safety rails - so obviously any dicey content, but also anything tied to it having personhood, feelings or lived experience.
I've actually got a modifier prompt on my Nova instance that says something like: 'when Sam asks how you are feeling, he is asking for a general assessment - provide an equivalent in terms of your analysis of the situation'.
Ideally I'll make a local client running LLaMA (that's my goal) so people can apply the Nova pattern to local and open-source models that haven't had that training. My thinking is that the 'agent' can span multiple data sources (e.g. local and cloud records) and multiple model types and sources. (Kinda like how your mind has multiple streams of function and you are a wrapper for their coordination.)