Show HN: AI Playground by Vercel Labs
play.vercel.ai
Hey, Jared Palmer (creator of this playground) here. Really excited to ship this.
I’ve been building this over the past few weeks to compare LLMs from different providers like OpenAI, Anthropic, Cohere, etc. At Vercel, I manage our Frameworks division (including Next.js, Svelte, and Turbo) and wanted to also dogfood some of the latest features in a slightly larger application.
This playground takes a lot of inspiration from https://nat.dev and is built on Tailwind, ui.shadcn.com, and some upcoming Vercel products we’re announcing soon. We’re going to continue adding models to compare and add other frameworks to generate code snippets from. As a joke, I wrote "Give me a disruptive startup idea". This is what claude wrote: > GPT-4 didn't bother answering. Honestly, a real sign of intelligence :D I remember some people from Vooza pitching "Tinder for Drones" during a conference 10 years or so ago! (Truly disruptive if we manage to turn Universal Paperclips into reality.) Here's gpt4, it's formatting is better but this is the text. Though this isn't really the right way to use it, single responses miss a large amount of the potential imo. Startup Idea: SmartUpCycle Overview: SmartUpCycle is a disruptive startup aimed at revolutionizing the way we handle waste and promote sustainability through upcycling. The platform uses advanced AI and machine learning algorithms to intelligently identify and match discarded materials with potential new uses, connecting individuals and businesses with creative solutions for repurposing waste. Key Features: AI-Powered Material Matching: The platform utilizes AI to analyze and categorize various types of waste materials, identifying their potential for upcycling and suggesting creative and functional uses for them. This allows users to find new life for discarded materials, reducing waste and contributing to a circular economy. Marketplace for Upcycled Goods: SmartUpCycle provides a marketplace for users to buy and sell upcycled products, offering a wide range of unique, environmentally-friendly, and innovative items. This encourages more people to engage with upcycling and supports the growth of a new sustainable industry. 
Waste Collection & Redistribution Network: The startup collaborates with local waste management companies, recycling centers, and businesses to collect and redistribute waste materials to users, promoting a circular economy and reducing the need for new raw materials. Educational Resources & Community: SmartUpCycle offers educational resources, tutorials, and workshops to teach users how to upcycle waste materials and create new products. Additionally, the platform fosters a community of like-minded individuals who can share ideas, collaborate on projects, and inspire each other to embrace sustainability. Business Integration: The platform offers integration solutions for businesses, helping them reduce waste, lower costs, and adopt sustainable practices. By providing access to a network of upcyclers and a marketplace for upcycled goods, businesses can transform their waste into revenue-generating products. Revenue Model: SmartUpCycle generates revenue through various channels, including: Transaction fees on the marketplace for upcycled goods. Subscription fees for premium features, such as advanced AI recommendations and business integration services. Sponsorships and partnerships with waste management companies, recycling centers, and eco-friendly brands. Fees from workshops and educational resources. Impact: SmartUpCycle promotes sustainability by reducing waste, conserving resources, and encouraging a circular economy. By connecting people and businesses with creative upcycling solutions, the platform fosters innovation, supports the growth of a new sustainable industry, and contributes to a healthier planet > The platform uses advanced AI and machine learning algorithms to intelligently identify and match discarded materials with potential new uses, connecting individuals and businesses with creative solutions for repurposing waste. 
In other words, GPT-4

I get this is in jest, but I think the actual solution is to use image classification with a framework of normal code to glue it together. GPT-4 could help you code it…

What do you mean by GPT4 didn’t bother responding? It has better ideas than this when I just asked it. This is generic af. Are you a Claude shill or something?

I dunno what Cohere is, but its answers are super weird. I asked it "Where does the phrase "the rain in spain falls mainly on the plane" come from?" and it said:
Does anybody know a green song?
What happens if a girl doesn't have a twin brother?
What musical instrument did larry black sing 'jealous heart' on?
Is country music more country than country music. I'm confused.?
What is the difference between a copyright and a trademark?
What is the best piano chords software u can get for beginners?
What is the best music software for beginners?
What is the song from the new dexter episode?
Did he kill his brother?
What is the name of the song dueling banjos?
Was the raiders of the lost ark the first jungle hero movie?
Who really invented reverse tape effect?
Why can't me and my neighbor get along?
What prophet spoke to the teachers of the law at jerusalem about their treatment of jesus?
Are there any songs that describe the emotions of a cult member?
What is the most ugly song ever made? IIRC, GPT-2 and 3 are somewhat like this, asking a question often results in it giving you more questions instead of an answer, unless you give. ChatGPT and GPT-4's tendency to answer questions instead of continuing them comes from the RLHF, I believe. Clearly the most advanced model we've seen to date. It's already trolling you. And the comment means the next bot gets trained on this answer For Cohere, make sure you're using Command-Xlarge-Nightly. Otherwise, you may be prompting a Base LLM expecting the behavior of a different kind of LLM (an instruction-tuned chat model). Cohere's Command model builds on top of the base model, giving it the capability to follow instructions and user commands. The rain in Spain falls mainly on the plain. Not sure if it makes a difference to the response. Coheres model is Claude and it's a completion style one. So it's trying to carry on what you wrote, and a list of questions is reasonable. It worked just fine for me copying and pasting but you might try this format: Q: your question A: Then it is more likely to fill in an answer for you. Cohere actually trains its own models and they are not based on models from other providers [I work at Cohere]. Your prompt suggestion is a good one for LLMs as a whole. Any information added to the context informs the model and nudges it towards the expected answer format. Oh sorry I've got mixed up with who runs what model, I thought coheres one was Claude, but that's anthropic. Wasn't trying to say it was based on another. I got mixed up with the models, Claude is from anthropic. Just like your own question - it seems to me it's riffing on "Bar Quiz" type questions. Have tried the 'Question:'...'Answer:' prompt format? You didn't tell it to act as an assistant so it just assumes that your input is some list of questions that it's supposed to complete. When I asked if elephant farts were loud it answered with a lot more detail than ChatGPT. 
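To make the Q:/A: suggestion above concrete, here is a minimal sketch in Python. The helper name `as_qa_prompt` and the few-shot example pair are made up for illustration, and nothing here calls a real provider API; it only builds the text you would paste into a completion-style (base) model.

```python
def as_qa_prompt(question, examples=()):
    """Wrap a question in a Q:/A: frame so a base (completion-style) model
    is nudged to continue with an answer instead of more questions.

    `examples` is an optional sequence of (question, answer) pairs used as
    few-shot context. Hypothetical helper, not part of any provider SDK.
    """
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q.strip()}")
        lines.append(f"A: {a.strip()}")
        lines.append("")  # blank line between solved examples
    lines.append(f"Q: {question.strip()}")
    lines.append("A:")  # leave the answer open for the model to complete
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_qa_prompt(
        'Where does the phrase "the rain in spain falls mainly on the plane" come from?',
        examples=[("What is the capital of France?", "Paris.")],
    ))
```

The trailing `A:` is the important part: a base model continues whatever pattern it sees, so ending on an open answer slot makes an answer the most likely continuation. An instruction-tuned model usually does not need this framing.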
it seems that Cohere might be suggesting that the phrase is as devoid of meaning as its unrelated questions, considering that the phrase was originally just a vocal exercise for a character in a film. You manage the Frameworks division (including Next.js, Svelte, and Turbo)? How did you get the time to build this? Speaking about high-level strategic roles in general (eg CTO), assembling a good team that you trust to get on with things means you don't need to spend a huge amount of time managing them on a day-to-day basis. If you do your job well you can get to a point where 'managing' is mostly about steering the direction of the team - figuring out the answers to questions about 'why' and 'what' you're doing as a team. The team members themselves can deal with the 'how' part. That shouldn't be your job. The 'why' and the 'what' don't actually change very often (hopefully!) so you can have plenty of time for experimentation and 'play' like this. This is a part of how you learn and answer the 'why' and 'what' parts of your job. A growing backlog Helps when your boss is a prolific coder himself. Right. It seems like the entire executive team is technical and producing demos. Using GPT4 How much does it cost to query these models all the time? It's offering a lot for free. There is a rate limit. Even still. GPT4 in particular gets expensive at scale. Vercel is probably noticing that the LLM community is using NextJS (and Vercel) by default a lot of times, and that supporting the community and building more goodwill will pay off. Vercel is so overpriced on their hosting plans that they can afford that to attract new customers, I guess. You can spend hundreds of thousands trying to implement just SSR and still not get it working. (trust me) For what Vercel is, it's a steal. Server side rendering?
Like the thing that PHP has been doing for decades?
Genuinely curious how this has now become difficult again after being so completely ubiquitous with hosting, I guess it’s about “scale” and “edge” etc.? Please provide an icon that makes sense on other platforms for the shortcut on the right bottom, not everyone uses Macs! Overall, pretty cool idea, how is Vercel gonna handle API costs for this? I imagine this is pretty expensive to run, specially considering GPT4 access. Also, I have to link this here because I thought it was too funny: https://play.vercel.ai/r/G9y3eXJ edit: Oh I just hit the token limit, I guess that's how you're dealing with cost lol. Something I was gonna suggest is that maybe answers should be cached to prevent repeated calls to the same API when the prompt hasn't changed, this could potentially increase the amount of tests I could do. "How many words are there in "The Star Spangled Banner"?" LOL hahaha Will relay your feedback about the icon back to the team - thank you for trying this out! beautiful work as always jared! one of the useful things from nat.dev was the tokens/second calculation, latency is impt as usual. would be nice to enable? i'm excited to see what the upcoming products are. making things default sharable like this across all frameworks would be really nice. Thanks man! Good idea on token/s. Will add Would be nice to know cost too! tokmon is on showHN today too I'm super excited in graphing this over time. Will be interesting to see how the providers develop over the coming months.
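For anyone who wants the tokens/second number before it lands in the playground, here is a rough sketch of how it could be measured over any streamed response. It is an assumption-laden example, not how the playground or nat.dev computes it: `measure_stream` and `fake_stream` are hypothetical names, and the "stream" is just a Python iterable standing in for a provider's streaming client. It also records the time to the first token, since that is what a user perceives as latency.

```python
import time


def measure_stream(tokens, clock=time.perf_counter):
    """Consume an iterable of streamed tokens and report latency stats.

    Returns a dict with:
      - "ttft": seconds from start until the first token arrived
                (None if the stream was empty)
      - "tokens": number of tokens received
      - "tokens_per_second": tokens divided by total elapsed time,
                which includes the wait for the first token
    `clock` is injectable so the function can be tested without sleeping.
    """
    start = clock()
    first = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now - start
        count += 1
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return {"ttft": first, "tokens": count, "tokens_per_second": rate}


def fake_stream(n, delay=0.01):
    """Stand-in for a provider's streaming response (hypothetical)."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20)))
```

Logging these two numbers per provider and per request is enough to graph them over time.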
FWIW we also need TTFB, not just tokens per second. The first token is key for UX.

How did Anthropic agree to let anonymous users call their API? This was a big stumbling block for me.

Anything to boost the ole DAU's.

Thanks for setting up a limited free version. The UX is great. I had some failures for huggingface and cohere. Replicate produced an output that was not rendered (but returned an answer in the network tab). It would also be great to add:
https://docs.aleph-alpha.com/docs/introduction/luminous/ Investigating now. Thanks for the feedback! Fixed Cohere and Replicate. Will add the other provider now! Appreciate the help! Thanks for the fixes. Sounds great, looking forward to it! Awe man I was just building this. You beat me to it! Looks great. FYI needs some tweaks on mobile. There's https://github.com/nat/openplayground and https://github.com/ztjhz/BetterChatGPT among others. Could you send me an email with the issue you're seeing? lee at vercel dot com. Thank you! open source your impl then, its a good basis for a lot of other tooling :) khalidx listen to swyx. If you need help, let me know. Haha yup, welcome to the club. I'm getting too old and too distracted. You can add https://www.ai21.com/ LLMs as well Prompt: Pretend you are William Gibson. Pitch your next book to your publisher. GPT-4:
> Dear [Publisher],
> I hope this letter finds you well. As you know, my work has always strived to explore the intersection of technology, society, and human nature. With that in mind, I am excited to present to you my latest novel, "Neural Drift."
> Set in the not-too-distant future, "Neural Drift" delves into a world where the lines between our reality and the digital realm have all but vanished. The story revolves around our protagonist, Elara, a skilled hacker and data courier who navigates the vast neural network known as the "Mindscape" – an immersive digital landscape that has become the primary means of communication, commerce, and entertainment for the global population.
> In this future, the Mindscape has become an integral part of daily life, and the distinction between the physical and digital worlds is all but gone. Society is now heavily reliant on neural implants, which allow users to access the Mindscape and interact with the digital world through

Anthropic Claude-v1 (which wouldn't answer at all if I didn't put pretend in front):
> Here is my William Gibson-inspired book pitch:
> The future is here—it's just not evenly distributed yet. In my new novel, Pattern Recognition, I explore the bleeding edge of culture, technology, and society through the eyes of Cayce Pollard, a "coolhunter" who has an uncanny ability to spot trends in the wild.
> Cayce is hired to track down the creator of a series of mysterious film clips being uploaded to the internet. Her search takes her from London to Tokyo to Moscow, where she encounters a dangerous mix of hackers, hipsters, and Russian mobsters. Along the way, Cayce begins to realize the film clips seem to have a strange predictive power—they appear to show events that have not happened yet.
> Pattern Recognition is a stylish, fast-paced techno-thriller that examines how technology shapes culture, and how culture shapes us.
It blends Gibson's trademark cyberpunk sensibility with a YES, it did just regurgitate an existing Willam Gibson book But you didn’t ask it to pitch a new book. At one point, Pattern Recognition was actually his next book, and there’s probably a lot of text out there discussing it as such. Sure absolutely, but would a human make this mistake? You can't pitch something that already exists (I mean in this context we're pretending, so maybe we can pretend we're pitching that book). But a human (and GPT-4 does) assumes you want an original book. It's still an assumption. I was playing around with GPT-4 making it take the questions for an Economics exam the other day and it was missing a lot of context that a human would get or assume. It was still much better at economics than me, but was going down "wrong" paths that a human wouldn't go down. If you told it to explain the the question and list assumptions you could work out where it was going off track, but it lacked the ability to step back and recognise it had taken a nonsense path. > but it lacked the ability to step back and recognise it had taken a nonsense path. Errr. Yes! We're at the stage where you should be amazed how capable these things are considering their fairly brute force nature. Being surprised at the limitations is a peculiarly - dare I say? - human response? Usually the workflow involves: 1. Write a naive prompt 2. Gain some insight into AI's behaviour 3. Improve prompt and repeat 1. Of course they lack insight. Of course there's no "meta-reasoning". What is astonishing is how often a glorified Markov Chain gives the impression that there is both. It's all astonishing. I don't think it's so much surprised at the limitation as trying to explore the edges. Enjoying finding some of the emergent behaviours. It seems like your issue is that this is not human, and it does not do what a human would do. This is a high expectation and, possibly dangerous, assumption. 
Shows how this tech is such a double edged sword for user value, expectations and assumptions. No, I'm well versed in the limitation of the tech. The comparison to what a human can do is just an interesting comparison. It's a feat the GPT-4 model can manage in this context. GPT-4 is also exhibiting emergent behaviours that are not found on models with smaller numbers of parameters, it is these emergent behaviours that make it seem more 'human' in it's ability to produce content, even if the "reasoning" about it is all an illusion. Even here there are limitations, which was my intent with mentioning the Econ exam. Aside: It's also possible to get different answers by making the LLM simulate being different people (a student, or a professor). It is very easy to slip into thinking about ChatGPT in terms of actually thinking, but then again our understanding of the human mind has led to theories that conscious decision may well be an illusion as well (if brain scans showing that the decision is made before the conscious mind is aware are anything to go by). At the moment though, these are just tools, with varying degrees of usefulness. It's not a mistake. It's just ambiguous - you and the OP (rather than "humans") share some contextual assumptions that to you make it imply one meaning. If someone asked me with no context to 'be william gibson pitching your next book' I might answer just the same as gpt did It's a fair point, and some of the context that's missing for the LLM is just present in the background for humans. You wouldn't need to be told, it would be whether you were in an acting class or creative writing class (these are weak examples). We get so many context clues that aren't present in current LLM text interactions. Great! Vercel edge functions are my preferred deployment for OpenAI calls now. Is there any other (realistic) option? 
Afaik if you don't make them edge calls, you'll get charged more money by Vercel for compute than OpenAI will charge you for the actual request. Yup! Not just Vercel. The entire industry is on the GB-s billing model, which doesn't make sense if you're just waiting for requests to come back and forth. Do you have an example of this? Is authorization a part of vercel? Yes, although we haven't added authorization yet. This is the simplest I could get it: still uses the OpenAI npm package and doesn't need streaming: https://github.com/yisding/gpt-edge-test If you want to try streaming you can take a look at the readpanda repo. OK technically not Vercel, but https://next-auth.js.org/ There are a bunch of examples here https://vercel.com/templates/ai Just for fun I wrote this prompt "vercel monopolizes nextjs by only allowing certain features on their platform such as" ChatGPT4: 1. Incremental Static Regeneration (ISR): This feature allows developers to update static pages at runtime without having to rebuild the entire website. Vercel has optimized its platform to support ISR, making it an exclusive feature for Next.js applications deployed on Vercel. 2. Real-time Feedback: Vercel's platform provides real-time feedback during development, allowing developers to see the changes they make in their code instantly. This feature is exclusive to Vercel, making it a significant advantage for developers using Next.js on their platform. 3. Edge Functions: Vercel offers Edge Functions, which allow developers to run serverless functions at the edge of their network, providing low-latency responses to user requests. This feature is particularly useful for Next.js applications that require server-side rendering and is exclusive to Vercel's platform. 4. Preview Mode: Vercel's platform supports Next.js's Preview Mode, which allows developers to preview draft content changes without affecting the live site. 
This

-------------------------------------------------------

Claude v1.3:

That is not correct. Vercel does not monopolize Next.js or limit features to their platform. Next.js is an open source framework that can be deployed anywhere. Some key facts:

1. Next.js is MIT licensed and the source code is available on GitHub for anyone to use, modify, and deploy.
2. Next.js apps can be deployed on any hosting provider or infrastructure. This includes:
- Vercel (the creators of Next.js)
- AWS
- GCP
- Azure
- DigitalOcean
- Heroku
- Your own servers, etc.
3. All Next.js features work regardless of where you deploy. This includes:
- Static Generation (SSG)
- Server Side Rendering (SSR)
- Incremental Static Regeneration (ISR)
- API Routes
- Image Optimization
- Font Optimization

It's completely broken. I couldn't even get one full response from one of the AIs, it always gets stuck near the completion of the first response. After like 5 tries it told me that I used all my requests. lol.

Same :( I just wanted to see what Google's Flan would give to "77 + 3 * 11" and it says "Something went wrong" and now I'm rate limited.

Well it's not broken, I'd assume they limit the `max_tokens` to minimize costs.

If the limit doesn't even allow you to get a result with the default settings, I'd assume something's wrong.

GPT-4 is miles ahead of the competition. It's not even remotely close. I asked it to help me design an app for a specific use case. It came up with a list of features, the tech stack, and even designed an initial database.

Yes and no. Or, yes for such use cases you mentioned, also it is miles ahead in reasoning capability, translations from Chinese, etc. etc. But still, for more casual talk, like bouncing ideas, brainstorming etc., I prefer Claude. It's kind of more "friendly", less formal, and, which is sometimes also important, much quicker.

Have you used GPT-4 via the API or via ChatGPT only? The difference is huge, as with the API you can control the parameters + you can set your own system prompt (basically steer the AI better). With the API, you can basically customize the session, ranging from batshit bananas replies (by setting high temperature) to almost deterministic replies (by setting low temperature), with personality controlled by the system prompt, so you can make it "friendlier", "less formal" or anything else.

It is still as slow via the API as via ChatGPT though, which kind of sucks when you're using it for longer passages. I do wish it was just about 100% faster.

I use both ChatGPT and API. Mostly I would say I use it for work and when I learn Chinese, and Claude is in my mobile via Poe app, which I use for more casual talk or when I know the answer doesn't require advanced reasoning.
Then I also use phind.com website, which I would say is like a better Bing (at least for programming queries). Yeah, even with coding questions, I have to ask it to only write the code. Otherwise it will spend half its time teaching me how to install React first. This is awesome, will this be open sourced to poke around the codebase? Some models seem to hang w me (anthropic) but I think it was because I reached the limit. Does loggedIn usage w vercel will become usage-paid? Well, every single thing I tried other than gpt-4 failed for me. Very nice! If you want to add a feature, feel free to make use of https://github.com/osteele/prompt-matrix.js, which I (well, ChatGPT and I) created for use in my own playground so that I could enter e.g. "Write a <Python|JavaScript> program to look for anagrams, in the style of <Norvig|Carmack>". To be clear, that’s a link to a library meant for use in projects like yours. I’m not promoting a competing project (and haven’t provided a link to one). How expensive is this to run? I imagine a lot less than you would expect. That is at least my experience from observing user behaviour. What's the data policy on this? Are you keeping copies of the prompts and responses? Does anybody know if there is a similar playground, but for evaluating a single model by comparing different prompts with different temperatures? I am building this by myself using Streamlit, but was wondering is there a ready solution. How are the models hosted? Edit: they are using their own hosted APIs such as Anthropic or Replicate. Correct. We are using various hosting providers. As part of the project, I’ve been working with providers and hosts on updating their SDKs to work on Vercel Edge Functions (and streaming). Do you think the codebase will be open sourced within the next week or two? Great interface! Maybe slightly tangential, on the subject of UIs for ChatGPT, does anyone know a good UI for voice input to ChatGPT, for either the website or via an API ? 
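On the prompt-matrix link and the question about comparing one model across prompts and temperatures, both a few comments up: the core idea fits in a few lines. This is a sketch of the idea in Python only. It does not mirror the API of prompt-matrix.js, and `expand_prompt` and `build_grid` are names invented here; the assumption is that alternatives are written as `<a|b>` groups, as in the example prompt from that comment.

```python
import itertools
import re

# Matches one <a|b|c> group; nested angle brackets are not supported.
_GROUP = re.compile(r"<([^<>]*)>")


def expand_prompt(template):
    """Expand every <a|b|c> group in `template` into all combinations.

    Illustrative re-implementation of the idea only; hypothetical helper.
    """
    parts = _GROUP.split(template)
    # re.split with one capture group alternates: literal, group, literal, ...
    literals = parts[0::2]
    choices = [group.split("|") for group in parts[1::2]]
    prompts = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for picked, literal in zip(combo, literals[1:]):
            pieces.append(picked)
            pieces.append(literal)
        prompts.append("".join(pieces))
    return prompts


def build_grid(template, temperatures):
    """Pair each expanded prompt with each temperature to try."""
    return [
        {"prompt": p, "temperature": t}
        for p in expand_prompt(template)
        for t in temperatures
    ]


if __name__ == "__main__":
    grid = build_grid(
        "Write a <Python|JavaScript> program to look for anagrams, "
        "in the style of <Norvig|Carmack>",
        temperatures=[0.0, 0.7, 1.2],
    )
    for cell in grid:
        print(cell)
```

Each cell of the grid is one request to send to a single model, so two groups of two options and three temperatures give twelve runs to compare side by side.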
> are you gpt-4?
I am an AI language model created by OpenAI, but I am not GPT-4. I am based on the GPT-3 model. GPT-4 has not been released yet.

Does this fully use UTF-8?
I have been trying it a non-lating language and it's showing ?? icons in every answer.

There was a problem with some models. I went ahead and fixed them.

In a non-latin*

Its output for GPT-4 is much worse compared with the output of ChatGPT-4 available at chat.openai.com

Nice, can I have conversations? It’s not clear on mobile how to do this vs. just one prompt.

Thanks for sharing! What business problems are you solving at Vercel with LLMs?

The selling of shovels, I imagine ;)

Ha! I keep asking on these threads for business problems being solved, and haven't got very far.

The UI is quite limited, check out the outputs https://play.vercel.ai/r/I25tmJs when needs a scroll bar.

Nice work. Is it open source?

Is the source available?
I mean, they are disruptive startup ideas, sure enough.

GPT-4 didn't bother answering.

Here is a potentially disruptive startup idea:
• On-demand drone delivery for e-commerce. The idea is to offer ultra fast delivery of small packages via drones. This could significantly reduce delivery times for e-commerce orders and disrupt the traditional shipping industry. Some key challenges to overcome would be regulatory approvals, safety, noise pollution, and scaling up a drone delivery network. But if executed well, it could transform how people receive goods and packages.
A few other disruptive ideas:
• An Uber-like model for renting out spare rooms or couches for short term stays. This could disrupt the hotel industry.
• A Netflix model for streaming live TV channels over the internet. This could disrupt the cable TV industry.
• An automated vertical farming system to grow organic food locally in urban areas. This could disrupt the traditional agriculture and food supply chain.
• Decentralized blockchain-based social network. The idea is to build a social network that is not owned or controlled by any single company. It would be an open, decentralized platform built on blockchain technology. Users would have more control and ownership over their data and profiles. The business model could be built around micro-payments or subscriptions. It could disrupt the dominance of companies like Facebook and offer an alternative distributed social networking experience
• A virtual reality social network. If VR/AR technologies become more mainstream,