Show HN: AI Playground by Vercel Labs

play.vercel.ai

249 points by palmdeezy 3 years ago · 102 comments · 1 min read

Hey, Jared Palmer (creator of this playground) here. Really excited to ship this. I’ve been building this over the past few weeks to compare LLMs from different providers like OpenAI, Anthropic, Cohere, etc. At Vercel, I manage our Frameworks division (including Next.js, Svelte, and Turbo) and wanted to also dogfood some of the latest features in a slightly larger application. This playground takes a lot of inspiration from https://nat.dev and is built on Tailwind, ui.shadcn.com, and some upcoming Vercel products we’re announcing soon. We’re going to continue adding models to compare and add other frameworks to generate code snippets from.

rendall 3 years ago

As a joke, I wrote "Give me a disruptive startup idea". This is what claude wrote:

  Here is a potentially disruptive startup idea:

  • On-demand drone delivery for e-commerce. The idea is to offer ultra fast delivery of small packages via drones. This could significantly reduce delivery times for e-commerce orders and disrupt the traditional shipping industry. Some key challenges to overcome would be regulatory approvals, safety, noise pollution, and scaling up a drone delivery network. But if executed well, it could transform how people receive goods and packages.

  A few other disruptive ideas:

  • An Uber-like model for renting out spare rooms or couches for short term stays. This could disrupt the hotel industry. 

  • A Netflix model for streaming live TV channels over the internet. This could disrupt the cable TV industry.
  
  • An automated vertical farming system to grow organic food locally in urban areas. This could disrupt the traditional agriculture and food supply chain.

  • Decentralized blockchain-based social network. The idea is to build a social network that is not owned or controlled by any single company. It would be an open, decentralized platform built on blockchain technology. Users would have more control and ownership over their data and profiles. The business model could be built around micro-payments or subscriptions. It could disrupt the dominance of companies like Facebook and offer an alternative distributed social networking experience

  • A virtual reality social network. If VR/AR technologies become more mainstream,

I mean, they are disruptive startup ideas, sure enough. GPT-4 didn't bother answering.

  • spaceman_2020 3 years ago

    > GPT-4 didn't bother answering.

    Honestly, a real sign of intelligence :D

  • rpastuszak 3 years ago

    I remember some people from Vooza pitching "Tinder for Drones" during a conference 10 years or so ago!

    (Truly disruptive if we manage to turn Universal Paperclips into reality.)

  • IanCal 3 years ago

    Here's GPT-4; its formatting is better, but this is the text. Though this isn't really the right way to use it: single responses miss a large amount of the potential, imo.

    Startup Idea: SmartUpCycle

    Overview:

    SmartUpCycle is a disruptive startup aimed at revolutionizing the way we handle waste and promote sustainability through upcycling. The platform uses advanced AI and machine learning algorithms to intelligently identify and match discarded materials with potential new uses, connecting individuals and businesses with creative solutions for repurposing waste.

    Key Features:

    AI-Powered Material Matching: The platform utilizes AI to analyze and categorize various types of waste materials, identifying their potential for upcycling and suggesting creative and functional uses for them. This allows users to find new life for discarded materials, reducing waste and contributing to a circular economy.

    Marketplace for Upcycled Goods:

    SmartUpCycle provides a marketplace for users to buy and sell upcycled products, offering a wide range of unique, environmentally-friendly, and innovative items. This encourages more people to engage with upcycling and supports the growth of a new sustainable industry.

    Waste Collection & Redistribution Network:

    The startup collaborates with local waste management companies, recycling centers, and businesses to collect and redistribute waste materials to users, promoting a circular economy and reducing the need for new raw materials.

    Educational Resources & Community:

    SmartUpCycle offers educational resources, tutorials, and workshops to teach users how to upcycle waste materials and create new products. Additionally, the platform fosters a community of like-minded individuals who can share ideas, collaborate on projects, and inspire each other to embrace sustainability.

    Business Integration:

    The platform offers integration solutions for businesses, helping them reduce waste, lower costs, and adopt sustainable practices. By providing access to a network of upcyclers and a marketplace for upcycled goods, businesses can transform their waste into revenue-generating products.

    Revenue Model:

    SmartUpCycle generates revenue through various channels, including:

    Transaction fees on the marketplace for upcycled goods.

    Subscription fees for premium features, such as advanced AI recommendations and business integration services.

    Sponsorships and partnerships with waste management companies, recycling centers, and eco-friendly brands.

    Fees from workshops and educational resources.

    Impact:

    SmartUpCycle promotes sustainability by reducing waste, conserving resources, and encouraging a circular economy. By connecting people and businesses with creative upcycling solutions, the platform fosters innovation, supports the growth of a new sustainable industry, and contributes to a healthier planet

    • phnofive 3 years ago

      > The platform uses advanced AI and machine learning algorithms to intelligently identify and match discarded materials with potential new uses, connecting individuals and businesses with creative solutions for repurposing waste.

      In other words, GPT-4

      • ImaCake 3 years ago

        I get this is in jest, but I think the actual solution is to use image classification with a framework of normal code to glue it together. GPT-4 could help you code it…

  • jutrewag 3 years ago

    What do you mean by GPT-4 didn’t bother responding? It had better ideas than this when I just asked it. This is generic af. Are you a Claude shill or something?

neom 3 years ago

I dunno what Cohere is, but its answers are super weird. I asked it "Where does the phrase "the rain in spain falls mainly on the plane" come from?" and it said:

Does anybody know a green song? What happens if a girl doesn't have a twin brother? What musical instrument did larry black sing 'jealous heart' on? Is country music more country than country music. I'm confused.? What is the difference between a copyright and a trademark? What is the best piano chords software u can get for beginners? What is the best music software for beginners? What is the song from the new dexter episode? Did he kill his brother? What is the name of the song dueling banjos? Was the raiders of the lost ark the first jungle hero movie? Who really invented reverse tape effect? Why can't me and my neighbor get along? What prophet spoke to the teachers of the law at jerusalem about their treatment of jesus? Are there any songs that describe the emotions of a cult member? What is the most ugly song ever made?

  • jamilton 3 years ago

    IIRC, GPT-2 and 3 are somewhat like this, asking a question often results in it giving you more questions instead of an answer, unless you give. ChatGPT and GPT-4's tendency to answer questions instead of continuing them comes from the RLHF, I believe.

  • spacemanspiff01 3 years ago

    Clearly the most advanced model we've seen to date.

    It's already trolling you.

  • jayalammar 3 years ago

    For Cohere, make sure you're using Command-Xlarge-Nightly.

    Otherwise, you may be prompting a Base LLM expecting the behavior of a different kind of LLM (an instruction-tuned chat model).

    Cohere's Command model builds on top of the base model, giving it the capability to follow instructions and user commands.

  • d1sxeyes 3 years ago

    The rain in Spain falls mainly on the plain. Not sure if it makes a difference to the response.

  • IanCal 3 years ago

    Cohere's model is Claude and it's a completion-style one. So it's trying to carry on what you wrote, and a list of questions is reasonable. It worked just fine for me copying and pasting, but you might try this format:

    Q: your question

    A:

    Then it is more likely to fill in an answer for you.
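
The Q:/A: format suggested above can be produced with a small helper. This is only a sketch: `qa_prompt` is a hypothetical name, and the assumption is that the target is a completion-style (base) model that continues whatever text it is given.

```python
def qa_prompt(question: str) -> str:
    """Wrap a question so a completion-style model sees an open answer slot.

    Ending the prompt with "A:" makes an answer the most natural
    continuation, rather than a list of further questions.
    """
    return f"Q: {question.strip()}\n\nA:"


if __name__ == "__main__":
    print(qa_prompt("Where does the phrase 'the rain in Spain' come from?"))
```

Instruction-tuned models usually do not need this wrapper; it mainly helps when the model just continues text.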

    • jayalammar 3 years ago

      Cohere actually trains its own models and they are not based on models from other providers [I work at Cohere].

      Your prompt suggestion is a good one for LLMs as a whole. Any information added to the context informs the model and nudges it towards the expected answer format.

      • IanCal 3 years ago

        Oh sorry, I got mixed up about who runs which model. I thought Cohere's one was Claude, but that's Anthropic. I wasn't trying to say it was based on another.

    • IanCal 3 years ago

      I got mixed up with the models, Claude is from anthropic.

  • sangnoir 3 years ago

    Just like your own question - it seems to me it's riffing on "Bar Quiz" type questions. Have you tried the 'Question:'...'Answer:' prompt format?

  • jozvolskyef 3 years ago

    You didn't tell it to act as an assistant so it just assumes that your input is some list of questions that it's supposed to complete.

  • psygn89 3 years ago

    When I asked if elephant farts were loud it answered with a lot more detail than ChatGPT.

  • rambojohnson 3 years ago

    it seems that Cohere might be suggesting that the phrase is as devoid of meaning as its unrelated questions, considering that the phrase was originally just a vocal exercise for a character in a film.

gardenhedge 3 years ago

You manage the Frameworks division (including Next.js, Svelte, and Turbo)? How did you get the time to build this?

  • onion2k 3 years ago

    Speaking about high-level strategic roles in general (eg CTO), assembling a good team that you trust to get on with things means you don't need to spend a huge amount of time managing them on a day-to-day basis. If you do your job well you can get to a point where 'managing' is mostly about steering the direction of the team - figuring out the answers to questions about 'why' and 'what' you're doing as a team. The team members themselves can deal with the 'how' part. That shouldn't be your job.

    The 'why' and the 'what' don't actually change very often (hopefully!) so you can have plenty of time for experimentation and 'play' like this. This is a part of how you learn and answer the 'why' and 'what' parts of your job.

  • syspec 3 years ago

    A growing backlog

  • yding 3 years ago

    Helps when your boss is a prolific coder himself.

  • zild3d 3 years ago

    Using GPT4

atleastoptimal 3 years ago

How much does it cost to query these models all the time? It's offering a lot for free.

  • Trystanr 3 years ago

    There is a rate limit.

    • jondwillis 3 years ago

      Even still. GPT4 in particular gets expensive at scale.

      Vercel is probably noticing that the LLM community is using NextJS (and Vercel) by default a lot of times, and that supporting the community and building more goodwill will pay off.

      • wut42 3 years ago

        Vercel is so overpriced on their hosting plans that they can afford that to attract new customers, I guess.

        • yding 3 years ago

          You can spend hundreds of thousands trying to implement just SSR and still not get it working. (trust me)

          For what Vercel is, it's a steal.

          • __jonas 3 years ago

            Server side rendering? Like the thing that PHP has been doing for decades? Genuinely curious how this has now become difficult again after being so completely ubiquitous with hosting, I guess it’s about “scale” and “edge” etc.?

doodlesdev 3 years ago

Please provide an icon that makes sense on other platforms for the shortcut on the bottom right, not everyone uses Macs! Overall, pretty cool idea, how is Vercel gonna handle API costs for this? I imagine this is pretty expensive to run, especially considering GPT-4 access.

Also, I have to link this here because I thought it was too funny:

https://play.vercel.ai/r/G9y3eXJ

edit: Oh I just hit the token limit, I guess that's how you're dealing with cost lol. Something I was gonna suggest is that maybe answers should be cached to prevent repeated calls to the same API when the prompt hasn't changed, this could potentially increase the amount of tests I could do.
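
The caching idea above could look roughly like the following. It is a minimal in-memory sketch, not how the playground works: `CompletionCache`, `cache_key`, and the `call_model` callback are hypothetical names, and the callback stands in for whatever provider API is actually used.

```python
import hashlib
import json
from typing import Callable, Dict


def cache_key(model: str, prompt: str, params: dict) -> str:
    """Build a stable key from everything that influences the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CompletionCache:
    """Return a stored response when model, prompt, and params are unchanged."""

    def __init__(self, call_model: Callable[[str, str, dict], str]) -> None:
        self._call_model = call_model  # hypothetical provider call
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of real API calls made

    def complete(self, model: str, prompt: str, params: dict) -> str:
        key = cache_key(model, prompt, params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(model, prompt, params)
        return self._store[key]
```

One trade-off: with a non-zero temperature a fresh call would normally return a different sample, so a cache hit replays the earlier sample instead of drawing a new one.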

  • steventey 3 years ago

    "How many words are there in "The Star Spangled Banner"?" LOL hahaha

    Will relay your feedback about the icon back to the team - thank you for trying this out!

swyx 3 years ago

beautiful work as always jared!

one of the useful things from nat.dev was the tokens/second calculation, latency is impt as usual. would be nice to enable?

i'm excited to see what the upcoming products are. making things default sharable like this across all frameworks would be really nice.
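
The tokens/second figure mentioned above could be computed along these lines. This is a sketch under assumptions: `measure_stream` is a hypothetical helper, and it treats each item yielded by the stream as one token, which may not match a provider's own token accounting.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    tokens: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> dict:
    """Consume a token stream and report latency and throughput."""
    start = clock()
    first: Optional[float] = None
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival time of the first token
        count += 1
    elapsed = clock() - start
    return {
        "tokens": count,
        "time_to_first_token": None if first is None else first - start,
        "tokens_per_second": count / elapsed if elapsed > 0 else 0.0,
    }
```

Passing the clock in as a parameter keeps the function easy to test with a fake timer; in real use the default `time.perf_counter` would be fine.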

etaioinshrdlu 3 years ago

How did Anthropic agree to let anonymous users call their API? This was a big stumbling block for me.

zurfer 3 years ago

Thanks for setting up a limited free version. The UX is great.

I had some failures for huggingface and cohere. Replicate produced an output that was not rendered (but returned an answer in the network tab).

It would also be great to add: https://docs.aleph-alpha.com/docs/introduction/luminous/

khalidx 3 years ago

Aw man, I was just building this. You beat me to it! Looks great. FYI, it needs some tweaks on mobile.

lazylion2 3 years ago

You can add https://www.ai21.com/ LLMs as well

vertis 3 years ago

Prompt: Pretend you are William Gibson. Pitch your next book to your publisher.

GPT-4: > Dear [Publisher],

> I hope this letter finds you well. As you know, my work has always strived to explore the intersection of technology, society, and human nature. With that in mind, I am excited to present to you my latest novel, "Neural Drift."

> Set in the not-too-distant future, "Neural Drift" delves into a world where the lines between our reality and the digital realm have all but vanished. The story revolves around our protagonist, Elara, a skilled hacker and data courier who navigates the vast neural network known as the "Mindscape" – an immersive digital landscape that has become the primary means of communication, commerce, and entertainment for the global population.

> In this future, the Mindscape has become an integral part of daily life, and the distinction between the physical and digital worlds is all but gone. Society is now heavily reliant on neural implants, which allow users to access the Mindscape and interact with the digital world through

Anthropic Claude-v1 (which wouldn't answer at all if I didn't put pretend in front):

> Here is my William Gibson-inspired book pitch:

> The future is here—it's just not evenly distributed yet. In my new novel, Pattern Recognition, I explore the bleeding edge of culture, technology, and society through the eyes of Cayce Pollard, a "coolhunter" who has an uncanny ability to spot trends in the wild.

> Cayce is hired to track down the creator of a series of mysterious film clips being uploaded to the internet. Her search takes her from London to Tokyo to Moscow, where she encounters a dangerous mix of hackers, hipsters, and Russian mobsters. Along the way, Cayce begins to realize the film clips seem to have a strange predictive power—they appear to show events that have not happened yet.

> Pattern Recognition is a stylish, fast-paced techno-thriller that examines how technology shapes culture, and how culture shapes us. It blends Gibson's trademark cyberpunk sensibility with a

YES, it did just regurgitate an existing William Gibson book

  • alehlopeh 3 years ago

    But you didn’t ask it to pitch a new book. At one point, Pattern Recognition was actually his next book, and there’s probably a lot of text out there discussing it as such.

    • vertis 3 years ago

      Sure absolutely, but would a human make this mistake? You can't pitch something that already exists (I mean in this context we're pretending, so maybe we can pretend we're pitching that book). But a human (and GPT-4 does) assumes you want an original book. It's still an assumption.

      I was playing around with GPT-4 making it take the questions for an Economics exam the other day and it was missing a lot of context that a human would get or assume. It was still much better at economics than me, but was going down "wrong" paths that a human wouldn't go down.

      If you told it to explain the question and list assumptions you could work out where it was going off track, but it lacked the ability to step back and recognise it had taken a nonsense path.

      • andybak 3 years ago

        > but it lacked the ability to step back and recognise it had taken a nonsense path.

        Errr. Yes! We're at the stage where you should be amazed how capable these things are considering their fairly brute force nature. Being surprised at the limitations is a peculiarly - dare I say? - human response?

        Usually the workflow involves:

        1. Write a naive prompt

        2. Gain some insight into AI's behaviour

        3. Improve prompt and repeat 1.

        Of course they lack insight. Of course there's no "meta-reasoning". What is astonishing is how often a glorified Markov Chain gives the impression that there is both.

        • vertis 3 years ago

          It's all astonishing. I don't think it's so much surprised at the limitation as trying to explore the edges. Enjoying finding some of the emergent behaviours.

      • creativenolo 3 years ago

        It seems like your issue is that this is not human, and it does not do what a human would do. This is a high expectation and, possibly dangerous, assumption.

        Shows how this tech is such a double edged sword for user value, expectations and assumptions.

        • vertis 3 years ago

          No, I'm well versed in the limitations of the tech. The comparison to what a human can do is just an interesting comparison. It's a feat the GPT-4 model can manage in this context. GPT-4 is also exhibiting emergent behaviours that are not found in models with smaller numbers of parameters; it is these emergent behaviours that make it seem more 'human' in its ability to produce content, even if the "reasoning" about it is all an illusion.

          Even here there are limitations, which was my intent with mentioning the Econ exam.

          Aside: It's also possible to get different answers by making the LLM simulate being different people (a student, or a professor).

          It is very easy to slip into thinking about ChatGPT in terms of actually thinking, but then again our understanding of the human mind has led to theories that conscious decision may well be an illusion as well (if brain scans showing that the decision is made before the conscious mind is aware are anything to go by).

          At the moment though, these are just tools, with varying degrees of usefulness.

      • carrolldunham 3 years ago

        It's not a mistake. It's just ambiguous - you and the OP (rather than "humans") share some contextual assumptions that to you make it imply one meaning. If someone asked me with no context to 'be william gibson pitching your next book' I might answer just the same as gpt did

        • vertis 3 years ago

          It's a fair point, and some of the context that's missing for the LLM is just present in the background for humans. You wouldn't need to be told, it would be whether you were in an acting class or creative writing class (these are weak examples).

          We get so many context clues that aren't present in current LLM text interactions.

yding 3 years ago

Great! Vercel edge functions are my preferred deployment for OpenAI calls now.

lightning19 3 years ago

Just for fun I wrote this prompt "vercel monopolizes nextjs by only allowing certain features on their platform such as"

ChatGPT4:

1. Incremental Static Regeneration (ISR): This feature allows developers to update static pages at runtime without having to rebuild the entire website. Vercel has optimized its platform to support ISR, making it an exclusive feature for Next.js applications deployed on Vercel.

2. Real-time Feedback: Vercel's platform provides real-time feedback during development, allowing developers to see the changes they make in their code instantly. This feature is exclusive to Vercel, making it a significant advantage for developers using Next.js on their platform.

3. Edge Functions: Vercel offers Edge Functions, which allow developers to run serverless functions at the edge of their network, providing low-latency responses to user requests. This feature is particularly useful for Next.js applications that require server-side rendering and is exclusive to Vercel's platform.

4. Preview Mode: Vercel's platform supports Next.js's Preview Mode, which allows developers to preview draft content changes without affecting the live site. This

-------------------------------------------------------

Claude v1.3

That is not correct. Vercel does not monopolize Next.js or limit features to their platform.

Next.js is an open source framework that can be deployed anywhere. Some key facts:

1. Next.js is MIT licensed and the source code is available on GitHub for anyone to use, modify, and deploy.

2. Next.js apps can be deployed on any hosting provider or infrastructure. This includes:

- Vercel (the creators of Next.js)
- AWS
- GCP
- Azure
- DigitalOcean
- Heroku
- Your own servers, etc.

3. All Next.js features work regardless of where you deploy. This includes:

- Static Generation (SSG)
- Server Side Rendering (SSR)
- Incremental Static Regeneration (ISR)
- API Routes
- Image Optimization
- Font Optimization

pastacacioepepe 3 years ago

It's completely broken. I couldn't even get one full response from one of the AIs, it always gets stuck near to the completion of the first response. After like 5 tries it told me that I used all my requests. lol.

  • VierScar 3 years ago

    Same :( I just wanted to see what Google's Flan would give to "77 + 3 * 11" and it says "Something went wrong" and now I'm rate limited

  • 0vermorrow 3 years ago

    Well it's not broken, I'd assume they limit the `max_tokens` to minimize costs.

    • pastacacioepepe 3 years ago

      If the limit doesn't even allow you to get a result with the default settings, I'd assume something's wrong.

spaceman_2020 3 years ago

GPT-4 is miles ahead of the competition. It's not even remotely close.

I asked it to help me design an app for a specific use case. It came up with a list of features, the tech stack, and even designed an initial database.

  • greyman 3 years ago

    Yes and no. Or, yes for such use cases you mentioned, also it is miles ahead in reasoning capability, translations from Chinese, etc. etc. But still, for more casual talk, like bouncing ideas, brainstorming etc., I prefer Claude. It's kind of more "friendly", less formal, and, which is sometimes also important, much quicker.

    • capableweb 3 years ago

      Have you used GPT-4 via the API or via ChatGPT only? The difference is huge, as with the API you can control the parameters and you can set your own system prompt (basically steer the AI better).

      With the API, you can basically customize the session ranging from providing batshit bananas replies (by setting high temperature) or almost deterministic replies (by setting low temperature) with personality controlled by the system prompt, so you can make it "friendlier", "less formal" or anything else.

      It is still as slow via the API as via ChatGPT though, which kind of sucks when you're using it for longer passages. I do wish it was just about 100% faster.
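
The effect of temperature described above can be illustrated with the textbook softmax-with-temperature formula. This is a sketch of the general technique, not of any provider's implementation, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    # Low temperature concentrates probability on the top-scoring token.
    print(softmax_with_temperature(logits, 0.1))
    # High temperature spreads probability more evenly across tokens.
    print(softmax_with_temperature(logits, 10.0))
```

With a low temperature almost all of the probability lands on the highest-scoring token, which is why replies look nearly deterministic; a high temperature flattens the distribution, so unlikely tokens get sampled far more often.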

      • greyman 3 years ago

        I use both ChatGPT and API. Mostly I would say I use it for work and when I learn Chinese, and Claude is in my mobile via Poe app, which I use for more casual talk or when I know the answer doesn't require advanced reasoning. Then I also use phind.com website, which I would say is like a better Bing (at least for programming queries).

    • spaceman_2020 3 years ago

      Yeah, even with coding questions, I have to ask it to only write the code. Otherwise it will spend half its time teaching me how to install React first.

tough 3 years ago

This is awesome, will this be open sourced to poke around the codebase?

Some models seem to hang w me (anthropic) but I think it was because I reached the limit.

Will logged-in usage with Vercel become usage-paid?

Aeolun 3 years ago

Well, every single thing I tried other than gpt-4 failed for me.

osteele 3 years ago

Very nice! If you want to add a feature, feel free to make use of https://github.com/osteele/prompt-matrix.js, which I (well, ChatGPT and I) created for use in my own playground so that I could enter e.g. "Write a <Python|JavaScript> program to look for anagrams, in the style of <Norvig|Carmack>".
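
The `<A|B>` template in the example above can be expanded into every combination with a few lines of code. This is only an illustration of the idea and not the prompt-matrix.js API; `expand_prompt` is a hypothetical name, and the syntax is assumed to be exactly what the example shows (angle brackets with `|`-separated options, no nesting).

```python
import itertools
import re
from typing import List

# Matches one <option|option|...> group without nesting.
_GROUP = re.compile(r"<([^<>]+)>")


def expand_prompt(template: str) -> List[str]:
    """Return one prompt per combination of the options in the template."""
    options = [m.group(1).split("|") for m in _GROUP.finditer(template)]
    prompts = []
    for combo in itertools.product(*options):
        choices = iter(combo)
        # Replace each group, left to right, with this combination's pick.
        prompts.append(_GROUP.sub(lambda _m: next(choices), template))
    return prompts


if __name__ == "__main__":
    for prompt in expand_prompt(
        "Write a <Python|JavaScript> program to look for anagrams, "
        "in the style of <Norvig|Carmack>"
    ):
        print(prompt)
```

A template with no groups comes back unchanged as a single prompt.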

  • osteele 3 years ago

    To be clear, that’s a link to a library meant for use in projects like yours. I’m not promoting a competing project (and haven’t provided a link to one).

eh9 3 years ago

How expensive is this to run?

  • l5870uoo9y 3 years ago

    I imagine a lot less than you would expect. That is at least my experience from observing user behaviour.

petecoop 3 years ago

What's the data policy on this? Are you keeping copies of the prompts and responses?

vladholubiev 3 years ago

Does anybody know if there is a similar playground, but for evaluating a single model by comparing different prompts with different temperatures?

I am building this by myself using Streamlit, but was wondering is there a ready solution.

lxe 3 years ago

How are the models hosted?

Edit: they are using their own hosted APIs such as Anthropic or Replicate.

  • palmdeezy (OP) 3 years ago

    Correct. We are using various hosting providers.

    As part of the project, I’ve been working with providers and hosts on updating their SDKs to work on Vercel Edge Functions (and streaming).

tikkun 3 years ago

Do you think the codebase will be open sourced within the next week or two?

d4rkp4ttern 3 years ago

Great interface!

Maybe slightly tangential, on the subject of UIs for ChatGPT, does anyone know a good UI for voice input to ChatGPT, for either the website or via an API ?

pjs_ 3 years ago

> are you gpt-4?

I am an AI language model created by OpenAI, but I am not GPT-4. I am based on the GPT-3 model. GPT-4 has not been released yet.

jacooper 3 years ago

Does this fully use UTF-8? I have been trying it in a non-Latin language and it's showing ?? icons in every answer.

ruslanmsv 3 years ago

Its output for GPT-4 is much worse than the output of ChatGPT with GPT-4 available at chat.openai.com

FailMore 3 years ago

Nice, can I have conversations? It’s not clear on mobile how to do this vs. just one prompt

cloudking 3 years ago

Thanks for sharing! What business problems are you solving at Vercel with LLMs?

ironfootnz 3 years ago

The UI is quite limited; check out the output at https://play.vercel.ai/r/I25tmJs, which needs a scroll bar.

khuss 3 years ago

Nice work. Is it open source?

nkmnz 3 years ago

Is the source available?
