Show HN: We're building a desktop app for browser-based AI agents

meha.ai

50 points by jawerty a year ago · 61 comments · 2 min read

What's up HN!

This is Jared and Art. We met on HN and started building together.

Over the last few months we've been thinking a lot about how AI agents are going to impact the future. We want agents to be something that's actually useful for normal people as well as the 10x'ers. This led us to build Meha, our first swing at our vision! We saw OpenAI release Operator, then we said f*k it, let's post.

Meha is a desktop app that uses your Chrome browser to execute tasks in the background. It controls your installed Chrome browser and uses LLMs with Playwright to plan and execute actions to accomplish your task. You get to see each planning step the bot takes and have access to its long-term memory.

Meha also uses its own file system and can export files for download. Another thing we've been focused on is multi-agent workflows, and Meha can run many bots at the same time. One of the reasons we can ship this for free in the meantime is how cheap the agents are. But we are planning to have a Pro version for power users. We prefer not to raise since we're against VC funding.

We have been influenced by a lot of concepts in probabilistic robotics and RL to develop a fairly robust 'agentic' framework. As well as an algorithm for efficiently converting/compressing large html pages into a semantic format. If you're interested, let us know and we will open source this asap in an SDK (it will work with all OpenAI API spec LLMs and with llama.cpp).

We're currently in beta, working on figuring out what this product will become, and super stoked! Let us know what you think. To get access to Meha, we have download links on our Discord (both macOS and Windows are available). Please give us all the feedback/criticism (even if you hate AI).

Link to Meha: https://meha.ai

stormfather a year ago

> As well as an algorithm for efficiently converting/compressing large html pages into a semantic format.

For the love of humanity please open source this. This seems tremendously useful by itself.

  • pavelfeldman a year ago

    There is an open source alternative that might be even better: https://playwright.dev/docs/api/class-locator#locator-aria-s....

  • jawertyOP a year ago

    Oh damn, I will definitely look into open sourcing it and making it an SDK

    • stormfather a year ago

      Awesome! I write LLM powered scrapers and stuff all the time and one of the biggest pain points is HTML is full of so much crap that isn't meaningful and overwhelms the context. And being a data science guy idk how to solve this.

      • jawertyOP a year ago

        awesome that's the same reason why I use it. It's basically a balance between the full html and having the markdown type scrapers that are better for just text. Do you mind if I reach out to you once I set up the Github?

        • stormfather a year ago

          You're very welcome to! Please do. You can reach out to notpricedinyet@gmail.com

skeeter2020 a year ago

Looked through their privacy policy, and they state they collect and use basically everything they can, from your browser & system metadata to the content you share and/or create. Not that different from every other attempt in the frothy AI space, but a real turn-off and hard no for me.

  • jawertyOP a year ago

    Thank you for the feedback. Personally, besides using our API server, we would like to find another way to deploy to anyone who has an issue with this or wants to run everything locally (not just the client). Also, I think if we had an OSS plug and play version where you could enter your API keys locally, it would help us ship to more devs. Would you be interested in this?

    • imarkphillips a year ago

      I'm so impressed with the concept of this agent but sorry, I can't have you accessing all my corporate data and systems because I access them via browser.

      Perhaps you could create both a Public and Corporate version of the extension, like Copilot does. The Corporate version could have access to all browser data but not share it beyond the bounds of the company.

      • jawertyOP a year ago

        Thanks! That’s a great point; we’ve been discussing how to deal with sensitive data after the launch. I think a corporate/enterprise version makes sense.

    • 1shooner a year ago

      Some analysis I've been reading on the implications of DeepSeek says that model optionality is probably here to stay. If so, I think incorporating model choice would be a valuable aspect of this kind of product. Conversely, I agree with parent: I'm not installing this software with that privacy policy in place.

      • artabra a year ago

        We definitely wouldn't mind adding that. We are open to a lot of ideas and will consider everything! Appreciate your input so thank you.

  • idiotsecant a year ago

    Op, any comment on this?

Kuinox a year ago

I asked it to go on seloger.com, to find "some flats on paris below 400k". It went to one specific district of Paris and didn't set a price criterion, then told me how I could do it myself.

I then asked to create a CSV of the first 100 flats corresponding to my criteria, it created only 3 entries, purely hallucinated.

  • artabra a year ago

    We'll take a look and see if we can get those prompts working. Thanks for letting us know!

arjunchint a year ago

Hey I am also building in the space and launched rtrvr.ai, but we went the route of a Chrome Extension so people don't have to worry about installing random software on their devices [also the reason that I am hesitant to try this out].

But let me know your thoughts on rtrvr.ai, looks like we are targeting the same use cases of automation, scraping, research?

artabra a year ago

Hi everyone, this is Art!

Happy to hear all the thoughts from those who try the app out! Even if you just have ideas about how agents might look in their final form, there are so many avenues this tech can take and we have a ton of wild ideas we'll be building, so stay tuned. :D

iiJDSii a year ago

Very cool! Any video demos for sample tasks? I didn't come across any on the website (browsing on mobile).

  • artabra a year ago

    Those are still in the cooker, we'll throw them up asap once they're ready.

    Some demos we will have are:

    - Logging into twitter and tweeting

    - Finding information from google maps of any nearby business whether that's for leads or finding local restaurant options.

    - Scraping anything from wikipedia like current events etc.

    - And more!

    • iiJDSii a year ago

      Those are good ones. I've fiddled with similar systems before, do you have a rough success rate? I know they can be finicky, especially as you execute through a chain-of-thought action plan, or however you're doing it.

      • jawertyOP a year ago

        Anything improving reasoning chains of thought improves planning. Right now the long-running tasks Art mentioned, like logging in, have been around 80% while simpler ones have been higher. Our main issue at the moment is figuring out how to keep the server up :/ we're getting a little more traffic than expected. However, to bump those success rates up (which we need to) we really really need to fine-tune additional models, which we're planning out right now.

        I have a few ideas around that mostly going down the RL route (with a twist) mixed with some knowledge graph work. We'll give an update when we push that!

        • iiJDSii a year ago

          > keep the server up

          Oh maybe I didn't understand from the site - it's not a standalone desktop app? What processing do you do on your server side?

          • jawertyOP a year ago

            We have an API server where we execute all the agent reasoning/planning jobs, then we stream the browser commands to the client. We mention this in the how it works section on the website. This is the main reason we have the 5 bots a day limit. It's cheap for us to run as of now, but if anyone would like us to ship a version where you'd use your own API keys (plug n play) locally, let us know!
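A minimal sketch of the client side of the split described above, where a server plans and the desktop client only executes streamed browser commands. The wire format (one JSON object per line with an `action` field), the command names, and the `RecordingBrowser` class are all assumptions made for illustration; nothing in the thread specifies Meha's actual protocol.

```python
import json
from typing import Callable, Dict, Iterable, List


class RecordingBrowser:
    """Hypothetical stand-in for a real browser driver; it only records requested actions."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def goto(self, url: str) -> None:
        self.log.append(f"goto {url}")

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")

    def type_text(self, selector: str, text: str) -> None:
        self.log.append(f"type {selector} {text!r}")


def run_commands(stream: Iterable[str], browser: RecordingBrowser) -> int:
    """Execute newline-delimited JSON commands and return how many were run."""
    # Only actions listed here can ever run on the client (assumed command set).
    handlers: Dict[str, Callable[[dict], None]] = {
        "goto": lambda c: browser.goto(c["url"]),
        "click": lambda c: browser.click(c["selector"]),
        "type": lambda c: browser.type_text(c["selector"], c["text"]),
    }
    executed = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines in the stream
        command = json.loads(line)
        handler = handlers.get(command.get("action"))
        if handler is None:
            # Refuse anything the client does not explicitly support.
            raise ValueError(f"unsupported action: {command.get('action')!r}")
        handler(command)
        executed += 1
    return executed
```

In a real client the `stream` argument would be the lines of a streaming HTTP or WebSocket response rather than a list, and the handlers would call into the actual browser driver. Keeping the handler table on the client means the server can only request actions the client explicitly supports.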

sky2224 a year ago

Interesting idea. With the web scraping utility, do I need to specify which websites I wish for the api to scrape from or do I essentially just say, "hey I want this data, go get it"?

If it's the latter, how do you go about making sure you're not about to download malicious data to my machine?

  • jawertyOP a year ago

    Great question, so right now you can do both. It does work better if you simply enter in the url for your task.

    For the URL generation we do, we have safety checks for the URLs; however, they live only in the prompting. I would love to hear what sort of safety suggestions you have and/or concerns about this sort of experience. Right now we're still figuring out how best to enable people to utilize agents safely.
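One possible answer to the request for safety suggestions: check model-generated URLs in code before the browser visits them, rather than relying on the prompt alone. The sketch below is a hypothetical helper, not anything Meha ships; the function name `is_safe_url`, the allowed schemes, and the optional host allowlist are all assumptions. It does not resolve DNS, so a hostname that points at a private address would still pass.

```python
import ipaddress
from typing import Optional, Set
from urllib.parse import urlsplit

# Assumption for this sketch: only plain web schemes are ever acceptable.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_url(url: str, allowed_hosts: Optional[Set[str]] = None) -> bool:
    """Return True if a model-generated URL passes basic structural checks."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. unbalanced IPv6 brackets
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False  # rejects file:, javascript:, data:, and host-less URLs
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False  # keep the agent away from local and internal addresses
    if allowed_hosts is not None:
        # Match the host itself or any subdomain of an allowed host.
        return any(host == h or host.endswith("." + h) for h in allowed_hosts)
    return True
```

Passing `allowed_hosts={"wikipedia.org"}` would restrict a task to that site and its subdomains, which fits the case where the user enters the URL for the task up front.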

xnx a year ago

Cool. Is this a wrapper for https://github.com/browser-use/browser-use ?

  • jawertyOP a year ago

    It is not. The Meha agent is fully custom, except that we don’t use our own models; we’re using o3-mini for most of the inference.

noahfk a year ago

it kept taking me to non existent websites that were a summary of what i asked it for

  • jawertyOP a year ago

    Thanks for letting us know. Can you email us at info@meha.ai or DM one of us on Discord with more information? We're working through all the bug reports in the next couple of days.

pdntspa a year ago

I would be very interested in your research on compressing HTML pages!

  • jawertyOP a year ago

    Great! I will work on open sourcing that on our Github. It's basically a semantic format of html for AI agents to use the browser easily.
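A minimal sketch of the general idea discussed in this thread: reduce a page to something between the full HTML and a text-only markdown dump, keeping visible text plus the elements an agent can act on. This is not Meha's algorithm, which had not been published at the time of the thread. The tag and attribute lists and the bracketed output format are assumptions chosen for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from typing import List

# Assumption: content inside these tags is never useful to an agent.
SKIP_TAGS = {"script", "style", "noscript", "svg", "template"}
# Assumption: these tags carry structure or can be interacted with.
KEEP_TAGS = {"a", "button", "input", "select", "textarea", "form",
             "label", "h1", "h2", "h3"}
# Assumption: these attributes are enough to locate or describe an element.
KEEP_ATTRS = {"href", "name", "type", "placeholder", "aria-label", "value"}


class SemanticCompressor(HTMLParser):
    """Collects visible text and a compact marker for each kept element."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.lines: List[str] = []
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in KEEP_TAGS:
            return  # layout-only tags (div, span, ...) are dropped entirely
        kept = [f'{k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v]
        self.lines.append(" ".join([f"[{tag}"] + kept) + "]")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.lines.append(text)


def compress_html(html: str) -> str:
    """Return a compact, line-oriented view of the page for an LLM prompt."""
    parser = SemanticCompressor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `compress_html(page_source)` drops scripts, styles, class names, and wrapper elements while leaving links, form fields, and headings identifiable. A production version would also need to handle hidden elements, iframes, and custom web components, which this sketch ignores.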

delduca a year ago

Is it a native app or electron based?

  • jawertyOP a year ago

    This is a Python Qt app (for now). We're looking to move to Electron, however packaging this has been...interesting.

    • cpursley a year ago

      Please please please don’t move to electron and just build something native. Electron desktop software bloat is killing our machines.

      • jawertyOP a year ago

        Packaging a browser runtime for a chat app is a concern when the base amount of resources is far more than what we need. I'm more concerned about dev community + what runtime we'd prefer for managing the local browser. I'm looking at Go frameworks right now (I'm naturally moving away from Python to Go personally) if anyone has any suggestions

      • dgfitz a year ago

        I would be so happy if the migration went from pyqt to c++ qt.

        Hell, I’ll help with the conversion.

        • artabra a year ago

          We will probably be open sourcing the frontend soon and you can hack on it all you want. :D

    • delduca a year ago

      Please stick with Qt, I have ditched all non native and electron apps from my machine (the last replacement was VSCode to Zed)

      • jawertyOP a year ago

        Interesting, can you lmk why you ditched all non-native apps? We're discussing what decision to make on this now.

    • jwrallie a year ago

      So the reason for no Linux support is non technical right? There are literally dozens of us!

      An easy way to scrape webpages is something I’m interested in, I promise I’ll try it when it’s supported.

      • jawertyOP a year ago

        Oh please do report that on either the discord or email us. We had a few people request linux support; trying to log all the feedback. It's really just time limitations right now no prejudice :)

xena a year ago

How do I uniquely identify Meha to block it?

  • jawertyOP a year ago

    Sooo make your html extremely convoluted, with randomized semantics and a ton of hidden interactions (+1 for only using custom web elements). Basically make it like YouTube. After spending way too much time building browser agents I can assure you this will defeat Operator as well.

    • xena a year ago

      Can you email me at the email in my profile? I'd like to talk with you.

    • orange_puff a year ago

      When making requests, does your tool use the normal chrome user agent header or does it specify the request is coming from meha?

phdelightful a year ago

Since you asked for “all the feedback,” there’s a typo on your landing page:

“The Meha API utilizes it's home-grown” -> “its”

Also, I got a relay access denied error when I tried to email you at info@meha.ai

  • jawertyOP a year ago

    Awesome we just fixed these issues thanks for letting us know.

anticorporate a year ago

The headline needs another word. Maybe browser-based agents. I assumed this was about browser user-agents.

dotancohen a year ago

Irrespective of the product, I think that you could have posted without the pseudo cussing. Having a little respect for your audience, and trying to appear professional, goes a long way in attracting users.

  • garbagewoman a year ago

    your view is not in line with cultural norms, though. if nytimes best selling book titles regularly use "pseudo cussing" (as you call it), no reasonable person would see this as disrespectful, especially given the much more casual context it was used.
