Show HN: Why AI data should be self hosted

layernext.ai

12 points by bmadduma 2 years ago · 8 comments

michaelmior 2 years ago

> Show HN is for something you've made that other people can play with.

Unless I'm missing something, this doesn't fall in that category.

r_thambapillai 2 years ago

I think you have every single one of these risks exactly backwards:

Potential for Data Breaches: - The idea that third parties have inherently higher breach risk is pure FUD.

Data Sovereignty and Compliance Risks/Limited Oversight: - Having every single company solve all of these problems for every single part of its software stack is obviously very inefficient. That's why we have SOC II, pen testing, etc., as other commenters point out. Reviewing vendor SOC II reports is way, way easier than bearing the cost of managing compliance yourself for the hundreds of little productivity apps your team wants to use!

Shared Responsibility Model: I think it's pretty typical at this point for companies to take ownership of the behaviour of their sub-processors from a risk perspective.

phillipcarter 2 years ago

I don't understand the reasons given. All of these exist for sensitive data today, which is why we have things like SOC II compliance, Data Processing Agreements, etc. Data used for AI is no different.

  • dangerwill 2 years ago

    Because the same big companies (Microsoft, Google, and Facebook) that are offering LLM products today have been shown time and time again to lie about data retention/use/privacy and to prefer just eating the fines over complying. On top of that, ingesting data into the training sets of these systems is crucial to their functioning, so there is a reason why these companies are very likely lying in their data processing agreements.

    Adobe still claims that their Firefly product is only trained on publicly licensed stock photo data and yet it has been repeatedly found outputting watermarks from artists that are not supposed to be in that training data.

  • bmadduma (OP) 2 years ago

    I appreciate your thoughts. Most of the customers we've talked to have requested a self-hosted version of our platform on their own cloud infrastructure. Regardless of existing agreements like SOC II or Data Processing Agreements, companies are extremely concerned about having their AI data in the hands of third parties. For example, I'm sure that Walmart is not using AWS.

    • phillipcarter 2 years ago

      Walmart is one of the largest customers of Azure. They don’t use AWS because Amazon wouldn’t let them and their subsidiaries use AWS back in the earlier days of cloud adoption. Yes, some things are still on prem, but many (and in some cases most) things aren’t.

rolph 2 years ago

tldr:

privacy, security, control, autonomy, fidelity, integrity, continuity.

no new reasons yet.
