Swept under the carpet for too long, handling privacy is now a problem on many software architects’ and developers’ todo lists, and a new breed of start-ups is emerging to tackle it. We discuss the problem at its core, as well as the right approach to adopt when solving it.
There are conflicting forces that clash over your system’s design
In software design, we often talk about conflicting forces that define a design problem, which a particular design pattern then solves. What precisely are those forces in the case of privacy?
On one hand:
- personal user data is a goldmine for functionality. A truly useful end-user application naturally collects and processes personal user data; in fact, the more there is, the more useful the application is for the user. Users and applications “speak” through the exchange of user data. Whatever end-user software you are building, be it an e-commerce website, a banking application, or a social network, your software is made to collect and process user data, and often its purpose is even defined by that data.
- intelligence thrives on user data. Artificial intelligence systems rely on user data to learn intelligent behavior and deliver smart, personalized services to the users. And users want intelligent applications.
In other words, it is unthinkable to deliver a modern end-user application that users want without collecting, storing, and processing user data; it is hard to imagine such an application minimizing the amount of data it collects and stores. This is our first “force” exerting pressure on your software design.
On the other hand (the second “force”):
- collecting and storing personal data in a central place exposes your system to security and privacy risks that become your system’s responsibility. Others might want to know what your system “knows” about its users. Even the most unexpected software systems are targets of attacks and data breaches.
- It is also more difficult to establish and maintain trust, i.e. the users’ confidence that neither you nor the employees of the business that actually runs your system are reading their personal data. Major software companies (e.g. Google) put serious internal policies in place to address these concerns; however, users’ ability to actually trust that those policies fully protect them grows weaker with every new user-data-related scandal.
- There are legal requirements to minimize the data being collected and stored, and to protect it in transit and at rest. GDPR in Europe is the most talked-about legal framework for protecting online privacy, and a headache for developers across the globe (as it protects EU citizens even from the privacy-hindering deeds of foreign companies). Similar legal protections exist in California (CCPA) and elsewhere in the world.
The modern system-user relationship requires not only the protection of user data from unintended external parties but also — to a certain degree — the protection from the system itself.
Yet, the classic client-server architecture has, for decades, been shifting responsibility away from the client, leaving end-user applications that are often responsible only for collecting user input and displaying responses from a centralized server.
While such an approach ensures a certain elegance and a meaningful distribution of responsibilities among system components, it also amplifies the conflict of forces that we describe here. System architects have thus been obliged to choose between useful functionality and privacy, or worse: to apply patches, in the form of policies or security solutions external to the code, that often make usability horrible. Yet the modern user has increased expectations, is more aware of privacy risks, and rightfully expects both a high level of useful functionality and a high level of privacy, not to mention usability.
We present here the elements of software architecture that can resolve this conflict of forces, derived from emerging modern applications that have successfully overcome it.
Architecture Choices for Privacy by Design
1 Separation: Known and Unknown User Data
The first thing to do is to separate, and consider differently, the user data that your system needs to “know” from the user data that the system can collect and process without actually having access to it. We call those two categories Known and Unknown User Data.
There is no magic recipe for this separation as it depends on your system’s functionality. There are systems where all user data can be considered Unknown — where the system has no knowledge of the user to whom it delivers its functionality (anonymous services such as Tor are good examples).
Yet, most systems need to identify the user in order to charge them for the service they deliver. A ride-sharing platform might consider the identities of riders and drivers “known”, but the destination of the ride and any messages exchanged between drivers and riders “unknown”. A hotel booking platform might perform the split so as to connect users with hotels and collect a fee from hotels, while ignoring the booking dates that reveal the user’s whereabouts.
Once you have segmented which user data to treat as “known” and which as “unknown”, you can adopt a new flavor of client-server architecture: one where you treat the “known” data as you normally would, but where the “unknown” user data is kept at the user’s endpoint, end-to-end encrypted when exchanged between users via your system, and encrypted at backup.
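As a sketch of what this separation might look like in code (all names here are hypothetical, not a real API), consider a ride-sharing request whose “known” fields the server may read, while the “unknown” part arrives only as an opaque, client-encrypted blob:

```python
from dataclasses import dataclass

@dataclass
class KnownRideData:
    rider_id: str          # needed for billing and matching
    payment_token: str     # needed to charge the rider

@dataclass
class UnknownRideData:
    destination: str       # only endpoints ever see the plaintext
    chat_messages: list    # end-to-end encrypted in transit

@dataclass
class RideRequest:
    known: KnownRideData
    unknown_ciphertext: bytes  # UnknownRideData, encrypted client-side

def server_view(request: RideRequest) -> dict:
    """What the server can legitimately see: the known fields, plus an
    opaque blob it has no key to decrypt."""
    return {
        "rider_id": request.known.rider_id,
        "payload_size": len(request.unknown_ciphertext),
    }
```

The point of the sketch is the type boundary: nothing in `server_view` can reach inside `unknown_ciphertext`, which only exists as plaintext on the endpoints.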
2 End-to-end Encrypted Transfers
Whenever your system needs to move the unknown user data from one user to another, you can use end-to-end encryption to ensure those transfers (and related functionalities) happen without your system being able to read the actual user data being exchanged.
For this, you can draw inspiration from the well-established Signal protocol. At blindnet we are working to provide you with easy-to-integrate components to make your software end-to-end encryption-ready.
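To illustrate the principle (not the Signal protocol itself), here is a toy sketch: two endpoints derive a shared key via Diffie-Hellman and exchange ciphertext through a server that only ever relays opaque bytes. This is deliberately simplified classroom crypto; real systems should use the Signal protocol or a vetted library such as libsodium.

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; fine for a toy group, not for production
G = 3

def keypair():
    # Private exponent stays on the endpoint; only the public value travels.
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

def shared_key(my_priv: int, their_pub: int) -> bytes:
    # Diffie-Hellman: both sides compute g^(ab) mod p.
    secret = pow(their_pub, my_priv, P)
    return hashlib.sha256(str(secret).encode()).digest()

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode, XORed with the data.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Alice and Bob exchange public keys through the (untrusted) server.
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()
msg = b"meet at the station"
ciphertext = xor_encrypt(shared_key(a_priv, b_pub), msg)
# The server relays `ciphertext` but cannot read it; Bob decrypts it.
assert xor_encrypt(shared_key(b_priv, a_pub), ciphertext) == msg
```

What matters architecturally is that key material only ever exists on the endpoints; your server handles ciphertext and public values, nothing more.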
End-to-end encryption is becoming more and more ubiquitous in end-user applications as the necessity to ensure trust grows. Keep an eye on what blindnet.io is doing to make its implementation easy for every developer in every application.
3 Encrypted Storage
The risk with “unknown” user data is that the central database backup you have put in place for the “known” user data won’t be enough. If you back up “unknown” data, you need to set up encryption such that only the user, on their endpoint, can decrypt the backup.
Existing and elegant solutions, such as Tanker.io, can help you achieve this.
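As a rough illustration of the principle (a toy cipher, not production crypto; use a vetted AEAD via a solution like Tanker.io in practice), the backup key can be derived from a user passphrase that never leaves the endpoint, so the server stores only an opaque blob:

```python
import hashlib
import hmac
import secrets

def derive_key(passphrase: str, salt: bytes) -> bytes:
    # Key derivation happens on the endpoint; the passphrase never travels.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy stream cipher (SHA-256 counter mode), for illustration only.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def backup(passphrase: str, plaintext: bytes) -> dict:
    """Produce the blob the server stores: salt, ciphertext, integrity tag."""
    salt = secrets.token_bytes(16)
    key = derive_key(passphrase, salt)
    ct = xor_stream(key, plaintext)
    tag = hmac.new(key, ct, "sha256").digest()
    return {"salt": salt, "ciphertext": ct, "tag": tag}

def restore(passphrase: str, blob: dict) -> bytes:
    """Run on the endpoint: re-derive the key and check integrity."""
    key = derive_key(passphrase, blob["salt"])
    expected = hmac.new(key, blob["ciphertext"], "sha256").digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong passphrase or corrupted backup")
    return xor_stream(key, blob["ciphertext"])
```

The server holds `{salt, ciphertext, tag}` and nothing else; without the passphrase, the backup is just noise to it.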
An important consideration here is what happens with the backup if the encryption key gets lost due to the loss of the user’s endpoint. The data “unknown” to the system would then be forever lost. How best to deal with this issue really depends on your use-case. End-to-end encrypted chat applications are leading the way in engineering ways in which user endpoints can authenticate one another, so that one user can have multiple ways to access her data.
4 Zero-knowledge proofs
When dealing with certain user information that should remain “unknown”, zero-knowledge proofs can be used to achieve a high degree of functionality while allowing a great deal of user data to remain “unknown” to your system. For example, legal-age verification can be elegantly achieved in this way.
You can learn about zero-knowledge proofs and open-source tools here.
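To show the mechanics behind a zero-knowledge proof, here is a toy Schnorr proof made non-interactive with the Fiat-Shamir heuristic: the prover convinces the verifier that she knows a secret x with y = g^x mod p, without revealing x. Real age verification relies on range proofs or zk-SNARK toolkits, not this toy.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus; the group order is P - 1
G = 3

def prove(x: int, y: int) -> tuple:
    """Prover: knows x such that y = G^x mod P."""
    r = secrets.randbelow(P - 1)
    t = pow(G, r, P)                                   # commitment
    c = int.from_bytes(                                # Fiat-Shamir challenge
        hashlib.sha256(f"{t}:{y}".encode()).digest(), "big") % (P - 1)
    s = (r + c * x) % (P - 1)                          # response
    return t, s

def verify(y: int, t: int, s: int) -> bool:
    """Verifier: learns nothing about x, only that the prover knows it."""
    c = int.from_bytes(
        hashlib.sha256(f"{t}:{y}".encode()).digest(), "big") % (P - 1)
    # Accept iff G^s == t * y^c (mod P), i.e. g^(r+cx) == g^r * g^(cx).
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

In an age-verification setting, x would encode a credential issued by a trusted authority, and the proof statement would additionally assert a range (“birth year before …”) rather than bare knowledge of x.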
5 Process Data Encrypted or at the Edge
When designing software, the major challenge is with any intelligent/personalized feature. Traditionally, such features have been developed by maintaining a centralized data storage over which machine learning algorithms can operate. To meet the modern users’ expectations, a tendency is emerging to shift certain computations from the centralized servers closer to the user’s endpoints (hence “edge computing”).
The idea behind “edge computing” is to split the computing in a similar way as we split data into “known” and “unknown”, and to let computation points closer to the user perform tasks and provide services on behalf of the cloud. Machine learning algorithms can still operate centrally and learn parameter associations from anonymized user data. On-device machine intelligence is an approach that is gaining ground for both improved privacy and reduced latency. Have a look at open-source projects such as MobileNet and Learn2Compress.
Personalized computations can then be performed on the user’s endpoint, on top of “unknown” data, to deliver intelligent behavior and smart functionalities to the user.
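A minimal sketch of this idea (all names and numbers are hypothetical): the server ships only anonymized, aggregate scores, and the blending with the user’s private history happens entirely on the device.

```python
# Aggregate, anonymized statistics computed centrally and shipped to clients.
GLOBAL_SCORES = {"news": 0.9, "sports": 0.7, "cooking": 0.5}

def rank_on_device(global_scores: dict, local_history: list) -> list:
    """Blend global popularity with the user's private click history.
    `local_history` never leaves the device, so it stays "unknown"
    to the server."""
    local_counts = {}
    for item in local_history:
        local_counts[item] = local_counts.get(item, 0) + 1
    total = max(len(local_history), 1)
    blended = {
        item: 0.5 * score + 0.5 * (local_counts.get(item, 0) / total)
        for item, score in global_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)
```

The 50/50 blend is arbitrary; the architectural point is which side of the split each input lives on: aggregates on the server, personal history on the endpoint.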
Processing user data in a privacy-preserving way is also a vibrant challenge, giving rise to many novel ideas and approaches: keep an eye on what Evervault is building.
A Whole New Way of Looking at the Data
Combining the separation of “known” from “unknown” user data, end-to-end encrypted transfers, encrypted storage, zero-knowledge proofs, and edge computing yields a whole new way of building robust, functional, yet private-by-design software systems.
This design is not new to software architects; they have simply been willfully oblivious to it. In fact, the expectation of privacy is one of the most basic and most natural expectations to have of a software system (especially a commercial one).
While many will try to sell privacy as a response to fears (of being spied upon by over-controlling governments, of being treated as an advertising target, of being exposed in a data breach), I think that such a view is dangerous and does a disservice to software design. Fears are irrational, and while people do purchase things out of fear, fear is not a viable motive for a software design decision.
On the other hand, what I do see is a fundamental necessity to design a balanced relationship between the user and the software system. The user and the system are not one. Before even thinking of third-party intrusions and abuses, the software architect must recognize that full disclosure of the user’s data to the system cannot be assumed to be a sane default, because the user and the system are two entities in some sort of relationship. And, like every relationship, the user-system relationship can only survive where there is balance.
Building software with privacy in mind means first designing this balanced relationship, by limiting the system’s exposure to the user’s data without compromising the system’s functionality. Only after you have designed your system to protect the user from the system itself can you efficiently protect the user from external threats.