Palantir and GAIA-X

Palantir

In November 2020, Palantir joined GAIA-X as a proud Day 1 Member. GAIA-X was envisioned as data infrastructure and an open digital ecosystem “initiated by Europe, for Europe” to support the global competitiveness of European companies. So it’s reasonable to ask why a company like Palantir Technologies, founded in Silicon Valley with global headquarters in Denver, Colorado, should consider its participation in the project as important, appropriate, and consistent with the stated goals of promoting European “data sovereignty and data availability.”

Palantir builds digital infrastructure for secure data-driven operations and decision-making. Since our founding in the early 2000s, we have focused our business and technology on addressing many of the challenges that GAIA-X is taking head on: How can public and private sector institutions leverage the distributed, heterogeneous information assets they have accumulated over decades or centuries? How can organizations use data effectively without compromising data security, privacy, and civil liberties interests of affected communities? The values at the core of the GAIA-X project are, indeed, central to our own mission as a company: data protection, data security, and digital sovereignty of the institutions we support and constituencies they serve.

GAIA-X’s values are situated in complex social contexts, institutional histories, and cultural norms that must be navigated and addressed with sensitivity and precision. In over 17 years of work in some of the most sensitive (and skeptical) data environments in the world, we have acquired an understanding of the ways that intricate technological solutions can and should be interwoven with the broader data governance and normative contexts in which the technology must operate. This is precisely the sweet spot of our experience and expertise, which we hope to contribute to GAIA-X and European data ecosystems over the coming years. That’s why we decided to join GAIA-X.

In this post, we share our view of some of the technical challenges facing GAIA-X in enabling secure and collaborative data ecosystems. We then explore how we’ve helped our customers solve these problems and the organizational and cultural challenges we’ve faced along the way.

Technical challenges of collaborative data ecosystems

Collaboration in data ecosystems manifests at different scales. Within technical teams, data scientists and developers collaborate on data ingestion scripts, data pipeline code, or model training and testing. They need tools that help them manage the complexity of ever-changing data and code. Collaboration between technical users (e.g., data engineers) and less technical users (e.g., business SMEs) benefits from seamless integration of machine-readable and human-readable data models, for instance via object-oriented, semantic data models and ontologies. (See Self-Descriptions, Schemata, Objects, and Properties in the GAIA-X Technical Architecture document for an example.)

Collaboration between different business functions in the same organization (e.g., finance and engineering groups) or between different organizations as a whole (e.g., different government departments) poses even greater challenges: Where is data stored and catalogued? What is the data schema? How does data stay up-to-date? How do change and release management work for ever-evolving data products? How can complex data integrity constraints be specified and enforced? Who has access to what data and for what purpose? All of these questions intersect with and reinforce key concerns of data governance. We anticipate that GAIA-X (or, more generally, any data architecture based on federation and sovereign data control) will further amplify the importance of core governance concepts such as provenance-aware access controls, data usage policies, and auditing.

These problems hit close to home for us, as they were among the early challenges we had to solve when building our products. For instance:

  • We built a dynamic ontology system to harmonize data from a variety of source systems in disparate formats (which are often best suited for machines) into business concepts that humans can work with, with a particular focus on non-technical users.
  • We implemented systems that track lineage and provenance of data flows, along with elaborate data quality monitoring systems to give confidence to the consumer of the data that it is high quality and can be trusted.
  • Taking inspiration from software engineering, we brought concepts like versioning, branching, and release management to the data engineering world. The goal is that an organization’s entire workforce can collaborate on the same data foundation without sacrificing proper quality and change management controls.
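As a rough illustration of the lineage idea in the list above, the following Python sketch records which inputs each dataset version was derived from and walks that graph to answer the question "what does this dataset depend on?". It is a toy model under stated assumptions, not a description of how our products implement lineage: the Dataset class, the upstream function, and all dataset names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A hypothetical dataset version that remembers its direct inputs."""
    name: str
    version: int
    parents: tuple["Dataset", ...] = ()


def upstream(dataset: Dataset) -> set[str]:
    """Return the names of every dataset that feeds into `dataset`."""
    seen: set[str] = set()
    stack = list(dataset.parents)
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.add(node.name)
            stack.extend(node.parents)  # keep walking towards the sources
    return seen


# Hypothetical pipeline: two raw inputs -> joined table -> report.
sensors = Dataset("raw_sensor_readings", version=3)
parts = Dataset("parts_catalog", version=7)
joined = Dataset("readings_by_part", version=1, parents=(sensors, parts))
report = Dataset("fault_report", version=1, parents=(joined,))

print(sorted(upstream(report)))
```

Because every derived dataset carries its inputs and a version number, a consumer of fault_report can trace it back to the exact source versions it was built from, which is the property that provenance-aware access controls and audits rely on.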

The roots of Palantir lie in our work for the national security community, wherein data security, governance, and institutional sovereignty/independence controls must be respected and demonstrably used. Our software engineering culture has deeply internalized these concepts, just as we’ve built strong operational, technical, and procedural safeguards to ensure that each of our customer environments remains separate, sovereign, and secure from all others.

Secure collaboration in the aviation and automobile sectors

Our work helping institutions build data ecosystems has surfaced the challenges and benefits of secure, controlled collaboration within and across organizational boundaries.

One example is Skywise, the digital ecosystem for the entire aviation industry, provided by Airbus and powered by Palantir’s software. Airlines around the world use Skywise to improve operational efficiency and safety, and to prevent delays. Suppliers can, for example, optimize inventory and reduce late and non-quality deliveries, and Airbus can accelerate production and develop new business models.

By bringing diverse — even competing — organizations together in an ecosystem, Skywise faces an especially high barrier to trust. We solved this in part by deploying a federated data integration architecture, in which participants can manage their federated private space and perform peer-to-peer data exchange and sharing based on purpose.

A second example of the value of cross-organizational collaboration is our work with a European vehicle manufacturer. The manufacturer sought to use our software to improve the quality and safety of vehicles by exchanging data more effectively with its top suppliers. Most new vehicles are “connected,” with multiple sensors collecting thousands of readings per second. Combined with data on production and logistics, this information is valuable for upstream part suppliers aiming to shorten investigations of faults, improve part design, and increase safety and quality.

However, connected vehicle data can contain PII and geo-location information, and thus is generally subject to the highest levels of data protection. In fact, sharing this data is so difficult that most participants in the automotive value chain tend to collaborate in analog ways. For example, it’s more common for a dealer to mail a faulty part to its manufacturer for investigation than to share the corresponding part performance data from in-service cars. We deployed our software to change this: to create an ecosystem for secure data sharing and collaboration that reduced the time to investigate a fault. Eventually, this generated millions of dollars in cost savings, but it took a lot of convincing and education to establish that the platform’s data sharing mechanisms satisfied each party’s standards for control and data protection.

Data governance in public health

In the healthcare space, we have partnered with Merck KGaA to build Syntropy. Syntropy provides a trust-based environment that simplifies and accelerates collaboration-driven insights in the fight to solve cancer.

Most recently, we have worked with healthcare organizations such as the NHS and the NIH responding to the COVID-19 pandemic. To manage the spread of COVID-19, healthcare organizations have needed to rapidly bring together data from many systems (testing programs, care homes, and hospitals) and give thousands of users, from healthcare workers to academics, access to different subsets of this information. Tracking who has access to what information and why, across thousands of datasets and thousands of users, quickly became an intractable problem for data governance teams. Together with our partner organizations, we established a purpose-based access control protocol that captures not only which user has access to what data, but also for what specific purpose they are allowed to use the requested data, and for what period of time.
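To make the idea concrete, here is a minimal Python sketch of a purpose-based access check. It is an illustration only, not the protocol deployed with these organizations: the Grant record, the is_allowed function, and every user, dataset, and purpose name are hypothetical. The sketch assumes that a grant ties together exactly the four things named above (user, data, purpose, and time period) and that a request is allowed only if all four match.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    """Hypothetical record: who may use which data, why, and until when."""
    user: str
    dataset: str
    purpose: str
    valid_from: date
    valid_until: date


def is_allowed(grants, user, dataset, purpose, on):
    """True if some grant covers this user, dataset, purpose, and date."""
    return any(
        g.user == user
        and g.dataset == dataset
        and g.purpose == purpose
        and g.valid_from <= on <= g.valid_until
        for g in grants
    )


# Hypothetical grant: one analyst, one dataset, one purpose, six months.
grants = [
    Grant("analyst_a", "test_results", "capacity_planning",
          date(2020, 4, 1), date(2020, 9, 30)),
]

# Same user and dataset, but the stated purpose decides the outcome.
print(is_allowed(grants, "analyst_a", "test_results",
                 "capacity_planning", date(2020, 6, 1)))
print(is_allowed(grants, "analyst_a", "test_results",
                 "academic_research", date(2020, 6, 1)))
```

Treating purpose and validity period as first-class fields means a governance team can audit access by asking "which grants exist for this purpose?" rather than reverse-engineering intent from a list of user and dataset pairs.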

All of these examples illustrate Palantir’s approach to technology development for complex, real-world environments: we aim to build systems that enable and empower data controllers and stewards to use their data effectively while also protecting their sovereignty and preserving the privacy of the data subjects to whom they are accountable.

The problem and promise of GAIA-X: building a fast track to trust

So what did we learn from building our software platforms and deploying them with our partners to create data ecosystems? Each ecosystem brings its own idiosyncratic challenges: technical, commercial, organizational, and cultural. They share, however, the single biggest obstacle: the absence of trust in a secure mechanism for information exchange and collaboration.

Without trust, most time and energy is spent overcoming institutional hurdles that stem from an understandable reluctance to use technology without demonstrable safeguards and safe harbors. Without trust, those who build the ecosystem have to prove over and over again that the proposed mechanisms for data sharing and collaboration satisfy the standards of control and data protection of all parties involved.

But what if there were a fast track to trust that allowed us to spend time solving problems and creating value instead of (re)negotiating mutual trust? This is one way to interpret GAIA-X: a fast track to trust that tackles the technical and organizational challenges that impede it. The big promise of GAIA-X is to establish a framework for institutions to find common cultural ground and lay out standards and procedures for data management.

The foundation of such a framework is necessarily not technical but institutional: an open, democratic community built by a variety of different political and economic stakeholders around shared goals and values. Once established, the framework makes it possible to develop and deploy technologies that address discrete needs in ways that are verifiable and reinforce the core governance practices and principles. We are excited to be part of this community!

Overcoming apparent contradictions

Our co-founder and CEO, Alex Karp, has often stressed that Palantir aims to empower European institutions, which he admires for their tradition, social cohesion, and immense professional expertise. Our geographical footprint and workforce are a reflection of this cultural alignment: London became our largest Foundry development hub in 2018, and we have satellite offices across Europe, including in Paris and Munich, staffed primarily by EU citizens who share a common regard for the ambition of GAIA-X and similar efforts. Our goal is not to disrupt European businesses, but to accompany them as partners on their digital transformation journey. In this respect, a federated, open data infrastructure will be a huge step forward, and we are grateful for the opportunity to help GAIA-X create this infrastructure.

Palantir was founded on the conviction that effective use of data need not come at the expense of the preservation of privacy and civil liberties. The GAIA-X vision feels to us like a natural extension of this conviction: how can we foster integrated digital ecosystems at scale and for a wide range of different stakeholders (from SMEs to the major European champions) while also protecting digital sovereignty and fundamental rights? Some may once again see a contradiction between cooperation on one hand and security on the other, between data sovereignty and data availability. The task for us computer scientists and software engineers is to disprove this dichotomy. For us, this prospect is both an opportunity to make use of our engineering investments to date, and an awesome challenge for the years to come.

Authors
Harkirat Singh is based in London and is a senior technical lead in Palantir’s European private sector business. Robert Fink is the founding engineer of Palantir’s Foundry data platform and lives and works in Munich, Germany (although he is of course Nordisch by Nature).