Although European countries have centralized healthcare systems, patient data is distributed across many different databases. In leveraging AI to improve and tailor treatments, tech companies struggle with getting training data for their algorithms. Drawing from the experience of three startups, this article describes a five-step process for consolidating the data.
When it comes to leveraging AI, more than 80% of European pharma executives believe that Europe is behind the curve in general — and that European pharma is further behind than most other big-data industries. These are among the findings of a recent survey INSEAD conducted in partnership with Early Metrics (a European startup-rating agency) and Agalio, a Paris-based consulting firm.
Many observers blame Europe’s tough laws regarding data privacy, which are more stringent than those in the United States and China. But our survey suggests that regulation isn’t the problem. Rather, it’s a disconnect between private and public health systems, which makes it difficult for AI programmers to access enough data with which to train their algorithms.
In many European countries, clinical data is kept by individual hospitals and clinics despite the existence of centralized public health systems. That wouldn’t be a problem were it not also the case that hospitals and clinics all use different databases and IT systems. They decide for themselves what types of data to collect and in what format to store it, and there’s no clear framework for how it should be anonymized and shared.
As a result, health care companies may not have access to large centralized data “lakes” that contain the right amount of the right type of data in the right format for deploying AI software, which may ultimately compromise patients’ access to better-quality health care.
How can they get around the problem? The encouraging experience of three startups I studied offers a roadmap.
Step 1: Figure out who has the keys.
To get access to clinical data, the first port of call for a health care AI startup is usually an internal committee of doctors, legal staff, and hospital administrators, who determine whether access should be granted in principle. If the hospital belongs to a public system, the request also needs approval from a relevant health authority. Then there’s the question of gaining access itself. Although the data itself usually sits on just one or two servers, getting into the various stores of it (obtaining user IDs, passwords, and so forth) requires the cooperation of many different people, who are not always easy to identify and contact. Owkin, a Franco-American startup that deploys AI for clinical development, faced this challenge at each hospital it worked with. In some cases, the cofounders, a hematologist/oncologist and a computer scientist, initially had to camp out at the hospital and knock door-to-door. In one hospital, the process took more than three weeks.
Step 2: Find a scrubbing partner.
Hospitals, especially those in a public health system, are usually resource-constrained, and managing data is not often a priority. To make a hospital’s data usable, Owkin would bring in its own computers and deploy its own data managers to clean up and standardize the data. This required heavy investments in time and money — not an easy proposition for most startups. To share the setup costs in one hospital pilot, Owkin partnered with a large pharma company, giving it access to the hospital data in return. In other cases, hospitals have supplied the necessary resources in exchange for free use of the AI solution or potential future revenue from selling the AI solution to pharma companies.
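To make "clean up and standardize" concrete, here is a minimal sketch of what mapping two hospitals' exports onto one shared schema might look like. It is an illustration only, not Owkin's tooling: the site names, field names, date formats, and units are all hypothetical.

```python
from datetime import datetime

# Hypothetical per-site mappings from local column names to a shared schema.
FIELD_MAPS = {
    "hospital_a": {"pat_id": "patient_id", "dob": "birth_date", "wt_kg": "weight_kg"},
    "hospital_b": {"PatientID": "patient_id", "DateOfBirth": "birth_date", "WeightLb": "weight_lb"},
}

# Date formats assumed to occur across sites (ISO and day/month/year).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

LB_TO_KG = 0.45359237


def parse_date(text):
    """Return an ISO date string, trying each assumed format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def standardize(site, row):
    """Rename fields, normalize the date, and convert weight to kilograms."""
    mapping = FIELD_MAPS[site]
    out = {mapping[key]: value for key, value in row.items() if key in mapping}
    out["birth_date"] = parse_date(out["birth_date"])
    if "weight_lb" in out:
        out["weight_kg"] = round(float(out.pop("weight_lb")) * LB_TO_KG, 1)
    else:
        out["weight_kg"] = round(float(out["weight_kg"]), 1)
    return out


rows = [
    standardize("hospital_a", {"pat_id": "A1", "dob": "1975-03-02", "wt_kg": "70.5"}),
    standardize("hospital_b", {"PatientID": "B7", "DateOfBirth": "02/03/1975", "WeightLb": "155"}),
]
```

Real hospital data is far messier than this, which is part of why the work demands dedicated data managers. The sketch only shows the general shape of the task: one mapping per site, one target format for everyone.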
Step 3: Train the algorithm serially.
Algorithms trained on small quantities of data do not work well, so hospitals must combine their data “ponds” to create a lake. But hospitals can be as reluctant as corporations are to share their data; they don’t want to make it too easy for patients to move to other hospitals, and they have concerns about confidentiality. To get around the problem, Owkin trained the algorithm on one site’s data, which it standardized; it then retrained the algorithm with the standardized data at its next site, and so on, until all the data had been used to train the algorithm. This new approach provides an interesting alternative to the open API and cloud solutions championed by tech giants such as Microsoft and IBM. It might also accelerate the creation of data-sharing cooperatives.
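The idea of training serially can be sketched in a few lines. The example below is a simplified illustration, not Owkin's actual system: it assumes a tiny logistic-regression model, made-up data for three hypothetical hospitals, and arbitrary learning settings. What matters is that only the model's weights travel from site to site; the patient records never leave the hospital that holds them.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_on_site(weights, records, lr=0.5, epochs=200):
    """Continue training on one site's records; weights = [bias, w1, ..., wn]."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in records:
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            w[0] -= lr * error
            for i, xi in enumerate(features):
                w[i + 1] -= lr * error * xi
    return w


def predict(weights, features):
    z = weights[0] + sum(wi * xi for wi, xi in zip(weights[1:], features))
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical standardized data, held separately at each hospital.
site_data = {
    "hospital_a": [([0.1], 0), ([0.2], 0), ([0.9], 1)],
    "hospital_b": [([0.3], 0), ([0.8], 1), ([0.7], 1)],
    "hospital_c": [([0.0], 0), ([1.0], 1), ([0.25], 0)],
}

weights = [0.0, 0.0]  # start from an untrained model
for site, records in site_data.items():
    # Only the weights move on to the next site; the records stay put.
    weights = train_on_site(weights, records)
```

One trade-off of this design is that the model sees each site's data in sequence, so later sites can influence it more than earlier ones. In practice that might call for repeated passes over the sites.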
Step 4: Ensure regulatory compliance.
Hospital data often contains confidential information, such as birth dates and personal addresses. As regulations evolve, hospitals and pharmaceutical companies expect their partners to keep up with compliance in terms of protecting such data. Owkin and Nabta Health, a Dubai-based women’s health startup, usually build an encryption layer on top of the raw data set (creating statistical noise) to hide sensitive patient information while keeping the data usable for analysis. Nabta Health also developed blockchain-based technology that enables patients to manage their personal health data and tracks how the data is shared by physicians and hospitals.
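The article does not detail how either company's protective layer works, so the following is only a generic sketch of the broad idea, under stated assumptions: identifiers are replaced with a salted hash, the address is dropped, the birth date is reduced to a year, and random (Laplace) noise is added to a numeric measurement. All field names, the salt, and the noise scale are hypothetical.

```python
import hashlib
import random


def pseudonymize(patient_id, salt):
    """Replace an identifier with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()
    return digest[:16]


def laplace_noise(rng, scale):
    """Sample Laplace noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mask_record(record, salt, rng, noise_scale=1.0):
    """Return a copy of the record with sensitive fields hidden or blurred."""
    return {
        "pid": pseudonymize(record["patient_id"], salt),
        "birth_year": int(record["birth_date"][:4]),  # keep only the year
        "heart_rate": round(record["heart_rate"] + laplace_noise(rng, noise_scale), 1),
        # The address is deliberately not copied over.
    }


raw = {
    "patient_id": "P-001",
    "birth_date": "1980-04-17",
    "address": "12 Rue Exemple, Paris",
    "heart_rate": 72.0,
}

rng = random.Random(0)  # fixed seed so the illustration is repeatable
masked = mask_record(raw, salt="hypothetical-secret", rng=rng)
```

Hashing an identifier is pseudonymization rather than full anonymization, and how much noise is appropriate depends on the regulation and the analysis at hand. A real deployment would need legal and statistical review.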
Step 5: Connect to other kinds of data.
Nabta Health and ExactCare, a French startup that uses AI to personalize medication dosages, are developing AI that can tap into patients’ data from other sources, such as wearables that capture information on body temperature, heart rate, sweat, and movement. The data thus obtained can be fed to an AI algorithm that hospitals and clinics can use to tailor individual care paths for their patients. (In China, companies such as Alibaba and JD are taking a similar approach in developing digital “twins” of their online customers that draw on data from many domains.)
AI has the potential to fundamentally change health care practices, provided we can integrate the distributed data in the world’s health ecosystems. What the likes of Owkin, ExactCare, and Nabta are achieving in Europe could serve as a useful template for similar initiatives in other markets — most notably in Asia and Latin America — that are facing the same challenges.