Sharing is Caring: Unleash the Rare Disease Data Silos

Admirable advances in genetic medicine have recently stoked hopes that precision therapeutics may be on the way for patients living with rare diseases. But in reality, much of the data collected and generated is parked in proprietary databases.

It’s a practice that, driven by strong academic and economic incentives, eschews open access and thwarts the building of collective knowledge. Partitioning data also delays the development of successful therapeutic products and interventions.

Sadly, when researchers and clinicians can’t or won’t share their data, it’s the patients who ultimately suffer.

We think this system needs to change.

Don’t you?

The trouble with silos

Researchers and clinicians primarily use patient registries, natural history studies, and patient-focused data collection methods to generate information that helps in understanding the clinical manifestations of rare diseases. But there are limitations to studying what are, by definition, small, variable, and widely dispersed patient populations. Small patient populations also make it difficult to meet the study design and statistical power requirements of clinical drug development.

Today’s incentives lead researchers and clinicians in academia and industry to keep their databases to themselves. This protects funding streams and development of intellectual property, but therapeutic advances are unfortunately slowed.

But while owners of data silos may feel they are protecting knowledge of value, an unintended consequence of this practice may be overvaluation: a few fragments of a biological puzzle likely do not hold as much value as the whole, connected picture.

How silos affect patients

Considered one disease at a time, patients living with rare diseases are few. But in aggregate, rare diseases affect an estimated 3.5–5.9% of all people living today. Many, though not all, rare diseases disproportionately affect children, which means that it’s the children and their families who lose the most in our current way of doing business.

When these patients participate in medical research, they are sharing their most precious resources: their time, their limited energy, and specimens from their bodies. They share because they care: about advancing medical and biological understanding of their condition, about helping to find a cure, and about helping others.

But when data from medical research is held under lock and key, these patients may find themselves participating in multiple studies that duplicate efforts. For patients, this frustrating and confusing scenario can lead to a lack of trust in medical research — even a lack of will to continue participating.

Possible remedies

Linking and standardizing data sets would promote a more detailed and integrated understanding of how rare diseases manifest — from genetic encoding to physical expression — which would in turn benefit an underserved patient population. Sharing data would also promote the development of novel analytics, outcome measures, and tools that may improve research and clinical care.

How might we get there? The US National Institutes of Health could lead meetings to address incentives across industry and academia. Congress might also require research that uses public funding to be standardized and shared.

Artificial intelligence techniques, such as machine learning, could also be used to leverage and link multiple data sources, though careful planning is advised. For example, groups that work across many different rare diseases have recently attempted to rapidly expand patient registries, and their experiences have shown that rapid expansion can worsen data loss and duplication and that inconsistent standard practices carry their own perils.

Looking forward

We might look to tuberous sclerosis complex (TSC) as an example of what a data-sharing future may hold. A foundation advocating for patients with this disease worked with academics to generate a long-term natural history registry, with funding from industry. Data from the registry was then used in a prospective clinical trial that supporters hope will improve cognition and behaviors in children living with TSC.

The US Food and Drug Administration updated its recommendations on natural history research focused on rare diseases in March 2019, priority review of products for rare pediatric diseases in July 2019, and gene therapy for rare diseases in January 2020. But critics say even these advances are limited by fragmented data silos at the bottom of the process and lack of binding regulations and standard practices at the top.

As data is generated from the bottom up, and regulated from the top down, let’s not forget what holds the middle ground: the patients, and their loved ones, who live with these rare diseases. Children, adolescents, and adults who hold hope in their hearts and precious resources in their bodies: these are the people who must ultimately benefit from medical research in a timely way, for the process to be meaningful, ethical, and just.


Denton N, Molloy M, Charleston S, et al. Data silos are undermining drug development and failing rare disease patients. Orphanet J Rare Dis. 2021;16(1):161. Published 2021 Apr 7. doi:10.1186/s13023-021-01806-4

US National Library of Medicine. Long-term, Prospective Study Evaluating Clinical and Molecular Biomarkers of Epileptogenesis in a Genetic Model of Epilepsy – Tuberous Sclerosis Complex (EPISTOP). Available at Accessed May 4, 2021.

US Food and Drug Administration. Rare Diseases: Natural History Studies for Drug Development. Draft Guidance for Industry. March 2019. Available at Accessed May 4, 2021.

US Food and Drug Administration. Human gene therapy for rare diseases. Draft Guidance for Industry. January 2020. Available at Accessed May 4, 2021.