Flagship AI-ready dataset released in type 2 diabetes study
Early results suggest broader participant diversity and novel measures will enable new, artificial intelligence-driven insights
- Date: November 8, 2024
- Source: University of Washington School of Medicine/UW Medicine
Researchers today (Nov. 8, 2024) are releasing the flagship dataset from an ambitious study of biomarkers and environmental factors that might influence the development of type 2 diabetes. Because the study participants include people with no diabetes and others with various stages of the condition, the early findings hint at a tapestry of information distinct from previous research.
For instance, data from a customized environmental sensor in participants' homes show a clear association between disease state and exposure to tiny particulates of pollution. The collected data also include survey responses, depression scales, eye-imaging scans and traditional measures of glucose and other biologic variables.
All of these data are intended to be mined by artificial intelligence for novel insights about risks, preventive measures, and pathways between disease and health.
"We see data supporting heterogeneity among type 2 diabetes patients -- that people aren't all dealing with the same thing. And because we're getting such large, granular datasets, researchers will be able to explore this deeply," said Dr. Cecilia Lee, a professor of ophthalmology at the University of Washington School of Medicine.
She expressed excitement at the quality of the data collected so far, which represent 1,067 people, roughly a quarter of the study's total expected enrollees.
Lee is program director of AI-READI (Artificial Intelligence Ready and Equitable Atlas for Diabetes Insights). The National Institutes of Health-supported initiative aims to collect and share AI-ready data for global scientists to analyze for new clues about health and disease.
The initial data release is highlighted in a paper published Nov. 8 in the journal Nature Metabolism. The authors restated their aim to gather health information from a more racially and ethnically diverse population than has been measured previously, and to make the resulting data ready, technically and ethically, for AI mining.
"This process of discovery has been invigorating," said Dr. Aaron Lee, also a UW Medicine professor of ophthalmology and the project's principal investigator. "We're a consortium of seven institutions and multidisciplinary teams that had not worked together before. But we have shared goals of drawing on unbiased data and protecting the security of that data as we make it accessible to colleagues everywhere."
At study sites in Seattle, San Diego, and Birmingham, Alabama, recruiters are collectively enrolling 4,000 participants, with inclusion criteria designed to balance enrollment by:
- race/ethnicity (1,000 each -- white, Black, Hispanic and Asian)
- disease severity (1,000 each -- no diabetes, prediabetes, type 2 diabetes controlled with non-insulin medication, and insulin-controlled type 2 diabetes)
- sex (equal male/female split)
"Conventionally scientists are examining pathogenesis -- how people become diseased -- and risk factors," Aaron Lee said. "We want our datasets to also be studied for salutogenesis, or factors that contribute to health. So if your diabetes gets better, what factors might be contributing to that? We expect that the flagship dataset will lead to novel discoveries about type 2 diabetes in both of these ways."
By collecting more detailed data from a large number of people, he added, the researchers hope to construct pseudo health histories showing how a person might progress from disease to full health and from full health to disease.
Hosted on a custom online platform, the data are released in two sets: a controlled-access set that requires a usage agreement, and a registered, publicly available version stripped of HIPAA-protected information.
The pilot data release (summer 2024), involving 204 participants, has been downloaded by more than 110 research organizations worldwide. Researchers must verify their identity and agree to ethical-usage terms. (Learn more about accessing the data at aireadi.org.)
The AI-READI Consortium comprises the University of Washington School of Medicine, University of Alabama at Birmingham, University of California San Diego, California Medical Innovations Institute, Johns Hopkins University, Native Biodata Consortium, Stanford University and Oregon Health & Science University.
The project is based at the Angie Karalis Johnson Retina Center at UW Medicine in Seattle. Cecilia Lee holds the Klorfine Family Endowed Chair. Aaron Lee holds the Dan and Irene Hunter Endowed Professorship.
This work was supported by the NIH (grants OT2OD032644 and P30 DK035816).
Story Source:
Materials provided by University of Washington School of Medicine/UW Medicine. Note: Content may be edited for style and length.
Journal Reference:
- Sally L. Baxter, Virginia R. de Sa, Kadija Ferryman, Prachee Jain, Cecilia S. Lee, Jennifer Li-Pook-Than, T. Y. Alvin Liu, Julia P. Owen, Bhavesh Patel, Qilu Yu, Linda M. Zangwill, Amir Bahmani, Christopher G. Chute, Jeffrey C. Edberg, Samantha Hurst, Hiroshi Ishikawa, Aaron Y. Lee, Gerald McGwin, Shannon McWeeney, Camille Nebeker, Cynthia Owsley, Sara J. Singer, Riddhiman Adib, Mohammad Adibuzzaman, Arash Alavi, Catherine Ashley, Adrienne Baer, Erik Benton, Marian Blazes, Aaron Cohen, Benjamin Cordier, Katie Crist, Colleen Cuddy, Aydan Gasimova, Nayoon Gim, Stephanie Hong, Trina Kim, Wei-Chun Lin, Jessica Mitchell, Caitlyn Ngadisastra, Victoria Patronilo, Jamie Shaffer, Sanjay Soundarajan, Kevin Zhao, Caroline Drolet, Abigail Lucero, Dawn Matthies, Hanna Pittock, Kate Watkins, Brittany York, Charles E. Amankwa, Monique Bangudi, Nada Haboudal, Shahin Hallaj, Anna Heinke, Lingling Huang, Fritz Gerald P. Kalaw, Apoorva Karsolia, Hadi Khazaei, Muna Mohammed, Kyongmi Simpkins, Xujing Wang. AI-READI: rethinking AI data collection, preparation and sharing in diabetes research and beyond. Nature Metabolism, 2024; DOI: 10.1038/s42255-024-01165-x