Generative AI analyzes medical data faster than human research teams
- Date: February 21, 2026
- Source: University of California - San Francisco
- Summary: Researchers tested whether generative AI could handle complex medical datasets as well as human experts. In some cases, the AI matched or outperformed teams that had spent months building prediction models. By generating usable analytical code from precise prompts, the systems dramatically reduced the time needed to process health data. The findings hint at a future where AI helps scientists move faster from data to discovery.
In an early real-world test of artificial intelligence in health research, scientists at UC San Francisco and Wayne State University discovered that generative AI could process enormous medical datasets far faster than traditional computer science teams -- and in some cases produce even stronger results. Human experts had spent months carefully analyzing the same information.
To compare performance directly, researchers assigned identical tasks to different groups. Some teams relied entirely on human expertise, while others paired scientists with AI tools. The challenge was to predict preterm birth using data from more than 1,000 pregnant women.
Even a junior research pair made up of a UCSF master's student, Reuben Sarwal, and a high school student, Victor Tarca, successfully developed prediction models with AI support. The system generated functioning computer code in minutes -- something that would normally take experienced programmers several hours or even days.
The advantage came from AI's ability to write analytical code based on short but highly specific prompts. Not every system performed well. Only 4 of the 8 AI chatbots produced usable code. Still, those that succeeded did not require large teams of specialists to guide them.
Because of this speed, the junior researchers were able to complete their experiments, verify their findings, and submit their results to a journal within a few months.
"These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines," said Marina Sirota, PhD, a professor of Pediatrics who is the interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF and the principal investigator of the March of Dimes Prematurity Research Center at UCSF. "The speed-up couldn't come sooner for patients who need help now."
Sirota is co-senior author of the study, published in Cell Reports Medicine on Feb. 17.
Why Preterm Birth Research Matters
Speeding up data analysis could improve diagnostic tools for preterm birth -- the leading cause of newborn death and a major contributor to long-term motor and cognitive challenges in children. In the United States, roughly 1,000 babies are born prematurely each day.
Researchers still do not fully understand what causes preterm birth. To investigate possible risk factors, Sirota's team compiled microbiome data from about 1,200 pregnant women whose outcomes were tracked across nine separate studies.
"This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor in UCSF BCHSI, and co-author of the paper.
However, analyzing such a vast and complex dataset proved challenging. To tackle this, the researchers turned to a global crowdsourcing competition called DREAM (Dialogue on Reverse Engineering Assessment and Methods).
Sirota co-led one of three DREAM pregnancy challenges, focusing specifically on vaginal microbiome data. More than 100 teams worldwide participated, developing machine learning models designed to detect patterns linked to preterm birth. Most groups completed their work within the three-month competition window. Yet it took nearly two years to consolidate the findings and publish them.
Testing AI on Pregnancy and Microbiome Data
Curious whether generative AI could shorten that timeline, Sirota's group partnered with researchers led by Adi L. Tarca, PhD, co-senior author and professor in the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, MI. Tarca had led the other two DREAM challenges, which focused on improving methods for estimating pregnancy stage.
Together, the researchers instructed eight AI systems to independently generate algorithms using the same datasets from the three DREAM challenges, without direct human coding.
The AI chatbots received carefully written natural language instructions. As with ChatGPT, detailed prompts steered the systems toward analyzing the health data in ways comparable to the original DREAM participants.
Their objectives mirrored the earlier challenges. The AI systems analyzed vaginal microbiome data to identify signs of preterm birth and examined blood or placental samples to estimate gestational age. Pregnancy dating is almost always an estimate, yet it determines the type of care women receive as pregnancies progress. When estimates are inaccurate, preparing for labor becomes more difficult.
Researchers then ran the AI-generated code using the DREAM datasets. Only 4 of the 8 tools produced models that matched the performance of the human teams, and in some cases those AI models performed better. The entire generative AI effort -- from inception to submission of a paper -- took just six months.
Scientists emphasize that AI still requires careful oversight. These systems can produce misleading results, and human expertise remains essential. However, by rapidly sorting through massive health datasets, generative AI may allow researchers to spend less time troubleshooting code and more time interpreting results and asking meaningful scientific questions.
"Thanks to generative AI, researchers with a limited background in data science won't always need to form wide collaborations or spend hours debugging code," Tarca said. "They can focus on answering the right biomedical questions."
Authors: UCSF authors are Reuben Sarwal; Claire Dubin; Sanchita Bhattacharya, MS; and Atul Butte, MD, PhD. Other authors are Victor Tarca (Huron High School, Ann Arbor, MI); Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University); Gaurav Bhatti (Wayne State University); and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).
Funding: This work was funded by the March of Dimes Prematurity Research Center at UCSF, and by ImmPort. The data used in this study was generated in part with support from the Pregnancy Research Branch of the NICHD.
Story Source:
Materials provided by University of California - San Francisco. Note: Content may be edited for style and length.
Journal Reference:
- Reuben Sarwal, Victor Tarca, Claire A. Dubin, Nikolas Kalavros, Gaurav Bhatti, Sanchita Bhattacharya, Atul Butte, Roberto Romero, Gustavo Stolovitzky, Tomiko T. Oskotsky, Adi L. Tarca, Marina Sirota. Benchmarking large language models for predictive modeling in biomedical research with a focus on reproductive health. Cell Reports Medicine, 2026; 7 (2): 102594 DOI: 10.1016/j.xcrm.2026.102594