AI enables large-scale brain tumor study, without sharing patient data
- Date:
- December 5, 2022
- Source:
- University of Pennsylvania School of Medicine
- Summary:
- Researchers led a large-scale global machine learning effort to securely aggregate knowledge from brain scans of 6,314 glioblastoma (GBM) patients at 71 sites around the globe and develop a model that can enhance identification and prediction of boundaries in three tumor sub-compartments, without compromising patient privacy.
Researchers at Penn Medicine and Intel Corporation led the largest-to-date global machine learning effort to securely aggregate knowledge from brain scans of 6,314 glioblastoma (GBM) patients at 71 sites around the globe and develop a model that can enhance identification and prediction of boundaries in three tumor sub-compartments, without compromising patient privacy. Their findings were published today in Nature Communications.
"This is the single largest and most diverse dataset of glioblastoma patients ever considered in the literature, and was made possible through federated learning," said senior author Spyridon Bakas, PhD, an assistant professor of Pathology & Laboratory Medicine, and Radiology, at the Perelman School of Medicine at the University of Pennsylvania. "The more data we can feed into machine learning models, the more accurate they become, which in turn can improve our ability to understand, treat, and remove glioblastoma in patients with more precision."
Researchers studying rare conditions, like GBM, an aggressive type of brain tumor, often have patient populations limited to their own institution or geographical location. Due to privacy protection legislation, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the United States and the General Data Protection Regulation (GDPR) in Europe, sharing data across institutions without compromising patient privacy is a major obstacle for many healthcare providers.
A newer machine learning approach, called federated learning, offers a solution to these hurdles by bringing the machine learning algorithm to the data instead of following the current paradigm of centralizing data to the algorithms. Federated learning -- an approach first implemented by Google for keyboards' autocorrect functionality -- trains a machine learning algorithm across multiple decentralized devices or servers (in this case, institutions) holding local data samples, without actually exchanging them. It has been previously shown to allow clinicians at institutions in different countries to collaborate on research without sharing any private patient data.
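The idea described above — training across sites while only model weights, never patient data, are exchanged — can be sketched in a few lines. This is a minimal illustrative simulation of weighted federated averaging on a toy linear model, not the software used in the study; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site trains on its own data; the raw data never leaves the site."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w, len(y)

def federated_round(global_w, site_data):
    """Aggregate per-site model updates into a consensus model, weighted by site size."""
    updates = [local_update(global_w, X, y) for X, y in site_data]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(5):  # five "institutions", each holding private local data
    X = rng.normal(size=(40, 2))
    sites.append((X, X @ true_w + 0.01 * rng.normal(size=40)))

w = np.zeros(2)
for _ in range(30):  # federated rounds: only weights cross institutional boundaries
    w = federated_round(w, sites)
```

After enough rounds the consensus weights approach the model that centralized training would have produced, even though no site ever saw another site's data.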
Bakas led this massive collaborative study along with first authors Sarthak Pati, MS, a senior software developer at Penn's Center for Biomedical Image Computing & Analytics (CBICA), Ujjwal Baid, PhD, a postdoctoral researcher at CBICA, Brandon Edwards, PhD, a research scientist at Intel Labs, and Micah Sheller, a research scientist at Intel Labs.
"Data helps to drive discovery, especially in rare cancers where available data can be scarce. The federated approach we outline allows for access to maximal data while lowering institutional burdens to data sharing," said Jill Barnholtz-Sloan, PhD, adjunct professor at Case Western Reserve University School of Medicine.
The model followed a staged approach. The first stage, called the public initial model, was pre-trained using publicly available data from the International Brain Tumor Segmentation (BraTS) challenge. The model was tasked with identifying boundaries of three GBM tumor sub-compartments: the "enhancing tumor" (ET), representing the vascular blood-brain barrier breakdown within the tumor; the "tumor core" (TC), which includes the ET and the necrotic (dead) tissue, and represents the part of the tumor relevant for surgeons who remove it; and the "whole tumor" (WT), defined by the union of the TC and the infiltrated tissue, which is the whole area that would be treated with radiation.
This first stage used data from 231 patient cases at 16 sites, and the resulting model was validated against the local data at each site. The second stage, called the preliminary consensus model, used the public initial model and incorporated more data from 2,471 patient cases at 35 sites, which improved its accuracy. The final stage, or final consensus model, used the updated model and incorporated the largest amount of data, from 6,314 patient cases (3,914,680 images) at 71 sites across 6 continents, to further optimize and test for generalizability to unseen data.
As a control for each step, researchers excluded 20 percent of the total cases contributed by each participating site from the model training process and used them as "local validation data." This allowed them to gauge the accuracy of the collaborative method. To further evaluate the generalizability of the models, six sites were not involved in any of the training stages, representing a completely unseen "out-of-sample" data population of 590 cases. Notably, the site at the American College of Radiology validated the model using data from a national clinical trial study.
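The per-site holdout described above is a standard train/validation split applied independently at each institution. A minimal sketch of such a split, with hypothetical function and variable names (the study's actual tooling is not shown here):

```python
import random

def split_site_cases(case_ids, holdout_frac=0.20, seed=42):
    """Hold out a fraction of one site's cases as local validation data,
    never used in federated training (illustrative helper, not the study's code)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(case_ids)
    rng.shuffle(ids)
    n_val = max(1, round(holdout_frac * len(ids)))
    return ids[n_val:], ids[:n_val]  # (training cases, local validation cases)

# A site contributing 100 cases keeps 20 aside for local validation.
train_ids, val_ids = split_site_cases(range(100))
```

Because every site evaluates the consensus model on cases it withheld locally, the reported gains reflect performance on data the model never trained on.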
Following model training, the final consensus model showed significant performance improvements on the collaborators' local validation data: a 27% improvement in detecting ET boundaries, 33% in detecting TC boundaries, and 16% in detecting WT boundaries. The improved result is a clear indication of the benefit afforded by access to more cases, not only to improve the model, but also to validate it.
Looking ahead, the authors hope that due to the generic methodology of federated learning, its applications in medical research can be far-reaching, applying not only to other cancers, but other conditions, like neurodegeneration, and beyond. They also anticipate more research to demonstrate that federated learning can abide by security and privacy protocols around the world.
Funding for this research was provided by the National Institutes of Health (U01CA242871, R01NS042645, U24CA189523, U24CA215109, U01CA248226, P30CA51008, R50CA211270, UL1TR001433, R21EB030209, R37CA214955, R01CA233888, U10CA21661, U10CA37422, U10CA180820, U10CA180794, U01CA176110, R01CA082500, CA079778, CA080098, CA180794, CA180820, CA180822, CA180868), and the National Science Foundation (2040532, 2040462).
Intel Corporation provided software engineering staff and privacy-protection expertise to the project during the development of the software used.
Story Source:
Materials provided by University of Pennsylvania School of Medicine. Note: Content may be edited for style and length.
Journal Reference:
- Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G. Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J. Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, Soonmee Cha, Madhura Ingalhalikar, Manali Jadhav, Umang Pandey, Jitender Saini, John Garrett, Matthew Larson, Robert Jeraj, Stuart Currie, Russell Frood, Kavi Fatania, Raymond Y. Huang, Ken Chang, Carmen Balaña Quintero, Jaume Capellades, Josep Puig, Johannes Trenkler, Josef Pichler, Georg Necker, Andreas Haunschmidt, Stephan Meckel, Gaurav Shukla, Spencer Liem, Gregory S. Alexander, Joseph Lombardo, Joshua D. Palmer, Adam E. Flanders, Adam P. Dicker, Haris I. Sair, Craig K. Jones, Archana Venkataraman, Meirui Jiang, Tiffany Y. So, Cheng Chen, Pheng Ann Heng, Qi Dou, Michal Kozubek, Filip Lux, Jan Michálek, Petr Matula, Miloš Keřkovský, Tereza Kopřivová, Marek Dostál, Václav Vybíhal, Michael A. Vogelbaum, J. Ross Mitchell, Joaquim Farinhas, Joseph A. Maldjian, Chandan Ganesh Bangalore Yogananda, Marco C. Pinho, Divya Reddy, James Holcomb, Benjamin C. Wagner, Benjamin M. Ellingson, Timothy F. Cloughesy, Catalina Raymond, Talia Oughourlian, Akifumi Hagiwara, Chencai Wang, Minh-Son To, Sargam Bhardwaj, Chee Chong, Marc Agzarian, Alexandre Xavier Falcão, Samuel B. Martins, Bernardo C. A. Teixeira, Flávia Sprenger, David Menotti, Diego R. Lucio, Pamela LaMontagne, Daniel Marcus, Benedikt Wiestler, Florian Kofler, Ivan Ezhov, Marie Metz, Rajan Jain, Matthew Lee, Yvonne W. Lui, Richard McKinley, Johannes Slotboom, Piotr Radojewski, Raphael Meier, Roland Wiest, Derrick Murcia, Eric Fu, Rourke Haas, John Thompson, David Ryan Ormond, Chaitra Badve, Andrew E. Sloan, Vachan Vadmal, Kristin Waite, Rivka R. Colen, Linmin Pei, Murat Ak, Ashok Srinivasan, J. 
Rajiv Bapuraj, Arvind Rao, Nicholas Wang, Ota Yoshiaki, Toshio Moritani, Sevcan Turk, Joonsang Lee, Snehal Prabhudesai, Fanny Morón, Jacob Mandel, Konstantinos Kamnitsas, Ben Glocker, Luke V. M. Dixon, Matthew Williams, Peter Zampakis, Vasileios Panagiotopoulos, Panagiotis Tsiganos, Sotiris Alexiou, Ilias Haliassos, Evangelia I. Zacharaki, Konstantinos Moustakas, Christina Kalogeropoulou, Dimitrios M. Kardamakis, Yoon Seong Choi, Seung-Koo Lee, Jong Hee Chang, Sung Soo Ahn, Bing Luo, Laila Poisson, Ning Wen, Pallavi Tiwari, Ruchika Verma, Rohan Bareja, Ipsa Yadav, Jonathan Chen, Neeraj Kumar, Marion Smits, Sebastian R. van der Voort, Ahmed Alafandi, Fatih Incekara, Maarten M. J. Wijnenga, Georgios Kapsas, Renske Gahrmann, Joost W. Schouten, Hendrikus J. Dubbink, Arnaud J. P. E. Vincent, Martin J. van den Bent, Pim J. French, Stefan Klein, Yading Yuan, Sonam Sharma, Tzu-Chi Tseng, Saba Adabi, Simone P. Niclou, Olivier Keunen, Ann-Christin Hau, Martin Vallières, David Fortin, Martin Lepage, Bennett Landman, Karthik Ramadass, Kaiwen Xu, Silky Chotai, Lola B. Chambless, Akshitkumar Mistry, Reid C. Thompson, Yuriy Gusev, Krithika Bhuvaneshwar, Anousheh Sayah, Camelia Bencheqroun, Anas Belouali, Subha Madhavan, Thomas C. Booth, Alysha Chelliah, Marc Modat, Haris Shuaib, Carmen Dragos, Aly Abayazeed, Kenneth Kolodziej, Michael Hill, Ahmed Abbassy, Shady Gamal, Mahmoud Mekhaimar, Mohamed Qayati, Mauricio Reyes, Ji Eun Park, Jihye Yun, Ho Sung Kim, Abhishek Mahajan, Mark Muzi, Sean Benson, Regina G. H. Beets-Tan, Jonas Teuwen, Alejandro Herrera-Trujillo, Maria Trujillo, William Escobar, Ana Abello, Jose Bernal, Jhon Gómez, Joseph Choi, Stephen Baek, Yusung Kim, Heba Ismael, Bryan Allen, John M. Buatti, Aikaterini Kotrotsou, Hongwei Li, Tobias Weiss, Michael Weller, Andrea Bink, Bertrand Pouymayou, Hassan F. Shaykh, Joel Saltz, Prateek Prasanna, Sampurna Shrestha, Kartik M. 
Mani, David Payne, Tahsin Kurc, Enrique Pelaez, Heydy Franco-Maldonado, Francis Loayza, Sebastian Quevedo, Pamela Guevara, Esteban Torche, Cristobal Mendoza, Franco Vera, Elvis Ríos, Eduardo López, Sergio A. Velastin, Godwin Ogbole, Mayowa Soneye, Dotun Oyekunle, Olubunmi Odafe-Oyibotha, Babatunde Osobu, Mustapha Shu’aibu, Adeleye Dorcas, Farouk Dako, Amber L. Simpson, Mohammad Hamghalam, Jacob J. Peoples, Ricky Hu, Anh Tran, Danielle Cutler, Fabio Y. Moraes, Michael A. Boss, James Gimpel, Deepak Kattil Veettil, Kendall Schmidt, Brian Bialecki, Sailaja Marella, Cynthia Price, Lisa Cimino, Charles Apgar, Prashant Shah, Bjoern Menze, Jill S. Barnholtz-Sloan, Jason Martin, Spyridon Bakas. Federated learning enables big data for rare cancer boundary detection. Nature Communications, 2022; 13 (1) DOI: 10.1038/s41467-022-33407-5