Comprehensive resource describes functions of more than 20,000 human genes
- Date: February 26, 2025
- Source: Keck School of Medicine of USC
- Summary: A comprehensive encyclopedia of the known functions of all protein-coding human genes has just been completed and released. Researchers used large-scale evolutionary modeling to integrate data on human genes with genetic data collected from other organisms. This has culminated in a searchable public resource that lists the known functions of more than 20,000 genes using the most accurate and complete evidence available.
A new resource from the Gene Ontology Consortium, a comprehensive encyclopedia of the known functions of all protein-coding human genes, has just been completed and released on a new website. For the first time, researchers from the Keck School of Medicine of USC, the Swiss Institute of Bioinformatics and other institutions used large-scale evolutionary modeling to integrate data on human genes with genetic data collected from other organisms. This has culminated in a searchable public resource that lists the known functions of more than 20,000 genes using the most accurate and complete evidence available. A paper describing the resource was just published in the journal Nature.
The Gene Ontology, a National Institutes of Health-funded knowledge base that has been continually expanded and improved for more than 25 years, has become a mainstay of the biomedical research process. Already, it is used in more than 30,000 publications each year to aid with data analysis and interpretation.
Biomedical researchers who conduct "omics" experiments -- large-scale studies of DNA, RNA, proteins and other biological molecules -- generate data that can identify hundreds of genes of interest. For example, a researcher might learn which genes are turned "on" or "off" in cancerous cells compared to healthy ones. Reviewing thousands of published papers on the known functions of each gene is not feasible, so many scientists turn instead to the Gene Ontology.
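As a rough illustration of going from a gene list to functions, the sketch below tallies function labels for a small list of genes and computes a simple over-representation p-value with a hypergeometric test. This is a minimal sketch under stated assumptions, not the consortium's own tooling: the gene names, the `ANNOTATIONS` table and both helper functions are hypothetical, and the hypergeometric test is only one common way such lists are analyzed.

```python
from collections import Counter
from math import comb

# Hypothetical toy annotations (not real data): gene -> set of function labels.
ANNOTATIONS = {
    "GENE_A": {"cell division"},
    "GENE_B": {"cell division", "cell signaling"},
    "GENE_C": {"immune response"},
    "GENE_D": {"molecular transport"},
    "GENE_E": {"cell signaling"},
    "GENE_F": {"cell division"},
}


def function_counts(genes, annotations):
    """Count how many genes in the list carry each function label."""
    counts = Counter()
    for gene in genes:
        # Genes with no known annotation contribute nothing.
        counts.update(annotations.get(gene, ()))
    return counts


def enrichment_p_value(k, n, K, N):
    """Hypergeometric upper tail P(X >= k).

    N: genes in the background, K: background genes with the function,
    n: genes of interest, k: genes of interest with the function.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


genes_of_interest = ["GENE_A", "GENE_B", "GENE_F"]
counts = function_counts(genes_of_interest, ANNOTATIONS)
p = enrichment_p_value(k=counts["cell division"], n=3, K=3, N=len(ANNOTATIONS))
print(counts.most_common(), p)
```

In this toy case all three genes of interest share the "cell division" label, which is the kind of pattern a researcher would look for in a real list.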
"Our knowledge base allows scientists to go from just a list of genes to an understanding of their biological functions, including what might be useful for treatment," said Paul D. Thomas, PhD, a principal investigator of the Gene Ontology Consortium and director of the division of bioinformatics and a professor of population and public health sciences at the Keck School of Medicine.
Now, this latest milestone provides a new resource within the knowledge base that uses evolutionary modeling to make the tool even more powerful. The approach allows the researchers to combine experimental data collected from human genes with that obtained from related genes in model organisms, such as mice and zebrafish. It provides a more complete picture of human gene function, including filling in gaps in scientific knowledge where direct evidence from human studies is not available.
"We'd previously amassed a huge knowledge base that has become an authoritative reference on human gene functions," said Thomas, who is also lead author of the new publication. "And now, by adding information about when each function arose in evolution, we're now providing an even more complete, accurate, and concise description of the functions encoded by human genes."
An evolutionary view
The new resource was compiled by a team of more than 150 biologists around the world, including researchers at the Keck School of Medicine of USC. Since 1998, the group has meticulously reviewed over 175,000 scientific publications on gene function, searching for data on gene functions in well-studied organisms and on every gene in the human genome -- primarily the more than 20,000 protein-coding genes that control key biological processes.
After reviewing the literature, they categorized each gene according to the biological functions it performs, either on its own or in combination with other genes. They selected from a catalog they developed of more than 40,000 functions that span cell division, cell signaling, immune response, molecular transport and many more. Understanding the precise functions performed by groups of genes can help researchers understand what goes wrong in cancer and other diseases and design targeted approaches to treatment.
The new resource of gene function descriptions, called the "PAN-GO functionome," will essentially be used in the same way by the scientific community -- to analyze omics data among other applications -- but it will yield more accurate results, Thomas said. That's because the recent work has brought together all the information in the knowledge base using large-scale evolutionary models (which track the evolutionary history of thousands of genes and related proteins), creating a more complete and accurate picture of gene function.
In many cases, experimental data from human genes is not available, but scientists have studied related genes in mice, rats, zebrafish, fruit flies, yeast or E. coli. By understanding when and how specific functions (such as energy processing or cell signaling) evolved, researchers can use data obtained from other organisms to gain an understanding of gene function in humans.
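The idea can be pictured with a toy gene family tree: if a function is modeled as having arisen at some ancestral gene, every descendant of that ancestor is inferred to carry it. The sketch below is a deliberately simplified illustration, not the PAN-GO method itself. The tree, the gene names and the `FUNCTION_ORIGIN` table are hypothetical, and the sketch ignores cases where a function is later lost along a lineage.

```python
# Hypothetical toy gene tree (child -> parent); None marks the root.
PARENT = {
    "human_GENE1": "vertebrate_ancestor",
    "mouse_Gene1": "vertebrate_ancestor",
    "zebrafish_gene1": "vertebrate_ancestor",
    "vertebrate_ancestor": "animal_ancestor",
    "fly_gene1": "animal_ancestor",
    "animal_ancestor": None,
}

# Hypothetical: the tree node at which each function is modeled to have arisen,
# e.g. based on experiments on mouse or fly genes.
FUNCTION_ORIGIN = {
    "cell signaling": "animal_ancestor",
    "immune response": "vertebrate_ancestor",
}


def lineage(node, parent):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def inferred_functions(gene, parent, origin):
    """Functions whose point of origin lies on the gene's lineage."""
    ancestors = set(lineage(gene, parent))
    return {function for function, node in origin.items() if node in ancestors}


print(sorted(inferred_functions("human_GENE1", PARENT, FUNCTION_ORIGIN)))
print(sorted(inferred_functions("fly_gene1", PARENT, FUNCTION_ORIGIN)))
```

Here the human gene is assigned both functions without any human experiment, while the fly gene is assigned only the function that arose before the two lineages split.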
"This helps us infer the functional characteristics of human genes, even when there is no direct evidence from an experiment on the human gene itself," Thomas said.
Further improving the knowledge base
Going forward, the Gene Ontology Consortium is requesting that researchers use the PAN-GO functionome in their analyses. The information is structured in a machine-readable format that allows scientists to use computational tools, such as artificial intelligence, to quickly search and use the data.
The consortium is also issuing a call to action: Researchers can now submit suggestions for updating the knowledge base on specific genes through the project's website. Crowd-sourcing knowledge of gene functions and categorizing them in a structured way ensures that the shared resource continues to improve over time and that its insights are easy to apply.
Though it is the most comprehensive resource available on gene functions, the PAN-GO functionome is not yet complete. It contains data on 82% of protein-coding genes, but no experimental data exists for the other 18% -- roughly 3,600 genes, the biological functions of which remain unknown.
"We now have a real picture of where we are missing information, and that's where future research in this area may want to focus," Thomas said.
In addition to Thomas, the study's other authors are Huaiyu Mi, Anushya Muruganujan, Dustin Ebert and Tremayne Mushayahama from the Department of Population and Public Health Sciences, Keck School of Medicine of USC, University of Southern California; Marc Feuermann and Pascale Gaudet from the Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; and Suzanna E. Lewis from Lawrence Berkeley National Laboratory, Berkeley, California; as well as more than 150 contributors in the Gene Ontology Consortium from about 50 institutions worldwide.
This work was primarily supported by the National Institutes of Health [U24HG002273, U24HG012212].
Story Source:
Materials provided by Keck School of Medicine of USC.
Journal Reference:
- Marc Feuermann, Huaiyu Mi, Pascale Gaudet, Anushya Muruganujan, Suzanna E. Lewis, Dustin Ebert, Tremayne Mushayahama, Suzanne A. Aleksander, James Balhoff, Seth Carbon, J. Michael Cherry, Harold J. Drabkin, Nomi L. Harris, David P. Hill, Raymond Lee, Colin Logie, Sierra Moxon, Christopher J. Mungall, Paul W. Sternberg, Kimberly Van Auken, Jolene Ramsey, Deborah A. Siegele, Rex L. Chisholm, Petra Fey, Michelle Giglio, Suvarna Nadendla, Giulia Antonazzo, Helen Attrill, Nicholas H. Brown, Phani V. Garapati, Steven Marygold, Saadullah H. Ahmed, Praoparn Asanitthong, Diana Luna Buitrago, Meltem N. Erdol, Matthew C. Gage, Siyao Huang, Mohamed Ali Kadhum, Kan Yan Chloe Li, Miao Long, Aleksandra Michalak, Angeline Pesala, Armalya Pritazahra, Shirin C. C. Saverimuttu, Renzhi Su, Qianhan Xu, Ruth C. Lovering, Judith Blake, Karen Christie, Lori Corbani, Mary E. Dolan, Li Ni, Dmitry Sitnikov, Cynthia Smith, Manuel Lera-Ramirez, Kim Rutherford, Valerie Wood, Peter D’Eustachio, Wendy M. Demos, Jeffrey L. De Pons, Melinda R. Dwinell, G. Thomas Hayman, Mary L. Kaldunski, Anne E. Kwitek, Stanley J. F. Laulederkind, Jennifer R. Smith, Marek A. Tutaj, Mahima Vedi, Shur-Jen Wang, Stacia R. Engel, Kalpana Karra, Stuart R. Miyasato, Robert S. Nash, Marek S. Skrzypek, Shuai Weng, Edith D. Wong, Tilmann Achsel, Maria Andres-Alonso, Claudia Bagni, Àlex Bayés, Thomas Biederer, Nils Brose, John Jia En Chua, Marcelo P. Coba, L. Niels Cornelisse, Jaime de Juan-Sanz, Hana L. Goldschmidt, Eckart D. Gundelfinger, Richard L. Huganir, Cordelia Imig, Reinhard Jahn, Hwajin Jung, Pascal S. Kaeser, Eunjoon Kim, Frank Koopmans, Michael R. Kreutz, Noa Lipstein, Harold D. MacGillavry, Peter S. McPherson, Vincent O’Connor, Rainer Pielot, Timothy A. Ryan, Carlo Sala, Morgan Sheng, Karl-Heinz Smalla, A. B. Smit, Ruud F. Toonen, Jan R. T. van Weering, Matthijs Verhage, Chiara Verpelli, Erika Bakker, Tanya Z. Berardini, Leonore Reiser, Andrea Auchincloss, Kristian Axelsen, Ghislaine Argoud-Puy, Marie-Claude Blatter, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Cristina Casals-Casas, Elisabeth Coudert, Anne Estreicher, Maria Livia Famiglietti, Arnaud Gos, Nadine Gruaz-Gumowski, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Ivo Pedruzzi, Lucille Pourcel, Sylvain Poux, Catherine Rivoire, Shyamala Sundaram, Emily Bowler-Barnett, Hema Bye-A-Jee, Paul Denny, Alexandr Ignatchenko, Rizwan Ishtiaq, Antonia Lock, Yvonne Lussi, Michele Magrane, Maria J. Martin, Sandra Orchard, Pedro Raposo, Elena Speretta, Nidhi Tyagi, Kate Warner, Rossana Zaru, Juancarlos Chan, Stavros Diamantakis, Daniela Raciti, Malcolm Fisher, Christina James-Zorn, Virgilio Ponferrada, Aaron Zorn, Sridhar Ramachandran, Leyla Ruzicka, Monte Westerfield, Paul D. Thomas. A compendium of human gene functions derived from evolutionary modelling. Nature, 2025; DOI: 10.1038/s41586-025-08592-0