
A revolutionary DNA search engine is speeding up genetic discovery

Date:
October 28, 2025
Source:
ETH Zurich
Summary:
ETH Zurich scientists have created “MetaGraph,” a revolutionary DNA search engine that functions like Google for genetic data. By compressing global genomic datasets by a factor of 300, it allows researchers to search trillions of DNA and RNA sequences in seconds instead of downloading massive data files. The tool could transform biomedical research and pandemic response.
FULL STORY

Rare genetic diseases can now be detected in patients, and tumor-specific mutations identified -- a milestone made possible by DNA sequencing, which transformed biomedical research decades ago. In recent years, the introduction of new sequencing technologies (next-generation sequencing) has driven a wave of breakthroughs. During 2020 and 2021, for instance, these methods enabled the rapid decoding and worldwide monitoring of the SARS-CoV-2 genome.

At the same time, an increasing number of researchers are making their sequencing results publicly accessible. This has led to an explosion of data, stored in major databases such as the American SRA (Sequence Read Archive) and the European ENA (European Nucleotide Archive). Together, these archives now hold about 100 petabytes of information -- roughly equivalent to the total amount of text found across the entire internet, with a single petabyte equaling one million gigabytes.

Until now, biomedical scientists needed enormous computing resources to search through these vast genetic repositories and compare them with their own data, making comprehensive searches nearly impossible. Researchers at ETH Zurich have now developed a way to overcome that limitation.

Full-text search instead of downloading entire data sets

The team created a tool called MetaGraph, which dramatically streamlines and accelerates the process. Instead of downloading entire datasets, MetaGraph enables direct searches within the raw DNA or RNA data -- much like using an internet search engine. Scientists simply enter a genetic sequence of interest into a search field and, within seconds or minutes depending on the query, can see where that sequence appears in global databases.

"It's a kind of Google for DNA," explains Professor Gunnar Rätsch, a data scientist in ETH Zurich's Department of Computer Science. Previously, researchers could only search for descriptive metadata and then had to download the full datasets to access raw sequences. That approach was slow, incomplete, and expensive.

According to the study authors, MetaGraph is also remarkably cost-efficient. Representing all publicly available biological sequences would require only a few computer hard drives, and large queries would cost no more than about $0.74 per megabase.

Because the new DNA search engine is both fast and accurate, it could significantly accelerate research -- particularly in identifying emerging pathogens or analyzing genetic factors linked to antibiotic resistance. The system may even help locate beneficial viruses that destroy harmful bacteria (bacteriophages) hidden within these massive databases.

Compression by a factor of 300

In their study published on October 8 in Nature, the ETH team demonstrated how MetaGraph works. The tool organizes and compresses genetic data using advanced mathematical graphs that structure information more efficiently, similar to how spreadsheet software arranges values. "Mathematically speaking, it is a huge matrix with millions of columns and trillions of rows," Rätsch explains.
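
The article does not spell out MetaGraph's internal data structures, so the following is only a toy sketch of the general idea of a sequence index with labels attached. It assumes sequences are cut into short overlapping substrings (k-mers) and that each k-mer is mapped to the samples containing it, which loosely corresponds to rows and columns of the matrix Rätsch describes. The names `build_index`, `search`, and `min_fraction`, as well as the sample data, are hypothetical, and nothing here is compressed.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer (a 'row') to the set of sample labels (the 'columns') containing it."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def search(index, query, k, min_fraction=1.0):
    """Return {label: fraction of query k-mers found} for labels meeting min_fraction."""
    query_kmers = list(kmers(query, k))
    if not query_kmers:
        return {}
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids creating empty entries for unseen k-mers
        for label in index.get(kmer, ()):
            hits[label] += 1
    total = len(query_kmers)
    return {label: n / total for label, n in hits.items() if n / total >= min_fraction}


# Hypothetical toy data, not real sequencing samples
samples = {
    "sample_A": "ACGTACGTGG",
    "sample_B": "TTACGTGGCA",
    "sample_C": "GGGGCCCCAA",
}
index = build_index(samples, k=4)
result = search(index, "ACGTGG", k=4)
partial = search(index, "ACGTAC", k=4, min_fraction=0.5)
```

Here the query is answered from the index alone, without rescanning the stored sequences, which is the property that makes a search-engine style lookup possible. A production system would additionally need a compact representation of both the k-mer set and the label matrix.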

Creating indexes to make large datasets searchable is a familiar concept in computer science, but the ETH approach stands out for how it connects raw data with metadata while achieving an extraordinary compression factor of about 300. This reduction works much like summarizing a book -- it removes redundancies while preserving the essential narrative and relationships, retaining all relevant information in a much smaller form.

"We are pushing the limits of what is possible in order to keep the data sets as compact as possible without losing necessary information," says Dr. André Kahles, who, like Rätsch, is a member of the Biomedical Informatics Group at ETH Zurich. By contrast with other DNA search masks currently being researched, the ETH researchers' approach is scalable. This means that the larger the amount of data queried, the less additional computing power the tool requires.

Half of the data is already available now

First introduced in 2020, MetaGraph has been steadily refined. The tool is now publicly accessible for searches (https://metagraph.ethz.ch/search) and already indexes millions of DNA, RNA, and protein sequences from viruses, bacteria, fungi, plants, animals, and humans. Currently, nearly half of all available global sequence datasets are included, with the remainder expected to follow by the end of the year. Since MetaGraph is open source, it could also attract interest from pharmaceutical companies managing large volumes of internal research data.

Kahles even believes it is possible that the DNA search engine will one day be used by private individuals: "In the early days, even Google didn't know exactly what a search engine was good for. If the rapid development in DNA sequencing continues, it may become commonplace to identify your balcony plants more precisely."


Story Source:

Materials provided by ETH Zurich. Note: Content may be edited for style and length.


Journal Reference:

  1. Mikhail Karasikov, Harun Mustafa, Daniel Danciu, Oleksandr Kulkov, Marc Zimmermann, Christopher Barber, Gunnar Rätsch, André Kahles. Efficient and accurate search in petabase-scale sequence repositories. Nature, 2025; DOI: 10.1038/s41586-025-09603-w

Cite This Page:

ETH Zurich. "A revolutionary DNA search engine is speeding up genetic discovery." ScienceDaily. ScienceDaily, 28 October 2025. <www.sciencedaily.com/releases/2025/10/251027224917.htm>.
