To understand the role that bacterial species play in human biology, scientists usually isolate and culture them in the lab before they sequence their DNA. However, many bacteria thrive in conditions that are not yet reproducible in a laboratory setting.
To obtain information on such species, researchers collect a single sample from the environment – in this case, the human gut – and sequence the DNA from the whole sample. They then use computational methods to reconstruct the individual genomes of thousands of species from that single sample. This method, called metagenomics, offers a powerful alternative to isolating and sequencing the DNA of individual species.
Now an international team, led by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and including collaborators from the Wellcome Sanger Institute, the University of Trento, the Gladstone Institutes, and the US Department of Energy Joint Genome Institute, has pooled all known bacterial genomes from the human gut into one database.
"Last year, three independent teams, including ours, reconstructed thousands of gut microbiome genomes. The big questions were whether these teams had comparable results, and whether we could pool them into a comprehensive inventory," says Rob Finn, Team Leader at EMBL-EBI.
The team compiled more than 200,000 genomes and 170 million protein sequences from more than 4,600 bacterial species in the human gut. Their new databases, the Unified Human Gastrointestinal Genome collection and the Unified Human Gastrointestinal Protein catalogue, reveal the tremendous diversity in our guts and pave the way for further microbiome research.
"This immense catalogue is a landmark in microbiome research, and will be an invaluable resource for scientists to start studying and hopefully understanding the role of each bacterial species in the human gut ecosystem," explains Nicola Segata, Principal Investigator at the University of Trento.
The project, published in Nature Biotechnology, revealed that more than 70% of the detected bacterial species have never been cultured in the lab, and that their activity in the body remains unknown. The largest group of bacteria that falls into that category is the Comantemales, an order of gut bacteria first described in 2019 in a study led by the Bork Group at EMBL Heidelberg.
"It was a real surprise to see how widespread the Comantemales are. This highlights how little we know about the bacteria in our gut," explains Alexandre Almeida, EMBL-EBI/Sanger Postdoctoral Fellow in the Finn Team. "We hope our catalogue will help bioinformaticians and microbiologists bridge that knowledge gap in the coming years."
A freely accessible data resource
All the data collected in the Unified Human Gastrointestinal Genome collection and the Unified Human Gastrointestinal Protein catalogue are freely available in MGnify, an EMBL-EBI online resource that allows scientists to analyse their microbial genomic data and make comparisons with existing datasets.
The project already has a number of users in the scientific community. As new datasets emerge from research teams around the world, the catalogue might expand to include the microbiomes of other body sites, such as the skin or the mouth.
"This catalogue provides a very rich source of information for microbiologists and clinicians. However, we will likely discover many more novel bacterial species in under-represented geographical areas like South America, Asia, and Africa. We still don't know much about the variation in bacterial diversity across different human populations," explains Almeida.
Source: Nature Biotechnology
Almeida, A., et al.
"A unified catalog of 204,938 reference genomes from the human gut microbiome"